Dwarkesh Patel, April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope
Duration: 2:13:40

Memory Capacity vs. Bandwidth Bottlenecks — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

Bandwidth gets most of the attention, but memory capacity per rack is a primary constraint for large models. Even with advanced interconnects, if a model's total parameters and KV cache exceed the memory available within a scale-up domain (a rack), pipeline parallelism becomes necessary purely to distribute the memory load across racks, even though it does not improve latency.
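The constraint above is a simple capacity check. A minimal back-of-envelope sketch, using entirely hypothetical numbers (the model size, KV cache size, and per-rack memory below are illustrative assumptions, not figures from the interview):

```python
GB = 1024**3  # bytes per gibibyte

def min_pipeline_stages(param_bytes: int, kv_cache_bytes: int,
                        rack_memory_bytes: int) -> int:
    """Smallest number of scale-up domains (pipeline stages) whose
    combined memory holds the model weights plus the KV cache."""
    total = param_bytes + kv_cache_bytes
    return -(-total // rack_memory_bytes)  # ceiling division

# Hypothetical serving setup:
params = 1000 * GB          # ~1T parameters stored at 1 byte each
kv_cache = 500 * GB         # assumed KV cache for a large batch / long contexts
rack = 8 * 80 * GB          # assumed scale-up domain: 8 accelerators x 80 GB

stages = min_pipeline_stages(params, kv_cache, rack)  # -> 3 here
```

With these assumed numbers, the 1.5 TB footprint exceeds the 640 GB available in one domain, so the model must be pipelined across three racks even if a single rack's interconnect bandwidth were infinite; the pipelining is forced by capacity, not speed.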

Impact: High. Memory capacity, not just bandwidth, is a critical bottleneck for scaling AI models, driving architectural decisions such as pipelining to fit models within the available hardware.

In the source video, this keypoint occurs from 01:01:17 to 01:02:11.

Sources in support: Dwarkesh Patel (Host)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.