Dwarkesh Patel, April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope

Scale-Up vs. Scale-Out Bandwidth and Model Size — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

A model's total parameter count is limited by the size of the scale-up domain (the aggregate memory capacity within a rack), while its active parameters per token are limited by compute. Larger scale-up domains, such as Nvidia's Blackwell racks with roughly 10-20 TB of memory, unlock the ability to train and serve multi-trillion-parameter models while still leaving room for their KV cache.
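
To make the constraint concrete, here is a rough back-of-the-envelope sketch in Python. The chip count, per-chip memory, parameter count, weight precision, and KV cache budget are illustrative assumptions, not figures quoted in the episode.

```python
# Back-of-the-envelope check: do a model's weights plus its KV cache fit in
# the aggregate memory of one scale-up domain? Numbers below are assumptions.

def fits_in_scale_up_domain(
    total_params: float,        # total parameter count (e.g., 2e12 for 2T)
    bytes_per_param: float,     # 1.0 for FP8 weights, 2.0 for BF16
    kv_cache_bytes: float,      # serving-time KV cache budget in bytes
    num_chips: int,             # accelerators in the scale-up domain
    hbm_per_chip_bytes: float,  # high-bandwidth memory per accelerator
) -> bool:
    domain_capacity = num_chips * hbm_per_chip_bytes
    required = total_params * bytes_per_param + kv_cache_bytes
    print(f"required: {required / 1e12:.1f} TB, "
          f"available: {domain_capacity / 1e12:.1f} TB")
    return required <= domain_capacity

# Hypothetical example: a 2-trillion-parameter model stored in FP8 with a
# 1 TB KV cache budget, on a 72-chip domain with 192 GB of memory per chip
# (about 13.8 TB in total), so the model fits with headroom to spare.
fits_in_scale_up_domain(
    total_params=2e12,
    bytes_per_param=1.0,
    kv_cache_bytes=1e12,
    num_chips=72,
    hbm_per_chip_bytes=192e9,
)
```

Note that active parameters do not appear in this check at all: they determine how much compute each token costs, while the memory ceiling above is what caps the total parameter count.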

Impact: High. This directly addresses the scaling limitations of LLMs, explaining why recent models saw significant parameter growth only after hardware advances increased the memory capacity available within a single scale-up domain.

In the source video, this keypoint occurs from 00:44:45 to 00:47:00.

Sources in support: Dwarkesh Patel (Host)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.