Dwarkesh Patel · April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope (2:13:40)

The Cost of Inference: Amortizing Overheads — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

Cost per token is minimized when the fixed overheads of a decode step, chiefly fetching the model weights from memory, are amortized over a large batch. The weights must be streamed from memory once per step regardless of how many requests are in flight, so at small batch sizes that fetch dominates and each token carries a large share of the fixed cost. As batch size grows, per-step arithmetic grows with it until compute time becomes the bottleneck, which sets a floor on cost per token that no further batching can lower.
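A minimal roofline-style sketch of this tradeoff, using assumed hardware figures (H100-class bandwidth and FLOP rates, a 70B bf16 model, a $2/hour rental price) rather than numbers from the talk:

```python
# Illustrative cost model for decode (token generation).
# All constants below are assumptions for illustration, not figures from the video.

WEIGHT_BYTES = 140e9             # assumed 70B-parameter model in bf16 (2 bytes/param)
PARAMS = 70e9                    # assumed parameter count
HBM_BW = 3.35e12                 # assumed memory bandwidth, bytes/s (H100-class)
PEAK_FLOPS = 990e12              # assumed peak bf16 FLOP/s (H100-class)
DOLLARS_PER_SECOND = 2.0 / 3600  # assumed $2/hour accelerator cost

def cost_per_token(batch_size: int) -> float:
    """Cost per generated token at a given batch size.

    Each decode step streams all weights from memory once (a fixed
    cost shared across the batch) and performs ~2 FLOPs per parameter
    per token in the batch. Step time is the max of the two, i.e. the
    slower resource is the bottleneck.
    """
    memory_time = WEIGHT_BYTES / HBM_BW                   # fixed per step
    compute_time = 2 * PARAMS * batch_size / PEAK_FLOPS   # grows with batch
    step_time = max(memory_time, compute_time)
    return step_time * DOLLARS_PER_SECOND / batch_size

for b in [1, 8, 64, 512, 4096]:
    print(f"batch={b:5d}  cost/token = ${cost_per_token(b):.2e}")
```

With these assumed numbers the model is memory-bound up to a batch size of roughly 300; beyond that, cost per token flattens at the compute-bound floor (2 × PARAMS / PEAK_FLOPS seconds of compute per token), which no amount of additional batching can reduce.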

Impact: High. This analysis clarifies why 'slow modes' are economically unviable and establishes the minimum cost achievable for inference, directly influencing pricing strategies.

In the source video, this keypoint occurs from 00:13:12 to 00:17:20.

Sources in support: Reiner Pope (CEO of MatX, former TPU architect at Google)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.