Dwarkesh Patel · April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope
Duration: 2:13:40

Quantifying Over-training: A Hundredfold Increase — Dwarkesh Patel

From "How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope". Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

By comparing estimated inference token counts (hundreds of trillions) with Chinchilla-optimal training token counts (trillions), current frontier models appear to be over-trained by a factor of roughly 100. This is a large deviation from the theoretically optimal training ratio, driven by the need to balance training cost against inference cost.
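
As a rough illustration of the arithmetic behind this ratio, the Python sketch below compares an assumed inference token volume against a Chinchilla-optimal training budget. All numbers in it (the 100B-parameter model size, the 20-tokens-per-parameter heuristic, the 200 trillion inference tokens) are illustrative assumptions, not figures quoted in the episode.

    # Back-of-the-envelope arithmetic for the ~100x figure.
    # All numbers are illustrative assumptions, not values from the episode.

    TOKENS_PER_PARAM_CHINCHILLA = 20  # Chinchilla heuristic: ~20 training tokens per parameter

    def chinchilla_optimal_tokens(num_params: float) -> float:
        """Compute-optimal training token budget for a model with num_params parameters."""
        return TOKENS_PER_PARAM_CHINCHILLA * num_params

    params = 100e9                                       # assumed frontier-scale model: 100B parameters
    optimal_tokens = chinchilla_optimal_tokens(params)   # = 2e12, i.e. ~2 trillion tokens

    inference_tokens = 200e12                            # assumed lifetime inference volume: 200 trillion tokens

    ratio = inference_tokens / optimal_tokens
    print(f"Chinchilla-optimal training tokens: {optimal_tokens:.1e}")
    print(f"Assumed inference tokens served:    {inference_tokens:.1e}")
    print(f"Ratio: ~{ratio:.0f}x")                       # prints ~100x

Under these assumptions the ratio comes out to about 100x, matching the order of magnitude described in this keypoint; the actual figure depends on the real token counts, which the episode only estimates.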

Impact: High. This massive over-training ratio implies a substantial investment in data and compute beyond the theoretical minimum, pointing to a strategic choice to optimize for overall deployment cost and performance rather than training efficiency alone.

In the source video, this keypoint occurs from 01:30:41 to 01:32:12.

Sources in support: Dwarkesh Patel (Host)

For the credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.