Dwarkesh Patel · April 30, 2026 · 2:13:40
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope

Optimal Training vs. Inference Compute Balance — Dwarkesh Patel

From "How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope". Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

The optimal training strategy balances compute costs across pre-training, RL fine-tuning, and inference. One heuristic is to equalize these three costs, which implies that the total number of tokens a model serves at inference should roughly match the number of tokens it was pre-trained on. Because a widely used model's lifetime inference demand can far exceed a Chinchilla-optimal token budget, this heuristic can leave models significantly 'over-trained' relative to Chinchilla scaling laws.
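
As a rough illustration, here is a minimal sketch of the arithmetic in Python. It assumes the standard approximations of about 6 FLOPs per parameter per training token and 2 FLOPs per parameter per inference token, plus the Chinchilla rule of thumb of roughly 20 training tokens per parameter; the model size and lifetime inference demand are hypothetical values, not figures from the episode.

```python
# Sketch of the cost-balancing heuristic, under assumed (standard)
# FLOP approximations. Not a method endorsed in the source video.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Pre-training cost: roughly 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def inference_flops(n_params: float, n_tokens: float) -> float:
    """Inference cost: roughly 2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens

def balanced_pretraining_tokens(lifetime_inference_tokens: float) -> float:
    """Token count that equalizes pre-training and inference compute:
    6*N*D_train = 2*N*D_inf  =>  D_train = D_inf / 3."""
    return lifetime_inference_tokens / 3

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return 20 * n_params

N = 70e9       # hypothetical 70B-parameter model
D_inf = 1e14   # hypothetical lifetime inference demand, in tokens

D_train = balanced_pretraining_tokens(D_inf)
print(f"cost-balanced pre-training tokens: {D_train:.2e}")
print(f"Chinchilla-optimal tokens:         {chinchilla_optimal_tokens(N):.2e}")
print(f"over-training factor:              {D_train / chinchilla_optimal_tokens(N):.0f}x")
```

Under these approximations, equalizing pre-training and inference cost pins pre-training tokens at one third of lifetime inference tokens (a "rough match" at this level of precision), and the example numbers land about 24x above the Chinchilla-optimal budget: the sense in which such a model would be 'over-trained'.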

Impact: High. If frontier labs follow this cost-balancing heuristic, current frontier models may be trained on orders of magnitude more data than is Chinchilla-optimal, which changes how data and compute budgets should be allocated. It reframes model development from optimizing training compute in isolation to minimizing total compute cost across training and serving.

In the source video, this keypoint occurs from 01:19:07 to 01:24:11.

Sources in support: Dwarkesh Patel (Host)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.