Dwarkesh Patel · April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope (2:13:40)

The Batch Size Imperative — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

The core principle driving efficiency in AI inference is batching: processing many user requests in the same forward pass. When a request is served alone, the model's weights must be streamed from memory and the accelerator's compute spun up for a single token stream; batching amortizes those fixed costs across every request in the batch. Without it, the cost per token can be roughly a thousand times worse. This optimization is critical for making AI services economically viable.
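To make the amortization concrete, here is a minimal back-of-the-envelope sketch (not from the talk) of memory-bandwidth-bound decoding. All constants, including model size, HBM bandwidth, and GPU price, are assumed for illustration; the point is only that the fixed weight-fetch cost divides by the batch size, so per-request cost per token falls roughly linearly until compute or KV-cache traffic becomes the bottleneck.

```python
# Toy cost model for memory-bandwidth-bound decoding (illustrative assumptions,
# not figures from the talk): each decode step streams the full set of model
# weights from HBM once, and that fixed cost is shared by every request in the batch.

PARAM_BYTES = 70e9 * 2        # assumed 70B-parameter model stored in bf16 (2 bytes/param)
HBM_BYTES_PER_S = 3.35e12     # assumed ~3.35 TB/s of HBM bandwidth
GPU_DOLLARS_PER_HOUR = 2.0    # assumed accelerator rental price

def cost_per_million_tokens(batch_size: int) -> float:
    """Dollar cost per million output tokens for one request, at a given batch size."""
    step_seconds = PARAM_BYTES / HBM_BYTES_PER_S       # one decode step for the whole batch
    seconds_per_token = step_seconds / batch_size      # amortized across the batch
    return seconds_per_token * GPU_DOLLARS_PER_HOUR / 3600 * 1e6

for b in (1, 16, 256, 1024):
    print(f"batch size {b:5d}: ~${cost_per_million_tokens(b):7.2f} per 1M tokens")
```

Under these assumptions the toy model prints roughly $23 per million tokens at batch size 1 and about $0.02 at batch size 1024, which is the three-orders-of-magnitude gap the keypoint describes.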

Impact: High. Batching is the linchpin of cost-effective AI inference, directly impacting API pricing and the scalability of AI services.

In the source video, this keypoint occurs from 00:01:40 to 00:06:50.

Sources in support: Reiner Pope (CEO of MatX, former TPU architect at Google)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.