Dwarkesh Patel, April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope
2:13:40

Compute vs. Memory: The Roofline Model — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

Analyzing AI inference performance requires understanding the balance between compute throughput (FLOP/s) and memory bandwidth (bytes/s). Compute time scales linearly with batch size and with the number of active parameters, while memory time is set by fetching all model weights plus the KV cache. Whichever of the two times is larger determines whether a system is compute-bound or memory-bound.
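The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model presented in the video: the factor of 2 FLOPs per active parameter per token (one multiply and one add) is a common approximation, and every name and number below (`Hardware`, `Model`, `EXAMPLE_HW`, `EXAMPLE_MODEL`, the throughput and size values) is hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    peak_flops: float     # peak compute throughput, FLOP/s (hypothetical value)
    mem_bandwidth: float  # memory bandwidth, bytes/s (hypothetical value)


@dataclass
class Model:
    active_params: float     # parameters used per token
    total_params: float      # parameters fetched from memory each step
    bytes_per_param: float   # storage size of one parameter
    kv_bytes_per_seq: float  # KV cache bytes read per sequence each step


def compute_time(hw: Hardware, m: Model, batch: int) -> float:
    """Seconds of compute per step: linear in batch size and active parameters.

    Assumes roughly 2 FLOPs per active parameter per token.
    """
    return 2.0 * batch * m.active_params / hw.peak_flops


def memory_time(hw: Hardware, m: Model, batch: int) -> float:
    """Seconds spent fetching all weights plus the KV cache for the batch."""
    weight_bytes = m.total_params * m.bytes_per_param
    kv_bytes = batch * m.kv_bytes_per_seq
    return (weight_bytes + kv_bytes) / hw.mem_bandwidth


def regime(hw: Hardware, m: Model, batch: int) -> str:
    """Label the step by whichever of the two times is larger."""
    if compute_time(hw, m, batch) >= memory_time(hw, m, batch):
        return "compute-bound"
    return "memory-bound"


# Hypothetical accelerator and model, for illustration only.
EXAMPLE_HW = Hardware(peak_flops=1e15, mem_bandwidth=1e12)
EXAMPLE_MODEL = Model(
    active_params=1e10,
    total_params=1e10,
    bytes_per_param=1.0,
    kv_bytes_per_seq=1e6,
)

if __name__ == "__main__":
    for batch in (1, 64, 1024):
        c = compute_time(EXAMPLE_HW, EXAMPLE_MODEL, batch)
        mt = memory_time(EXAMPLE_HW, EXAMPLE_MODEL, batch)
        print(batch, f"{c:.6f}", f"{mt:.6f}", regime(EXAMPLE_HW, EXAMPLE_MODEL, batch))
```

With these made-up numbers, a small batch is dominated by the fixed cost of reading the weights, while a large enough batch pushes compute time past memory time. A larger per-sequence KV cache would move that crossover to a bigger batch, or remove it entirely.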

Impact: High. This technical framework provides a quantitative lens to diagnose performance bottlenecks and optimize hardware utilization for AI models.

In the source video, this keypoint occurs from 00:07:35 to 00:12:35.

Sources in support: Reiner Pope (CEO of MatX, former TPU architect at Google)


This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.