Dwarkesh Patel, April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope

Compute vs. Memory Bandwidth Bottleneck — Dwarkesh Patel

From "How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope". Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

The cost of running AI models is determined by the interplay between compute time and the time spent moving data over memory bandwidth. At short context lengths, compute cost dominates; as context length increases, memory bandwidth becomes the primary bottleneck and dictates overall expense. The context length at which this crossover occurs is crucial for pricing strategies.
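One way to make the crossover concrete is a simple roofline-style estimate of a single decoding step. The sketch below is an illustration, not something taken from the video. It assumes that step time is the larger of compute time and memory-transfer time, that compute is roughly 2 FLOPs per parameter per sequence in the batch, and that memory traffic is the model weights plus a per-token cache that grows with context length. The names (`DecodeSetup`, `crossover_context`) and all numbers are hypothetical placeholders, not measurements of any real model or chip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeSetup:
    """Hypothetical hardware and model parameters (illustrative only)."""
    peak_flops: float          # accelerator peak throughput, FLOP/s
    mem_bandwidth: float       # memory bandwidth, bytes/s
    n_params: float            # number of model parameters
    bytes_per_param: float     # storage per parameter, bytes
    kv_bytes_per_token: float  # assumed cache bytes per context token per sequence
    batch_size: int            # sequences decoded together


def compute_time(s: DecodeSetup) -> float:
    """Seconds of arithmetic per decode step (assumes ~2 FLOPs per parameter per sequence)."""
    return 2.0 * s.n_params * s.batch_size / s.peak_flops


def memory_time(s: DecodeSetup, context_len: float) -> float:
    """Seconds to read the weights once plus every sequence's cache for one step."""
    weight_bytes = s.n_params * s.bytes_per_param
    cache_bytes = s.batch_size * context_len * s.kv_bytes_per_token
    return (weight_bytes + cache_bytes) / s.mem_bandwidth


def step_time(s: DecodeSetup, context_len: float) -> float:
    """Simplified model: the slower of the two resources sets the step time."""
    return max(compute_time(s), memory_time(s, context_len))


def crossover_context(s: DecodeSetup) -> float:
    """Context length at which memory time equals compute time.

    Returns 0.0 if the step is already memory-bound with an empty context.
    """
    weight_bytes = s.n_params * s.bytes_per_param
    budget_bytes = compute_time(s) * s.mem_bandwidth - weight_bytes
    return max(0.0, budget_bytes / (s.batch_size * s.kv_bytes_per_token))


if __name__ == "__main__":
    # All values below are made-up placeholders chosen only to show the shape of the curve.
    setup = DecodeSetup(
        peak_flops=1e15,
        mem_bandwidth=3e12,
        n_params=70e9,
        bytes_per_param=1.0,
        kv_bytes_per_token=160e3,
        batch_size=256,
    )
    print("crossover context length:", crossover_context(setup))
    for context_len in (0, 500, 2_000, 8_000, 32_000):
        bound = "compute" if compute_time(setup) >= memory_time(setup, context_len) else "memory"
        print(context_len, f"{step_time(setup, context_len):.4f} s", bound)
```

Under these assumptions the step time is flat while compute is the limit and then grows linearly with context length once memory traffic takes over, which is the crossover the keypoint describes. Real systems differ in many ways (attention FLOPs, overlap of compute and transfers, cache compression), so the sketch only shows the shape of the tradeoff.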

Impact: High. Understanding this bottleneck is key to optimizing AI infrastructure and pricing models, as it dictates where efficiency gains can be made.

In the source video, this keypoint occurs from 01:34:13 to 01:37:18.

Sources in support: Dwarkesh Patel (Host)


This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.