Dwarkesh Patel · April 30, 2026 · 2:13:40
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope

Latency Costs and Scale-Up Domains — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

Inter-rack communication introduces latency costs (a few milliseconds per hop) that stack up sequentially during decode. Pipeline parallelism helps fit larger models across racks, but it does not increase the memory bandwidth available to each token; that requires a larger scale-up domain, whose aggregate bandwidth in turn supports longer context lengths and lower inference latency.
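To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch of per-token decode latency as two terms: a memory-bound term (reading sharded weights from HBM across the scale-up domain) plus sequential hop latency. The latency model and all numbers below are illustrative assumptions, not figures from the episode.

```python
# Back-of-the-envelope decode-latency model (illustrative assumptions only;
# the numbers and the two-term model are not taken from the episode).

def per_token_decode_latency_ms(
    model_bytes: float,           # weights read from HBM on each decode step
    hbm_bw_per_chip_gb_s: float,  # memory bandwidth of one accelerator (GB/s)
    chips_in_scaleup: int,        # size of the scale-up domain sharding the weights
    inter_hop_latency_ms: float,  # latency per hop outside the scale-up domain
    hops_per_step: int,           # sequential hops crossed on each decode step
) -> float:
    # Memory-bound term: the sharded weights are read in parallel, so the
    # effective bandwidth scales with the size of the scale-up domain.
    aggregate_bw_bytes_s = chips_in_scaleup * hbm_bw_per_chip_gb_s * 1e9
    memory_ms = model_bytes / aggregate_bw_bytes_s * 1e3
    # Communication term: per-hop latencies stack up sequentially every step.
    comm_ms = hops_per_step * inter_hop_latency_ms
    return memory_ms + comm_ms

# Hypothetical example: a 1 TB model sharded over 8 vs. 64 chips.
for chips in (8, 64):
    latency = per_token_decode_latency_ms(
        model_bytes=1e12,
        hbm_bw_per_chip_gb_s=3000,   # ~3 TB/s HBM per chip (assumed)
        chips_in_scaleup=chips,
        inter_hop_latency_ms=2.0,    # "a few milliseconds per hop" from the keypoint
        hops_per_step=2,             # assumed hops per decode step
    )
    print(f"{chips} chips in scale-up domain: ~{latency:.1f} ms per token")
```

Under these assumptions, growing the scale-up domain from 8 to 64 chips cuts the memory-bound term roughly eightfold, while the sequential hop latency stays as a fixed floor on every decode step.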

Impact: High. The trade-off between communication latency and memory bandwidth shapes whether large-scale AI deployments are practical. Sizing the scale-up domain correctly is essential for overcoming bandwidth limitations and enabling more capable, lower-latency models.

In the source video, this keypoint occurs from 01:15:03 to 01:18:37.

Sources in support: Dwarkesh Patel (Host)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.