Dwarkesh Patel · April 30, 2026
How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope

Inference Strategies: Expert Parallelism Dominates — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

For inference, the strategy leans heavily on expert parallelism, scaling it up to the size of the scale-up domain while minimizing pipelining. Tensor parallelism, once useful for splitting up large experts, is now less profitable because experts have become smaller. This approach is favored unless the model is too large to fit in a single rack's memory.
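The decision rule described above can be sketched as a small planner. This is an illustrative sketch, not a description of any lab's actual serving stack; the function name, parameters, and capacity numbers are all hypothetical, and real deployments weigh many more factors (latency targets, batch sizes, interconnect bandwidth).

```python
def plan_parallelism(model_bytes: int,
                     rack_bytes: int,
                     scaleup_domain: int,
                     num_experts: int) -> dict:
    """Hypothetical planner: prefer expert parallelism up to the
    scale-up domain size; add pipelining only when the model
    exceeds a single rack's memory."""
    # Expert parallelism is capped by the number of experts and by
    # the number of chips reachable within one scale-up domain.
    ep = min(num_experts, scaleup_domain)

    if model_bytes <= rack_bytes:
        # Model fits in one rack: no pipeline stages needed.
        pp = 1
    else:
        # Model spills past one rack: just enough pipeline stages
        # to cover the weights (ceiling division).
        pp = -(-model_bytes // rack_bytes)

    # With today's smaller experts, tensor parallelism over each
    # expert buys little, so this sketch leaves it at 1.
    tp = 1
    return {"expert_parallel": ep,
            "pipeline_parallel": pp,
            "tensor_parallel": tp}
```

For example, a model that fits in one rack gets `pipeline_parallel = 1` and all available chips devoted to expert parallelism, while a model 2.5x the rack's memory would be split across 3 pipeline stages.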

Impact: High. This strategic choice optimizes inference performance by prioritizing parallelism that directly addresses model size and latency, rather than solely focusing on memory capacity. It reflects a pragmatic approach to deploying large models efficiently.

In the source video, this keypoint occurs from 01:13:00 to 01:14:43.

Sources in support: Dwarkesh Patel (Host), Jane Street (Trading firm)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.