Dwarkesh Patel · April 30, 2026

How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope

Memory Tiering and Cost Optimization — Dwarkesh Patel

From How GPT-5, Claude, and Gemini are actually trained and served – Reiner Pope. Category: Tech. Format: Commentary. This is a single keypoint from the analysis.

Storing the KV cache across memory tiers (HBM, DDR, Flash) involves a trade-off between retrieval cost, storage cost, and hold time: faster tiers like HBM are more expensive per byte but quick to access, while slower tiers like DDR or Flash are cheaper per byte but take longer to read back. The optimal tier for a given cache therefore depends on how long it must be held before its next use, as sketched below.
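
As a rough illustration of that break-even, here is a minimal Python sketch under stated assumptions: the per-tier dollar figures are invented, the Tier class and cheapest_tier helper are hypothetical names, and the model charges for holding the cache for the whole hold time plus a single retrieval at the end.

```python
# Hypothetical cost model for placing a KV cache in a memory tier.
# All dollar figures below are illustrative assumptions, not measured numbers.
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    storage_cost_per_gb_hour: float  # $ to hold 1 GB in this tier for one hour
    retrieval_cost_per_gb: float     # $ (or stall-time proxy) to read 1 GB back out


TIERS = [
    Tier("HBM",   storage_cost_per_gb_hour=0.02,   retrieval_cost_per_gb=1e-6),
    Tier("DDR",   storage_cost_per_gb_hour=0.002,  retrieval_cost_per_gb=1e-5),
    Tier("Flash", storage_cost_per_gb_hour=0.0001, retrieval_cost_per_gb=1e-4),
]


def total_cost(tier: Tier, size_gb: float, hold_hours: float) -> float:
    """Cost of holding a KV cache of size_gb for hold_hours, then reading it once."""
    return (tier.storage_cost_per_gb_hour * size_gb * hold_hours
            + tier.retrieval_cost_per_gb * size_gb)


def cheapest_tier(size_gb: float, hold_hours: float) -> Tier:
    """Pick the tier that minimizes storage-plus-retrieval cost for this hold time."""
    return min(TIERS, key=lambda t: total_cost(t, size_gb, hold_hours))


if __name__ == "__main__":
    # Short holds favor cheap retrieval (HBM); long holds favor cheap storage (Flash).
    for hold in (0.0001, 0.01, 10.0):  # hours: roughly 0.4 s, 36 s, 10 h
        best = cheapest_tier(size_gb=1.0, hold_hours=hold)
        print(f"hold {hold:g} h -> {best.name}")
```

With these made-up numbers the cheapest tier shifts from HBM to DDR to Flash as the hold time grows, which is the balancing act this keypoint describes.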

Impact: High. Strategic use of memory tiers is essential for managing the vast memory footprint of LLM serving, directly affecting operational cost and inference speed.

In the source video, this keypoint occurs from 01:48:52 to 01:51:56.

Sources in support: Dwarkesh Patel (Host)

For the full credibility analysis, key takeaways, and other keypoints from this video, see the full analysis on skim.

This keypoint analysis was generated by skim (skim.plus), an AI-powered content analysis platform by Credible AI.