The total parameter count of a model is bounded by the size of the scale-up domain (the memory capacity within a single rack), while the active parameter count is bounded by compute. Larger scale-up domains, such as Nvidia's Blackwell rack-scale systems with 10-20 TB of memory, make it possible to train and serve models with trillions of total parameters while also holding their KV cache in memory.
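The capacity argument can be made concrete with back-of-the-envelope arithmetic: weights plus KV cache must fit within the scale-up domain's memory. The sketch below uses purely illustrative numbers (a hypothetical 10T-parameter model in FP8, assumed layer/head counts, and a 15 TB rack taken from the middle of the 10-20 TB range mentioned above); none of these figures come from the source.

```python
# Illustrative capacity check: do weights + KV cache fit in one scale-up domain?
# All model dimensions and the rack size below are assumptions, not source data.

def weights_tb(total_params_trillion: float, bytes_per_param: float) -> float:
    """Weight memory in TB. 1T params at 1 byte/param (FP8) is exactly 1 TB."""
    return total_params_trillion * bytes_per_param

def kv_cache_tb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int) -> float:
    """KV cache memory in TB: K and V vectors per layer, per token, per sequence."""
    elems = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elems * bytes_per_elem / 1e12

# Hypothetical 10T-parameter model served in FP8 -> 10 TB of weights.
weights = weights_tb(10, 1)

# Hypothetical serving config: 128 layers, 8 KV heads of dim 128 (GQA),
# 128k context, batch of 64 sequences, FP16 cache -> ~4.4 TB.
kv = kv_cache_tb(layers=128, kv_heads=8, head_dim=128,
                 seq_len=131072, batch=64, bytes_per_elem=2)

rack_tb = 15  # mid-range of the 10-20 TB Blackwell scale-up figure
fits = weights + kv <= rack_tb
print(f"weights={weights:.1f} TB, kv={kv:.1f} TB, fits in {rack_tb} TB rack: {fits}")
```

Under these assumptions the model fits, but note how the KV cache alone consumes several terabytes: this is why the rack's memory capacity, not any single GPU's, sets the ceiling on total parameters.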
Impact: High. This directly addresses a core scaling limitation of LLMs, explaining why recent models saw significant growth in total parameters only after hardware advances increased the memory capacity available per scale-up domain.
In the source video, this keypoint occurs from 00:44:45 to 00:47:00.
Sources in support: Dwarkesh Patel (Host)

