Nvidia's Blackwell racks, with 72 GPUs per rack, are designed for expert parallelism. The all-to-all communication pattern within a rack, carried over NVLink and the rack's NVLink switches, is well suited to MoE layers. Scaling beyond a single rack, however, introduces significant communication bottlenecks because rack-to-rack interconnects are much slower than the in-rack fabric.
Impact: High. The physical design of GPU racks and their interconnects directly dictates the feasibility and efficiency of scaling AI models. Rack-level all-to-all communication is a key enabler for MoE, but inter-rack communication remains a challenge.
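To make the intra-rack vs. inter-rack gap concrete, here is a minimal back-of-envelope sketch of the all-to-all dispatch time for one MoE layer. All numbers (token count, hidden size, top-k, and especially the usable per-GPU bandwidths for NVLink and an inter-rack NIC) are illustrative assumptions, not figures from the source.

```python
def all_to_all_time(tokens, hidden_dim, bytes_per_elem, top_k, link_gb_per_s):
    """Seconds to move one GPU's dispatched token activations over its link.

    Each token is routed to top_k experts, so its activation vector is sent
    up to top_k times. Assumes the transfer is bandwidth-bound and ignores
    latency, overlap with compute, and any compression.
    """
    bytes_sent = tokens * hidden_dim * bytes_per_elem * top_k
    return bytes_sent / (link_gb_per_s * 1e9)

# Illustrative per-GPU batch for a single MoE layer (assumed values).
tokens_per_gpu = 8192      # tokens routed from this GPU
hidden_dim = 8192          # model width
bytes_per_elem = 2         # bf16 activations
top_k = 2                  # experts selected per token

nvlink_gb_s = 900          # assumed usable intra-rack bandwidth per GPU
nic_gb_s = 50              # assumed usable inter-rack bandwidth per GPU (~400 Gb/s NIC)

t_intra = all_to_all_time(tokens_per_gpu, hidden_dim, bytes_per_elem, top_k, nvlink_gb_s)
t_inter = all_to_all_time(tokens_per_gpu, hidden_dim, bytes_per_elem, top_k, nic_gb_s)

print(f"intra-rack dispatch: {t_intra * 1e3:.2f} ms")
print(f"inter-rack dispatch: {t_inter * 1e3:.2f} ms")
print(f"slowdown when experts leave the rack: {t_inter / t_intra:.0f}x")
```

Under these assumed numbers the same dispatch takes roughly an order of magnitude longer once expert traffic has to leave the rack, which is why keeping the expert-parallel all-to-all inside the 72-GPU NVLink domain matters.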
In the source video, this keypoint occurs from 00:37:00 to 00:41:00.
Sources in support: Dwarkesh Patel (Host)

