Comparing estimated inference token counts (hundreds of trillions) with Chinchilla-optimal training token counts (trillions) suggests that current frontier models are over-trained by a factor of roughly 100. This marks a significant departure from the compute-optimal training ratio, driven by the need to balance training costs against inference costs.
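As a rough sanity check on the factor-of-100 figure, the sketch below recomputes the ratio using the Chinchilla rule of thumb of roughly 20 training tokens per parameter. The model size and inference token volume used here are illustrative assumptions, not figures stated in the source.

```python
# Sketch of the over-training ratio argument.
# All concrete numbers below are illustrative assumptions, not sourced figures.

def chinchilla_optimal_tokens(params: float) -> float:
    """Chinchilla rule of thumb: ~20 training tokens per model parameter."""
    return 20 * params

# Assumed frontier-scale model size (hypothetical): 100B parameters.
params = 100e9
optimal_tokens = chinchilla_optimal_tokens(params)  # ~2 trillion tokens

# Assumed lifetime inference volume (hypothetical): 200 trillion tokens.
inference_tokens = 200e12

# Ratio of inference demand to compute-optimal training tokens.
ratio = inference_tokens / optimal_tokens
print(f"Chinchilla-optimal training tokens: {optimal_tokens:.2e}")
print(f"Estimated inference tokens:         {inference_tokens:.2e}")
print(f"Inference / Chinchilla-optimal:     {ratio:.0f}x")  # ~100x
```

With these assumed inputs the ratio comes out to about 100, matching the order of magnitude described above; different parameter counts or inference estimates would shift the exact factor.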
Impact: High. This massive over-training ratio implies a substantial investment in data and compute beyond theoretical minimums, potentially indicating a strategic choice to optimize for overall deployment cost and performance rather than just training efficiency.
In the source video, this keypoint occurs from 01:30:41 to 01:32:12.
Sources in support: Dwarkesh Patel (Host)

