An optimal training strategy balances compute spend across pre-training, RL fine-tuning, and inference. One heuristic is to equalize these costs, which implies that total lifetime inference tokens should roughly match pre-training tokens and that models should therefore be significantly 'over-trained' relative to Chinchilla scaling laws.
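A back-of-the-envelope sketch of the heuristic follows. It assumes the standard FLOP approximations of roughly 6ND for training and 2N per generated token for inference, plus the Chinchilla rule of thumb of ~20 training tokens per parameter; the 10B-parameter figure is hypothetical, chosen only for illustration and not taken from the source.

```python
# Illustrative sketch of the cost-balancing heuristic.
# Assumptions (not from the source): training costs ~6*N*D FLOPs,
# inference costs ~2*N FLOPs per generated token, and the
# Chinchilla-optimal token count is ~20 tokens per parameter.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate pre-training compute: ~6 FLOPs per param per token."""
    return 6 * n_params * n_tokens

# Hypothetical model size for illustration.
n_params = 10e9                       # 10B parameters

# Chinchilla heuristic: ~20 training tokens per parameter.
chinchilla_tokens = 20 * n_params     # 200B tokens

# Cost-balancing heuristic: set lifetime inference spend equal to
# training spend.  With these FLOP approximations, equal cost means
# 6*N*D_train = 2*N*D_inf, i.e. D_inf ~ 3*D_train -- the same order
# of magnitude, so inference tokens "roughly match" training tokens.
train_budget = train_flops(n_params, chinchilla_tokens)
equal_cost_inference_tokens = train_budget / (2 * n_params)

print(f"Chinchilla training tokens:     {chinchilla_tokens:.2e}")
print(f"Inference tokens at equal cost: {equal_cost_inference_tokens:.2e}")
```

Holding expected inference demand fixed instead, the same algebra pushes the training token count past the Chinchilla-optimal point, which is the 'over-training' implication described above.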
Impact: High. This cost-balancing approach suggests that current frontier models may be trained on orders of magnitude more data than Chinchilla-optimal, with consequences for development efficiency and resource allocation. It reframes model development from pure training optimization to a holistic compute-cost perspective.
In the source video, this keypoint occurs from 01:19:07 to 01:24:11.
Sources in support: Dwarkesh Patel (Host)

