The RevNets paper (Gomez et al., 2017) shows how the Feistel construction from cryptography can be applied to neural networks, including transformers, to make each layer invertible. Instead of storing activations during the forward pass, the backward pass rematerializes them by running the layers in reverse, drastically cutting the memory needed for training; a minimal sketch of the coupling appears after this keypoint.
Impact: High. Activation memory is a major bottleneck in training large neural networks; reversible layers trade a modest amount of recomputation for a large reduction in memory, improving training efficiency and scalability.
In the source video, this keypoint occurs from 02:10:11 to 02:12:00.
Sources in support: Dwarkesh Patel (Host)
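
The coupling at the heart of RevNets is simple to state: split the hidden state into two streams and let each residual update touch only one stream at a time, so the update can be subtracted back out exactly. Below is a minimal NumPy sketch of that idea; the `ReversibleBlock` class and the `f`/`g` sub-functions are illustrative names, not code from the paper.

```python
import numpy as np

class ReversibleBlock:
    """Feistel-style coupling over two streams (x1, x2).

    Forward:  y1 = x1 + F(x2);  y2 = x2 + G(y1)
    Inverse:  x2 = y2 - G(y1);  x1 = y1 - F(x2)

    Because the inputs are exactly recoverable from the outputs,
    activations need not be stored during the forward pass.
    """

    def __init__(self, f, g):
        self.f = f  # arbitrary deterministic sub-network (e.g. attention or an MLP)
        self.g = g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Undo the forward pass step by step, in reverse order.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
block = ReversibleBlock(f=lambda h: np.tanh(h @ W1),
                        g=lambda h: np.tanh(h @ W2))

x1, x2 = rng.normal(size=4), rng.normal(size=4)
y1, y2 = block.forward(x1, x2)
r1, r2 = block.inverse(y1, y2)
# Inputs are rematerialized from outputs (up to floating-point rounding).
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

During backpropagation, a framework would call `inverse` layer by layer to recompute each layer's inputs just before computing its gradients, so only the final layer's outputs need to be kept in memory rather than every intermediate activation.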

