Nested Learning: The Illusion of Deep Learning Architectures
About this episode
In this episode, we dive into Nested Learning (NL) — a new framework that rethinks how neural networks learn, store information, and even modify themselves. While modern language models have made remarkable progress, fundamental questions remain: How do they truly memorize? How do they improve over time? And why does in-context learning emerge at scale?
Nested Learning proposes a bold answer. Instead of viewing a model as a single optimization problem, NL treats it as a hierarchy of nested, multi-level learning processes, each with its own evolving context flow. This perspective sheds new light on how deep models compress information, how in-context learning arises naturally, and how we might build systems with richer, higher-order reasoning abilities.
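To make the nested-levels picture concrete, here is a minimal Python sketch of two learning levels running at different update frequencies: an inner learner adapts fast weights within each context window, and an outer learner consolidates across windows. The linear model, the function name, and the Reptile-style outer step are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def two_level_learner(contexts, dim, inner_lr=0.1, outer_lr=0.05):
    """Toy sketch of two nested optimization levels.

    Inner level: fast weights updated on every token of a context window.
    Outer level: slow weights updated once per window, consolidating what
    the inner level learned. An illustration, not the paper's model.
    """
    W_slow = np.zeros((dim, dim))                   # outer level (low frequency)
    for X, Y in contexts:                           # each context: (inputs, targets)
        W_fast = W_slow.copy()                      # inner level starts from slow weights
        for x, y in zip(X, Y):                      # inner loop (high frequency)
            err = W_fast @ x - y                    # prediction error on this token
            W_fast -= inner_lr * np.outer(err, x)   # gradient step on 0.5*||Wx - y||^2
        W_slow += outer_lr * (W_fast - W_slow)      # once-per-window consolidation
    return W_slow
```

Each level sees its own context flow: the inner level sees individual tokens, while the outer level only sees the net effect of a whole window.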
We explore the paper’s three major contributions:
• Deep Optimizers — A reinterpretation of classic optimizers such as Adam and SGD with momentum as associative memory systems that compress the stream of gradients (see the momentum sketch after this list). From NL principles, the authors derive deeper, more expressive optimizers.
• Self-Modifying Titans — A new class of sequence model that learns not only from data but also learns its own update rule, enabling it to modify its own weights during training.
• Continuum Memory System — A unified framework that extends the short- vs. long-term memory dichotomy into a continuous spectrum of memories, each updated at its own frequency (a toy sketch follows below). Combined with the self-modifying sequence model, it yields HOPE, a learning module with strong results in language modeling, continual learning, and long-context reasoning.
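For the deep-optimizers bullet, a minimal sketch of the reading described above: the momentum buffer acts as a one-slot associative memory that compresses the gradient stream, and its familiar exponential-moving-average update is exactly one gradient-descent step on an inner least-squares objective. The function name and the quadratic inner objective are our assumptions for illustration.

```python
import numpy as np

def momentum_as_memory(params, grad_stream, lr=0.01, decay=0.9):
    """Sketch: SGD with momentum viewed as a one-slot associative memory.

    The buffer m compresses the history of gradients. Taking one gradient
    step on the inner objective 0.5 * ||m - g||^2 with step size (1 - decay)
    recovers the usual exponential-moving-average momentum update.
    """
    m = np.zeros_like(params)               # the "memory": compressed gradient history
    for g in grad_stream:
        m = decay * m + (1.0 - decay) * g   # inner learning step (memory write)
        params = params - lr * m            # outer step reads the memory out
    return params
```

Under this view, a deeper optimizer replaces the single linear memory slot m with a more expressive, multi-layer memory trained on the same gradient-compression task.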
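And for the continuum memory system, a toy sketch of the spectrum-of-timescales idea: a bank of memory slots, each written at its own frequency, spanning a continuum from short-term memory (written every step) to long-term memory (written rarely). The class name, the periods, and the EMA write rule are illustrative assumptions, not the paper's parameterization.

```python
import numpy as np

class ContinuumMemoryBank:
    """Toy bank of memories on a spectrum of update frequencies."""

    def __init__(self, dim, periods=(1, 4, 16, 64)):
        self.periods = periods                         # update period per level
        self.slots = [np.zeros(dim) for _ in periods]  # one memory per timescale
        self.step = 0

    def write(self, x):
        """Write input x into every level whose period divides the step count."""
        self.step += 1
        for i, period in enumerate(self.periods):
            if self.step % period == 0:
                # Slower levels are written less often, so their contents
                # persist over longer horizons (longer-term memory).
                self.slots[i] = 0.9 * self.slots[i] + 0.1 * x

    def read(self):
        """Combine all timescales; a learned gate would replace this mean."""
        return np.mean(self.slots, axis=0)
```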
This episode breaks down what NL means for the future of AI, why the framework is mathematically transparent and neuroscientifically inspired, and how it might open a new design dimension for deep learning research.