Nested Learning: The Illusion of Deep Learning Architectures
About this episode
In this episode, we dive into Nested Learning (NL) — a new framework that rethinks how neural networks learn, store information, and even modify themselves. While modern language models have made remarkable progress, fundamental questions remain: How do they truly memorize? How do they improve over time? And why does in-context learning emerge at scale?
Nested Learning proposes a bold answer. Instead of viewing a model as a single optimization problem, NL treats it as a hierarchy of nested, multi-level learning processes, each with its own evolving context flow. This perspective sheds new light on how deep models compress information, how in-context learning arises naturally, and how we might build systems with richer, higher-order reasoning abilities.
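To make the nested-levels picture concrete, here is a minimal Python sketch of two learning levels running at different update frequencies: an inner learner adapts fast weights within each context window, and an outer learner consolidates across windows. The linear model, the function name, and the Reptile-style outer step are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def two_level_learner(contexts, dim, inner_lr=0.1, outer_lr=0.05):
    """Toy sketch of two nested optimization levels.

    Inner level: fast weights updated on every token of a context window.
    Outer level: slow weights updated once per window, consolidating what
    the inner level learned. An illustration, not the paper's model.
    """
    W_slow = np.zeros((dim, dim))                   # outer level (low frequency)
    for X, Y in contexts:                           # each context: (inputs, targets)
        W_fast = W_slow.copy()                      # inner level starts from slow weights
        for x, y in zip(X, Y):                      # inner loop (high frequency)
            err = W_fast @ x - y                    # prediction error on this token
            W_fast -= inner_lr * np.outer(err, x)   # gradient step on 0.5*||Wx - y||^2
        W_slow += outer_lr * (W_fast - W_slow)      # once-per-window consolidation
    return W_slow
```

Each level sees its own context flow: the inner level sees individual tokens, while the outer level only sees the net effect of a whole window.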
We explore the paper’s three major contributions:
• Deep Optimizers — A reinterpretation of classic optimizers such as Adam and SGD with momentum as associative memory systems that compress the stream of gradients (see the momentum sketch after this list). From NL principles, the authors derive deeper, more expressive optimizers.
• Self-Modifying Titans — A new class of sequence model that learns not only from data but also learns its own update rule, enabling it to modify its own weights during training.
• Continuum Memory System — A unified framework that extends the short- vs. long-term memory dichotomy into a continuous spectrum of memories, each updated at its own frequency (a toy sketch follows below). Combined with the self-modifying sequence model, it yields HOPE, a learning module with strong results in language modeling, continual learning, and long-context reasoning.
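For the deep-optimizers bullet, a minimal sketch of the reading described above: the momentum buffer acts as a one-slot associative memory that compresses the gradient stream, and its familiar exponential-moving-average update is exactly one gradient-descent step on an inner least-squares objective. The function name and the quadratic inner objective are our assumptions for illustration.

```python
import numpy as np

def momentum_as_memory(params, grad_stream, lr=0.01, decay=0.9):
    """Sketch: SGD with momentum viewed as a one-slot associative memory.

    The buffer m compresses the history of gradients. Taking one gradient
    step on the inner objective 0.5 * ||m - g||^2 with step size (1 - decay)
    recovers the usual exponential-moving-average momentum update.
    """
    m = np.zeros_like(params)               # the "memory": compressed gradient history
    for g in grad_stream:
        m = decay * m + (1.0 - decay) * g   # inner learning step (memory write)
        params = params - lr * m            # outer step reads the memory out
    return params
```

Under this view, a deeper optimizer replaces the single linear memory slot m with a more expressive, multi-layer memory trained on the same gradient-compression task.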
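And for the continuum memory system, a toy sketch of the spectrum-of-timescales idea: a bank of memory slots, each written at its own frequency, spanning a continuum from short-term memory (written every step) to long-term memory (written rarely). The class name, the periods, and the EMA write rule are illustrative assumptions, not the paper's parameterization.

```python
import numpy as np

class ContinuumMemoryBank:
    """Toy bank of memories on a spectrum of update frequencies."""

    def __init__(self, dim, periods=(1, 4, 16, 64)):
        self.periods = periods                         # update period per level
        self.slots = [np.zeros(dim) for _ in periods]  # one memory per timescale
        self.step = 0

    def write(self, x):
        """Write input x into every level whose period divides the step count."""
        self.step += 1
        for i, period in enumerate(self.periods):
            if self.step % period == 0:
                # Slower levels are written less often, so their contents
                # persist over longer horizons (longer-term memory).
                self.slots[i] = 0.9 * self.slots[i] + 0.1 * x

    def read(self):
        """Combine all timescales; a learned gate would replace this mean."""
        return np.mean(self.slots, axis=0)
```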
This episode breaks down what NL means for the future of AI, why the framework is mathematically transparent and neuroscientifically inspired, and how it might open a new design dimension for deep learning research.