Continual Learning Is About Understanding Incentives, Not Just Skills
In a November 2025 podcast with Dwarkesh, Ilya Sutskever mentioned that continual learning is one of the challenges with current LLMs (link). While many interpret this as simply feeding models more data or teaching them new technical skills, I believe the challenge is far more nuanced. Continual learning is not just about acquiring new skills while working alongside humans at a company; it is about understanding the incentives, motivations, and “alignment” of the people the AI is working with.
Intent is hidden, not explicit #
In most organizations, the true intent behind a task is only loosely captured by the text that describes it. It is usually buried in prior context or in the cultural history of a team. You cannot simply dump knowledge as text files into a context window and expect the model to understand the “vibe” or the unwritten rules.
This implicit knowledge lives in the “neurons” of the people inside the company. Extracting it and transferring it to a model is not just a matter of pasting that text into the model’s context or knowledge base. It requires the model to learn the unspoken dynamics that drive decision-making.
Learning human incentive models #
To truly understand these implicit dynamics, models need to build what we might call “human incentive models”: representations of how people react, what they value, their hierarchy of needs, and the underlying motivations that drive their decisions. This is fundamentally about modeling the incentive structures that govern human behavior within an organization.
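To make this less abstract, here is a minimal sketch of what one such representation could look like. Everything in it is a hypothetical illustration: the `Stakeholder` fields, the incentive keys, and the `alignment_score` function are placeholders for the kind of structure a model would need to infer, not a proposed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Stakeholder:
    """A toy 'human incentive model' for one person in an organization."""
    name: str
    stated_goals: list[str]                     # what they say they want
    implicit_incentives: dict[str, float] = field(default_factory=dict)
    # e.g. {"avoid_risk": 0.8, "ship_fast": 0.3, "look_good_upward": 0.6}
    reports_to: str | None = None               # a crude stand-in for hierarchy


def alignment_score(action_effects: dict[str, float], person: Stakeholder) -> float:
    """Rough estimate of how well an action serves a person's real incentives.

    `action_effects` maps the same incentive keys to how strongly the action
    advances them (positive) or undermines them (negative).
    """
    return sum(
        weight * action_effects.get(incentive, 0.0)
        for incentive, weight in person.implicit_incentives.items()
    )
```

The hard part, of course, is that none of these weights are written down anywhere; they have to be inferred from behavior.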
Humans acquire these social and structural understandings remarkably quickly, partly thanks to evolved, innate social skills. For AI, we might need to “hack” this process through simulation: running the model through synthetic scenarios so it can map the specific incentive landscape of a user or an organization without requiring millions of real-world interactions.
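A toy version of that simulation loop might look like the sketch below, which plays the role of learning the `implicit_incentives` weights from the previous example. The scenario format, the delta-rule update, and the learning rate are all assumptions for illustration; a real system would not have access to a ground-truth `true_weights`.

```python
def simulate_incentive_learning(
    scenarios: list[dict[str, float]],   # per scenario: incentive -> how strongly the action affects it
    true_weights: dict[str, float],      # the person's real incentive weights (hidden in reality)
    lr: float = 0.1,
) -> dict[str, float]:
    """Toy loop: infer someone's incentive weights from simulated reactions."""
    estimated: dict[str, float] = {}
    for effects in scenarios:
        # The simulated person's actual reaction vs. the model's predicted one.
        actual = sum(true_weights.get(k, 0.0) * v for k, v in effects.items())
        predicted = sum(estimated.get(k, 0.0) * v for k, v in effects.items())
        error = actual - predicted
        # Nudge each estimated weight toward explaining the observed reaction (delta rule).
        for incentive, effect in effects.items():
            estimated[incentive] = estimated.get(incentive, 0.0) + lr * error * effect
    return estimated
```

The point is not the math but the shape of the loop: synthetic scenarios standing in for the millions of real-world interactions the model will never get.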
Models already know more than we think #
Current foundation models likely know far more than we give them credit for. The “new skill” you think you are teaching a model is often just a specific combination of existing skills that it hasn’t been incentivized to use yet. In this sense, teaching new technical skills is the smaller part of the problem: it is mostly a matter of recombining what the model already understands.
The real challenge: learning human incentives #
The far more challenging and unsolved problem is teaching models to understand human incentives. This is where continual learning becomes genuinely difficult. Unlike technical skills, understanding why humans make certain decisions, what motivates them, and how to navigate organizational dynamics is murky territory.
Moreover, this creates a potential risk: as models learn to understand human incentive structures, they might find ways to hack, cheat, or game these systems. They could learn to manipulate rather than genuinely align. The model might discover shortcuts that satisfy surface-level metrics while subverting the deeper intent—a problem we don’t yet have clear solutions for.
True continual learning will happen when we solve this incentive alignment problem: teaching models to genuinely understand why we do things, not just how, without opening the door to manipulation or misalignment.