A new foundation for intrinsic motivation explicitly grounded in dynamical systems and optimal control, without external rewards or designer bias.
Intrinsic Motivation (IM) aims to train agents without external rewards, enabling useful behavior to emerge from the agent's interaction with its environment alone. However, the dominant IM approaches rely on information-theoretic quantities with designer-chosen variables, introducing bias and lacking a principled connection to dynamics or optimal control (OC). We introduce Controllable Information Production (CIP), a new foundation for IM explicitly grounded in dynamical systems and OC. CIP measures the rate at which an agent produces information, capturing controllable complexity without external knowledge or bias. CIP unifies IM and OC into a single framework, formalizing physical intelligence as the control of information production. It further reveals connections between the structure of the value function and Kolmogorov–Sinai entropy. CIP consistently outperforms prior IM methods on standard benchmarks in robot learning and solves tasks they fail on, including humanoid self-righting.
CIP agents develop complex, physically meaningful behaviors from scratch, purely through intrinsic motivation, without any reward shaping.
Mean extremity height, normalized to [0, 1], where 1.0 is fully upright. Values are averaged over 10 random seeds in every environment.
| Environment | CIP (ours) | Empowerment | DIAYN | DADS | SMM | ICM |
|---|---|---|---|---|---|---|
| Cart Pole | 0.9996 | 0.9996 | 0.0221 | 0.0684 | 0.6171 | 0.0005 |
| Double Pendulum | 0.9913 | 0.5433 | 0.2339 | 0.0107 | 0.0112 | 0.0016 |
| Triple Pendulum | 0.9931 | 0.2963 | 0.4865 | 0.0608 | 0.2513 | 0.0069 |
| Gibbon | 0.9319 | 0.2835 | 0.4649 | 0.1430 | 0.4458 | 0.5122 |
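
As a rough illustration of how a score like those in the table could be computed, the sketch below maps raw extremity heights to [0, 1] and averages them over seeds. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names, the `h_min`/`h_max` bounds, and the choice to average over time within each seed before averaging across seeds are all hypothetical.

```python
from statistics import fmean


def normalized_height(height, h_min, h_max):
    """Map a raw extremity height to [0, 1].

    h_min and h_max are assumed bounds (hypothetical): h_max is the height
    when fully upright, h_min the lowest reachable height.
    """
    scaled = (height - h_min) / (h_max - h_min)
    # Clip so that values outside the assumed bounds stay in [0, 1].
    return min(1.0, max(0.0, scaled))


def mean_score_over_seeds(heights_per_seed, h_min, h_max):
    """Average normalized height within each seed, then across seeds.

    heights_per_seed: one list of raw heights per random seed (assumed layout).
    """
    per_seed = [
        fmean([normalized_height(h, h_min, h_max) for h in trajectory])
        for trajectory in heights_per_seed
    ]
    return fmean(per_seed)


# Toy usage: one seed stays fully upright, the other stays fully down.
score = mean_score_over_seeds([[1.0, 1.0], [-1.0, -1.0]], h_min=-1.0, h_max=1.0)
print(score)  # 0.5
```

With this convention, a score near 1.0 means the extremity stayed close to the upright height for most of the evaluation in most seeds.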