NeurIPS 2026

Emergence of Physical Intelligence via Controllable Information Production

A new foundation for intrinsic motivation explicitly grounded in dynamical systems and optimal control, without external rewards or designer bias.

Anonymous Authors
Under review  ·  NeurIPS 2026

Intrinsic Motivation Grounded in Optimal Control

Intrinsic Motivation (IM) aims to train agents without external rewards, enabling useful behavior to emerge from the agent's interaction with its environment alone. However, the dominant IM approaches rely on information-theoretic quantities with designer-chosen variables, introducing bias and lacking a principled connection to dynamics or optimal control (OC). We introduce Controllable Information Production (CIP), a new foundation for IM explicitly grounded in dynamical systems and OC. CIP measures the rate at which an agent produces information, capturing controllable complexity without external knowledge or bias. CIP unifies IM and OC into a single framework, formalizing physical intelligence as the control of information production. It further reveals connections between the structure of the value function and Kolmogorov–Sinai entropy. CIP consistently outperforms prior IM methods on standard benchmarks in robot learning and solves tasks they fail on, including humanoid self-righting.

Emergent Physical Behaviors

CIP agents develop complex, physically meaningful behaviors from scratch, purely through intrinsic motivation, without any reward shaping.

Gibbon
Triple Pendulum
Double Pendulum
Cart Pole

State-of-the-Art Performance

Mean extremity height, normalized to [0, 1] so that 1.0 is fully upright. Results are averaged over 10 random seeds in each environment.

Environment      CIP (ours)  Empowerment  DIAYN   DADS    SMM     ICM
Cart Pole        0.9996      0.9996       0.0221  0.0684  0.6171  0.0005
Double Pendulum  0.9913      0.5433       0.2339  0.0107  0.0112  0.0016
Triple Pendulum  0.9931      0.2963       0.4865  0.0608  0.2513  0.0069
Gibbon           0.9319      0.2835       0.4649  0.1430  0.4458  0.5122
Mean extremity height over the final 10% of each episode, averaged across 10 random seeds. Values are min–max normalized so that 0.0 is the initial hanging configuration and 1.0 is the fully upright pose. Bold indicates best performance.
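As a rough illustration of the metric described in the caption, the following sketch min-max normalizes raw extremity heights, averages them over the final 10% of each episode, and then averages across seeds. The function name, its arguments, and the synthetic trajectories are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def normalized_final_height(heights, h_hang, h_upright, tail_frac=0.1):
    """Score raw extremity heights of shape (n_seeds, n_steps).

    h_hang and h_upright are the raw heights of the initial hanging
    configuration and the fully upright pose (hypothetical inputs).
    Returns the mean and standard deviation of the score across seeds.
    """
    heights = np.asarray(heights, dtype=float)
    # Number of steps in the final `tail_frac` of the episode (at least one).
    tail = max(1, int(round(tail_frac * heights.shape[1])))
    # Min-max normalize: hanging -> 0.0, fully upright -> 1.0.
    normalized = (heights - h_hang) / (h_upright - h_hang)
    # Average over the episode tail for each seed, then across seeds.
    per_seed = normalized[:, -tail:].mean(axis=1)
    return per_seed.mean(), per_seed.std()

# Synthetic illustration only: two "seeds", 20 steps each, pole length 1.
swing_up = np.linspace(-1.0, 1.0, 20)   # ends fully upright
stuck = np.full(20, -1.0)               # never leaves the hanging pose
mean_score, std_score = normalized_final_height(
    np.stack([swing_up, stuck]), h_hang=-1.0, h_upright=1.0)
```

Averaging only over the episode tail rewards agents that reach and hold the upright pose, rather than ones that merely pass through it.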
Mean height over time across all environments
Fig. 1. Mean height ± std of the extremity over time in each environment. Heights are normalized between 0.0 (hanging) and 1.0 (fully upright). Averaged over 10 random seeds. CIP (green) consistently reaches and maintains upright posture while baselines plateau or fail.
KSE estimation on the Lorenz system
Fig. 2. Kolmogorov–Sinai entropy (KSE) estimation on the Lorenz attractor in two typical chaotic parameter regimes. Our estimator (green) converges faster than the standard QR-based approach (blue) in both regimes.
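For readers unfamiliar with the QR-based baseline named in the Fig. 2 caption, the sketch below shows one common form of it: integrate the Lorenz system together with a tangent frame, re-orthonormalize the frame by QR decomposition at every step, and average the logarithms of the diagonal of R to obtain Lyapunov exponents. It then estimates the KSE as the sum of the positive exponents, which assumes Pesin's identity holds. This is not the estimator proposed in the paper. The parameter values (sigma = 10, rho = 28, beta = 8/3), step size, and horizon are the classic textbook choices and are assumptions here, since the caption does not specify the regimes used.

```python
import numpy as np

def lorenz(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz vector field (classic parameters, assumed for illustration)."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def lorenz_jac(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Jacobian of the Lorenz vector field at state x."""
    return np.array([[-sigma, sigma, 0.0],
                     [rho - x[2], -1.0, -x[0]],
                     [x[1], x[0], -beta]])

def rk4_step(x, Q, dt):
    """One RK4 step of the state x and tangent frame Q (dQ/dt = J(x) Q)."""
    def f(x, Q):
        return lorenz(x), lorenz_jac(x) @ Q
    k1x, k1q = f(x, Q)
    k2x, k2q = f(x + 0.5 * dt * k1x, Q + 0.5 * dt * k1q)
    k3x, k3q = f(x + 0.5 * dt * k2x, Q + 0.5 * dt * k2q)
    k4x, k4q = f(x + dt * k3x, Q + dt * k3q)
    x_new = x + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
    Q_new = Q + dt / 6.0 * (k1q + 2 * k2q + 2 * k3q + k4q)
    return x_new, Q_new

def lyapunov_spectrum(x0, dt=0.01, n_steps=20000, n_burn=2000):
    """Estimate the Lyapunov exponents with per-step QR re-orthonormalization."""
    x = np.asarray(x0, dtype=float)
    Q = np.eye(3)
    # Burn-in: let the state settle onto the attractor (tangent frame ignored).
    for _ in range(n_burn):
        x, _ = rk4_step(x, Q, dt)
    log_r = np.zeros(3)
    for _ in range(n_steps):
        x, Z = rk4_step(x, Q, dt)
        Q, R = np.linalg.qr(Z)
        # |diag(R)| holds the per-step stretching factors along each direction.
        log_r += np.log(np.abs(np.diag(R)))
    return log_r / (n_steps * dt)

exponents = lyapunov_spectrum([1.0, 1.0, 1.0])
# Pesin's identity (assumed): KSE equals the sum of positive exponents.
# The near-zero exponent may add a small amount of finite-time noise.
kse_estimate = exponents[exponents > 0].sum()
```

A useful sanity check is that the exponents should sum to the trace of the Jacobian, which for the Lorenz system is the constant -(sigma + 1 + beta).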
CIP rate (nats/s) over time
Fig. 3. Estimated values of CIP (nats/s) over time for each environment during MPC runs. Higher CIP values correspond to agents exploring dynamically richer regions of state space.

Main Contributions

01
New IM Paradigm
We introduce Controllable Information Production (CIP), a principled intrinsic motivation objective that drives the emergence of useful physical behaviors without external rewards or designer-specified variables.
02
Grounding in Optimal Control
We connect CIP to value function structure, grounding IM in optimal control theory and enabling an efficient, scalable estimator with a numerically stable optimization method. We reveal ties to Kolmogorov–Sinai entropy.
03
State-of-the-Art on URLB
CIP sets a new state of the art in intrinsic motivation on URLB, outperforming all prior methods and enabling complex physical skills, such as humanoid self-righting, that prior methods fail to solve.