NeurIPS 2026

Emergence of Physical Intelligence
via
Controllable Information Production

A new foundation for intrinsic motivation explicitly grounded in dynamical systems and optimal control, without external rewards or designer bias.

Anonymous Authors

Under review · NeurIPS 2026


Abstract

Intrinsic Motivation
Grounded in Optimal Control

Intrinsic Motivation (IM) aims to train agents without external rewards, enabling useful behavior to emerge from the agent's interaction with its environment alone. However, the dominant IM approaches rely on information-theoretic quantities with designer-chosen variables, introducing bias and lacking a principled connection to dynamics or optimal control (OC). We introduce Controllable Information Production (CIP), a new foundation for IM explicitly grounded in dynamical systems and OC. CIP measures the rate at which an agent produces information, capturing controllable complexity without external knowledge or bias. CIP unifies IM and OC into a single framework, formalizing physical intelligence as the control of information production. It further reveals connections between the structure of the value function and Kolmogorov–Sinai entropy. CIP consistently outperforms prior IM methods on standard benchmarks in robot learning and solves tasks they fail on, including humanoid self-righting.

Qualitative Results

Emergent Physical Behaviors

CIP agents develop complex, physically meaningful behaviors from scratch, purely through intrinsic motivation, without any reward shaping.

Gibbon

Triple Pendulum

Double Pendulum

Cart Pole

Quantitative Results

State-of-the-Art Performance

Mean extremity height, normalized to [0, 1] where 1.0 is fully upright, averaged over 10 random seeds in each environment.

Environment	CIP (ours)	Empowerment	DIAYN	DADS	SMM	ICM
Cart Pole	0.9996	0.9996	0.0221	0.0684	0.6171	0.0005
Double Pendulum	0.9913	0.5433	0.2339	0.0107	0.0112	0.0016
Triple Pendulum	0.9931	0.2963	0.4865	0.0608	0.2513	0.0069
Gibbon	0.9319	0.2835	0.4649	0.1430	0.4458	0.5122

Mean extremity height over the final 10% of each episode, averaged across 10 random seeds. Values are min–max normalized so that 0.0 is the initial hanging configuration and 1.0 is the fully upright pose. Bold indicates best performance.
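The metric described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `normalized_final_height`, the parameters `h_hanging`, `h_upright`, and `tail_frac`, and the `(n_seeds, n_steps)` array layout are all hypothetical.

```python
import numpy as np

def normalized_final_height(heights, h_hanging, h_upright, tail_frac=0.10):
    """Score one environment from raw extremity heights.

    heights   : array of shape (n_seeds, n_steps), one episode per seed (assumed layout)
    h_hanging : raw height of the initial hanging configuration (maps to 0.0)
    h_upright : raw height of the fully upright pose (maps to 1.0)
    tail_frac : fraction of the episode, counted from the end, to average over
    """
    heights = np.asarray(heights, dtype=float)
    # Min-max normalization: hanging -> 0.0, fully upright -> 1.0.
    norm = (heights - h_hanging) / (h_upright - h_hanging)
    # Number of steps in the final tail_frac of the episode (at least one).
    n_tail = max(1, int(round(tail_frac * norm.shape[1])))
    # Mean over the tail of each episode, then mean across seeds.
    per_seed = norm[:, -n_tail:].mean(axis=1)
    return per_seed.mean()

# Toy usage: one seed stays upright, one stays hanging.
toy = np.array([[1.0] * 10, [-1.0] * 10])
score = normalized_final_height(toy, h_hanging=-1.0, h_upright=1.0)
```

In the toy call, the two seeds normalize to 1.0 and 0.0 respectively, so the score is their average.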

Mean height over time across all environments

Fig. 1. Mean ± std of the extremity height over time in each environment. Heights are normalized between 0.0 (hanging) and 1.0 (fully upright) and averaged over 10 random seeds. CIP (green) consistently reaches and maintains an upright posture, while the baselines plateau or fail.

Fig. 2. Kolmogorov–Sinai entropy (KSE) estimation on the Lorenz attractor in two typical chaotic parameter regimes. Our estimator (green) converges faster than the standard QR-based approach (blue) in both regimes.
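For readers unfamiliar with the baseline in Fig. 2, the sketch below shows one generic form of a QR-based estimate on the Lorenz system. It is not the estimator proposed in the paper, and it may differ in detail from the baseline actually used. It integrates the state together with a tangent basis, re-orthonormalizes that basis with a QR decomposition at fixed intervals, and accumulates the logarithms of the diagonal of R to obtain Lyapunov exponents. The KSE is then approximated as the sum of the positive exponents, which assumes Pesin's identity holds. The classic parameters (sigma = 10, rho = 28, beta = 8/3), the step size, and all function names are assumptions chosen for illustration; the page does not state which regimes were used.

```python
import numpy as np

def lorenz(x, sigma, rho, beta):
    """Lorenz vector field."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def lorenz_jac(x, sigma, rho, beta):
    """Jacobian of the Lorenz vector field at x."""
    return np.array([[-sigma, sigma, 0.0],
                     [rho - x[2], -1.0, -x[0]],
                     [x[1], x[0], -beta]])

def rhs(x, Q, params):
    """Joint dynamics of the state x and the tangent basis Q."""
    return lorenz(x, *params), lorenz_jac(x, *params) @ Q

def rk4_step(x, Q, dt, params):
    """One classical Runge-Kutta step for (x, Q)."""
    k1x, k1q = rhs(x, Q, params)
    k2x, k2q = rhs(x + 0.5 * dt * k1x, Q + 0.5 * dt * k1q, params)
    k3x, k3q = rhs(x + 0.5 * dt * k2x, Q + 0.5 * dt * k2q, params)
    k4x, k4q = rhs(x + dt * k3x, Q + dt * k3q, params)
    x_new = x + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
    Q_new = Q + dt / 6.0 * (k1q + 2 * k2q + 2 * k3q + k4q)
    return x_new, Q_new

def lyapunov_qr(params=(10.0, 28.0, 8.0 / 3.0), dt=0.01,
                n_steps=20000, n_burn=2000, qr_every=10):
    """Estimate the Lyapunov spectrum by periodic QR re-orthonormalization.

    n_steps should be a multiple of qr_every so all growth is accounted for.
    """
    x = np.array([1.0, 1.0, 1.0])
    Q = np.eye(3)
    # Burn-in: let the trajectory settle onto the attractor (tangent part discarded).
    for _ in range(n_burn):
        x, _ = rk4_step(x, Q, dt, params)
    log_sums = np.zeros(3)
    for step in range(1, n_steps + 1):
        x, Q = rk4_step(x, Q, dt, params)
        if step % qr_every == 0:
            # Re-orthonormalize; |diag(R)| holds the growth factors since the last QR.
            Q, R = np.linalg.qr(Q)
            log_sums += np.log(np.abs(np.diag(R)))
    return log_sums / (n_steps * dt)

exponents = lyapunov_qr()
# KSE estimate in nats per unit time, assuming Pesin's identity.
kse_estimate = exponents[exponents > 0].sum()
```

For the classic parameters, the largest exponent is commonly reported to be roughly 0.9, and the exponents should sum to approximately -(sigma + 1 + beta), the constant trace of the Jacobian. The second relation is a convenient sanity check on any implementation of this scheme.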

Fig. 3. Estimated values of CIP (nats/s) over time for each environment during MPC runs. Higher CIP values correspond to agents exploring dynamically richer regions of state space.

Contributions

Main Contributions

New IM Paradigm

We introduce Controllable Information Production (CIP), a principled intrinsic motivation objective that drives the emergence of useful physical behaviors without external rewards or designer-specified variables.

Grounding in Optimal Control

We connect CIP to value function structure, grounding IM in optimal control theory and enabling an efficient, scalable estimator with a numerically stable optimization method. We reveal ties to Kolmogorov–Sinai entropy.

State-of-the-Art on URLB

CIP sets a new state of the art in intrinsic motivation on URLB, outperforming prior IM methods and enabling complex physical skills such as humanoid self-righting, which those methods fail to solve.
