Do Transformer World Models Give Better Policy Gradients?
Michel Ma*, Tianwei Ni, Clement Gehring, Pierluca D'Oro*, Pierre-Luc Bacon
A world model is a learned simulator of the environment that an agent can use for learning. In my research career, I have always been fascinated by the question of what makes a good world model, and of how the particular way an agent uses it should shape how we build it.
In my research, I have focused on advancing our understanding of when the training signal a world model gives to an agent (i.e., the policy gradient) is of good quality. Early on, this led to training objectives that incorporate an agent's current learning state into the world model; later, it led to a deeper understanding of gradient propagation in world models, and to a new family of world model architectures designed to give better gradients to an agent.
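The object at the center of this line of work, the policy gradient obtained by differentiating an unrolled return through a learned world model, can be sketched in a toy setting. The linear dynamics, policy, and reward below are illustrative choices of mine, not taken from the papers above; the closed-form gradient also makes the long-horizon pathology visible:

```python
def unrolled_return(theta, s0=1.0, w_s=0.9, w_a=0.5, horizon=10):
    """Return predicted by unrolling a learned linear world model.

    Policy: a = theta * s; learned dynamics: s' = w_s*s + w_a*a;
    per-step reward on the predicted state: r(s) = -s**2.
    """
    s, total = s0, 0.0
    for _ in range(horizon):
        s = w_s * s + w_a * theta * s
        total += -s * s
    return total

def model_policy_gradient(theta, s0=1.0, w_s=0.9, w_a=0.5, horizon=10):
    """Policy gradient via the chain rule through the unrolled model.

    With k = w_s + w_a*theta, the predicted state is s_t = k**t * s0,
    so d s_t / d theta = t * k**(t-1) * w_a * s0.  The k**(t-1) factor
    is the point: over long horizons these terms explode or vanish,
    which is the gradient-propagation pathology discussed above.
    """
    k = w_s + w_a * theta
    grad = 0.0
    for t in range(1, horizon + 1):
        s_t = (k ** t) * s0
        ds_t = t * (k ** (t - 1)) * w_a * s0
        grad += -2.0 * s_t * ds_t  # d(-s_t**2)/d theta
    return grad
```

In a real agent the same quantity comes from automatic differentiation through a neural dynamics model rather than a hand-derived chain rule, but the compounding Jacobian structure is identical.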
research note · last updated 2026
* equal contribution