Do Transformer World Models Give Better Policy Gradients?
Michel Ma*, Tianwei Ni, Clement Gehring, Pierluca D'Oro*, Pierre-Luc Bacon
A world model is a learned simulator of the environment that an agent can use for learning. In my research career, I have always been fascinated by the question of what makes a good world model, and of how the particular way an agent uses it should shape how we build it.
In my research, I have focused on advancing our understanding of when the training signal a world model gives to an agent (i.e., the policy gradient) is of good quality. Early on, this led to training objectives that incorporate an agent's current learning state into the world model; later, it led to a deeper understanding of gradient propagation in world models, and to a new family of world model architectures designed to give better gradients to an agent.
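The object at the center of this line of work, the policy gradient obtained by differentiating an unrolled return through a learned world model, can be sketched in a toy setting. The linear dynamics, policy, and reward below are illustrative choices of mine, not taken from the papers above; the closed-form gradient also makes the long-horizon pathology visible:

```python
def unrolled_return(theta, s0=1.0, w_s=0.9, w_a=0.5, horizon=10):
    """Return predicted by unrolling a learned linear world model.

    Policy: a = theta * s; learned dynamics: s' = w_s*s + w_a*a;
    per-step reward on the predicted state: r(s) = -s**2.
    """
    s, total = s0, 0.0
    for _ in range(horizon):
        s = w_s * s + w_a * theta * s
        total += -s * s
    return total

def model_policy_gradient(theta, s0=1.0, w_s=0.9, w_a=0.5, horizon=10):
    """Policy gradient via the chain rule through the unrolled model.

    With k = w_s + w_a*theta, the predicted state is s_t = k**t * s0,
    so d s_t / d theta = t * k**(t-1) * w_a * s0.  The k**(t-1) factor
    is the point: over long horizons these terms explode or vanish,
    which is the gradient-propagation pathology discussed above.
    """
    k = w_s + w_a * theta
    grad = 0.0
    for t in range(1, horizon + 1):
        s_t = (k ** t) * s0
        ds_t = t * (k ** (t - 1)) * w_a * s0
        grad += -2.0 * s_t * ds_t  # d(-s_t**2)/d theta
    return grad
```

In a real agent the same quantity comes from automatic differentiation through a neural dynamics model rather than a hand-derived chain rule, but the compounding Jacobian structure is identical.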
research note · last updated 2026
* equal contribution