
intuitive and scalable reinforcement learning

The more I worked on reinforcement learning, the more I came to appreciate that many of the bottlenecks classically attributed to algorithmic issues are in fact caused by the particular interaction between the machinery of reinforcement learning and neural networks.

I have explored this by contributing to the discovery that agents trained online exhibit a strong primacy bias: they overfit to early experience at the expense of later data. The solution we found to this problem, as simple as periodically resetting part of the network, led to one of the first scalable training recipes for online reinforcement learning.
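The idea of a periodic partial reset can be sketched in a few lines. This is a minimal illustration, not the actual recipe: the reset period, which layers to reset, and the initialization scheme are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(shape):
    # Stand-in for the agent's weight initializer.
    return rng.normal(0.0, 0.1, size=shape)

# Toy two-layer network: a "hidden" layer and an output "head".
params = {"hidden": init_layer((8, 8)), "head": init_layer((8, 2))}

RESET_EVERY = 1000  # hypothetical reset period, in gradient steps

def maybe_reset(params, step):
    # Periodically re-initialize only the final layer, keeping the
    # earlier layers intact, to counteract primacy bias.
    if step > 0 and step % RESET_EVERY == 0:
        params["head"] = init_layer(params["head"].shape)
    return params

before = params["head"].copy()
params = maybe_reset(params, step=1000)
assert not np.allclose(before, params["head"])  # head was re-drawn
```

In practice, training simply continues after each reset, so the re-initialized layers relearn from the replay buffer while the preserved layers retain what was already learned.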

I have also been exploring the geometry of return landscapes, the mapping from a policy's parameters to the returns it obtains, and how it is shaped by the machinery of reinforcement learning.

research note · last updated 2026
