Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn*, Pierluca D'Oro*, et al.
The more I worked on reinforcement learning, the more I came to appreciate that many bottlenecks classically attributed to algorithmic issues are actually due to the particular interaction between the machinery of reinforcement learning and neural networks.
I have explored this by contributing to the discovery that agents trained online exhibit a strong primacy bias. The solution we found to this problem, as simple as periodically resetting part of the network, led to one of the first scalable training recipes for online reinforcement learning.
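The core of the reset idea can be sketched in a few lines. This is an illustrative toy version, not the paper's actual training code: the network is split into a "trunk" that is never reset and a "head" whose weights are periodically re-initialized during training (the array shapes and the `reset_interval` value are arbitrary choices for the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_head(rng):
    # Fresh random weights, as a reset would produce.
    return rng.standard_normal((4, 2))

trunk = rng.standard_normal((8, 4))   # trained throughout, never reset
head = init_head(rng)

reset_interval = 100
resets = 0
for step in range(1, 301):
    # ... gradient updates on trunk and head would go here ...
    if step % reset_interval == 0:
        head = init_head(rng)  # re-initialize the head to counter primacy bias
        resets += 1
```

The trunk retains what has been learned so far, while the periodic re-initialization of the head discards early overfitting.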
I have also been exploring the geometry of return landscapes and how it is shaped by the machinery of reinforcement learning.
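One simple way to probe such a landscape is to evaluate the return at small random perturbations of a policy's parameters and measure how much it varies locally. The sketch below uses a toy quadratic stand-in for the return of a real control task; the function name, dimensions, and perturbation radius are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_return(theta):
    # Stand-in for the expected return of a policy with parameters theta.
    return -np.sum(theta ** 2)

theta = rng.standard_normal(16)  # current policy parameters
eps = 0.05                       # perturbation radius

# Evaluate the return at random points on a small sphere around theta.
returns = []
for _ in range(64):
    direction = rng.standard_normal(theta.shape)
    direction /= np.linalg.norm(direction)
    returns.append(toy_return(theta + eps * direction))

spread = max(returns) - min(returns)  # local variability of the landscape
```

A large spread at a small radius indicates a noisy neighborhood: nearby policies can have very different returns.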
research note · last updated 2026
Nate Rahn*, Pierluca D'Oro*, et al.
Pierluca D'Oro*, Max Schwarzer*, et al.
Evgenii Nikishin*, Max Schwarzer*, Pierluca D'Oro*, et al.
* equal contribution