Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn*, Pierluca D'Oro*, et al.
The more I worked on reinforcement learning, the more I came to appreciate that many bottlenecks classically attributed to algorithmic issues are actually due to the particular interaction between the machinery of reinforcement learning and neural networks.
I have explored this by contributing to the discovery that agents trained online exhibit a strong primacy bias. The solution we found to this problem, as simple as periodically resetting part of the network, led to one of the first scalable training recipes for online reinforcement learning.
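The core of the reset idea can be sketched in a few lines. This is an illustrative toy version, not the paper's actual training code: the network is split into a "trunk" that is never reset and a "head" whose weights are periodically re-initialized during training (the array shapes and the `reset_interval` value are arbitrary choices for the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_head(rng):
    # Fresh random weights, as a reset would produce.
    return rng.standard_normal((4, 2))

trunk = rng.standard_normal((8, 4))   # trained throughout, never reset
head = init_head(rng)

reset_interval = 100
resets = 0
for step in range(1, 301):
    # ... gradient updates on trunk and head would go here ...
    if step % reset_interval == 0:
        head = init_head(rng)  # re-initialize the head to counter primacy bias
        resets += 1
```

The trunk retains what has been learned so far, while the periodic re-initialization of the head discards early overfitting.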
I have also been exploring the geometry of return landscapes and how it is shaped by the machinery of reinforcement learning.
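One simple way to probe such a landscape is to evaluate the return at small random perturbations of a policy's parameters and measure how much it varies locally. The sketch below uses a toy quadratic stand-in for the return of a real control task; the function name, dimensions, and perturbation radius are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_return(theta):
    # Stand-in for the expected return of a policy with parameters theta.
    return -np.sum(theta ** 2)

theta = rng.standard_normal(16)  # current policy parameters
eps = 0.05                       # perturbation radius

# Evaluate the return at random points on a small sphere around theta.
returns = []
for _ in range(64):
    direction = rng.standard_normal(theta.shape)
    direction /= np.linalg.norm(direction)
    returns.append(toy_return(theta + eps * direction))

spread = max(returns) - min(returns)  # local variability of the landscape
```

A large spread at a small radius indicates a noisy neighborhood: nearby policies can have very different returns.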
research note · last updated 2026
Nate Rahn*, Pierluca D'Oro*, et al.
Pierluca D'Oro*, Max Schwarzer*, et al.
Evgenii Nikishin*, Max Schwarzer*, Pierluca D'Oro*, et al.
* equal contribution