Paavo Parmas

According to our database, Paavo Parmas authored at least 11 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Finite-Time Regret Analysis of Retry-Aware Bandits.

CoRR, May, 2026

Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients?

CoRR, April, 2026

2025

Double Horizon Model-Based Policy Optimization.

Akihiro Kubo

Paavo Parmas

Shin Ishii

Trans. Mach. Learn. Res., 2025

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2023

Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators.

Paavo Parmas

Takuma Seno

Yuma Aoki

Proceedings of the International Conference on Machine Learning, 2023

2022

Proppo: a Message Passing Framework for Customizable and Composable Learning Algorithms.

Paavo Parmas

Takuma Seno

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

A unified view of likelihood ratio and reparameterization gradients.

Paavo Parmas

Masashi Sugiyama

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients.

Jean-Baptiste Lespiau

Paavo Parmas

Edgar A. Duéñez-Guzmán

Karl Tuyls

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

2019

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme.

Paavo Parmas

Masashi Sugiyama

CoRR, 2019

2018

Total stochastic gradient algorithms and applications in reinforcement learning.

Paavo Parmas

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos.

Paavo Parmas

Carl Edward Rasmussen

Jan Peters

Kenji Doya

Proceedings of the 35th International Conference on Machine Learning, 2018
