Daniel Russo

Orcid: 0000-0001-5926-8624

Affiliations:

Columbia Business School, New York, NY, USA
Microsoft Research, Cambridge, MA, USA
Northwestern's Kellogg School of Management, Evanston, IL, USA
Stanford University, Department of Management Science and Engineering, USA

According to our database, Daniel Russo authored at least 41 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients.

CoRR, May, 2026

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success.

CoRR, January, 2026

2025

Impatient Bandits: Optimizing for the Long-Term Without Delay.

Thomas Baldwin-McDonald

CoRR, January, 2025

Contextual Thompson Sampling via Generation of Missing Data.

Tiffany Tianhui Cai, Hongseok Namkoong

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024

Global Optimality Guarantees for Policy Gradient Methods.

Oper. Res., 2024

Posterior Sampling via Autoregressive Generation.

Tiffany Tianhui Cai, Hongseok Namkoong

CoRR, 2024

On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency.

CoRR, 2024

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification.

CoRR, 2024

SURE 2024: Workshop on Strategic and Utility-aware REcommendation.

Himan Abdollahpouri, Tonia Danylenko, Masoud Mansoury, Mihajlo Grbovic

Proceedings of the 18th ACM Conference on Recommender Systems, 2024

2023

Approximation Benefits of Policy Gradient Methods with Aggregated States.

Manag. Sci., November, 2023

Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization.

CoRR, 2023

Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective.

CoRR, 2023

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay.

Thomas M. McDonald

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

An Information-Theoretic Analysis of Nonstationary Bandit Learning.

Proceedings of the International Conference on Machine Learning, 2023

On the Statistical Benefits of Temporal Difference Learning.

Proceedings of the International Conference on Machine Learning, 2023

2022

Satisficing in Time-Sensitive Bandit Learning.

Benjamin Van Roy

Math. Oper. Res., November, 2022

Adaptivity and Confounding in Multi-Armed Bandit Experiments.

CoRR, 2022

Temporally-Consistent Survival Analysis.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Technical Note - A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents.

Oper. Res., 2021

On the Futility of Dynamics in Robust Mechanism Design.

Santiago R. Balseiro

Oper. Res., 2021

Learning to Stop with Surprisingly Few Samples.

Proceedings of the Conference on Learning Theory, 2021

On the Linear Convergence of Policy Gradient Methods for Finite MDPs.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage.

IEEE Trans. Inf. Theory, 2020

A Note on the Linear Convergence of Policy Gradient Methods.

CoRR, 2020

Policy Gradient Optimization of Thompson Sampling Policies.

Ciamac C. Moallemi, Daniel J. Russo

CoRR, 2020

2019

Deep Exploration via Randomized Value Functions.

Benjamin Van Roy, Daniel J. Russo

J. Mach. Learn. Res., 2019

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents.

CoRR, 2019

Worst-Case Regret Bounds for Exploration via Randomized Value Functions.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018

A Tutorial on Thompson Sampling.

Benjamin Van Roy, Abbas Kazerouni

Found. Trends Mach. Learn., 2018

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation.

Proceedings of the Conference on Learning Theory, 2018

2017

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling.

Benjamin Van Roy

CoRR, 2017

A Tutorial on Thompson Sampling.

Benjamin Van Roy, Abbas Kazerouni

CoRR, 2017

Improving the Expected Improvement Algorithm.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016

An Information-Theoretic Analysis of Thompson Sampling.

Benjamin Van Roy

J. Mach. Learn. Res., 2016

Simple Bayesian Algorithms for Best Arm Identification.

Proceedings of the 29th Conference on Learning Theory, 2016

Controlling Bias in Adaptive Data Analysis Using Information Theory.

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2014

Learning to Optimize via Posterior Sampling.

Benjamin Van Roy

Math. Oper. Res., 2014

Learning to Optimize via Information-Directed Sampling.

Benjamin Van Roy

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2013

Welfare-Improving Cascades and the Effect of Noisy Reviews.

Proceedings of the Web and Internet Economics - 9th International Conference, 2013

Eluder Dimension and the Sample Complexity of Optimistic Exploration.

Benjamin Van Roy

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

(More) Efficient Reinforcement Learning via Posterior Sampling.

Benjamin Van Roy

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013
