Benjamin Van Roy

Orcid: 0000-0001-8364-3746

Affiliations:
  • Stanford University, USA


According to our database, Benjamin Van Roy authored at least 147 papers between 1995 and 2024.


Bibliography

2024
Efficient Exploration for LLMs.
CoRR, 2024

An Information-Theoretic Analysis of In-Context Learning.
CoRR, 2024

Adaptive Crowdsourcing Via Self-Supervised Learning.
CoRR, 2024

2023
Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping.
Trans. Mach. Learn. Res., 2023

Reinforcement Learning, Bit by Bit.
Found. Trends Mach. Learn., 2023

RLHF and IIA: Perverse Incentives.
CoRR, 2023

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling.
CoRR, 2023

Maintaining Plasticity via Regenerative Regularization.
CoRR, 2023

A Definition of Continual Reinforcement Learning.
CoRR, 2023

On the Convergence of Bounded Agents.
CoRR, 2023

Continual Learning as Computationally Constrained Reinforcement Learning.
CoRR, 2023

Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models.
CoRR, 2023

Bayesian Reinforcement Learning with Limited Cognitive Load.
CoRR, 2023

A Definition of Non-Stationary Bandits.
CoRR, 2023

Approximate Thompson Sampling via Epistemic Neural Networks.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Deep Exploration for Recommendation Systems.
Proceedings of the 17th ACM Conference on Recommender Systems, 2023

Epistemic Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Definition of Continual Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Demonstrations to Improve Online Learning: Quality Matters.
Proceedings of the International Conference on Machine Learning, 2023

Scalable Neural Contextual Bandit for Recommender Systems.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Nonstationary Bandit Learning via Predictive Sampling.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Satisficing in Time-Sensitive Bandit Learning.
Math. Oper. Res., November, 2022

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States.
J. Mach. Learn. Res., 2022

Inclusive Artificial Intelligence.
CoRR, 2022

An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws.
CoRR, 2022

Posterior Sampling for Continuing Environments.
CoRR, 2022

Fine-Tuning Language Models via Epistemic Neural Networks.
CoRR, 2022

On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning.
CoRR, 2022

Is Stochastic Gradient Descent Near Optimal?
CoRR, 2022

Robustness of Epinets against Distributional Shifts.
CoRR, 2022

Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning.
CoRR, 2022

Sample Complexity versus Depth: An Information Theoretic Analysis.
CoRR, 2022

Gaussian Imagination in Bandit Learning.
CoRR, 2022

Evaluating high-order predictive distributions in deep learning.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

An Analysis of Ensemble Sampling.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Neural Testbed: Evaluating Joint Predictions.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

An Information-Theoretic Framework for Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?
CoRR, 2021

Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions.
CoRR, 2021

Epistemic Neural Networks.
CoRR, 2021

A Bit Better? Quantifying Information for Bandit Learning.
CoRR, 2021

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State.
CoRR, 2021

The Value of Information When Deciding What to Learn.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Deciding What to Learn: A Rate-Distortion Approach.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Randomized Value Functions via Posterior State-Abstraction Sampling.
CoRR, 2020

Langevin DQN.
CoRR, 2020

On Efficiency in Hierarchical Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Behaviour Suite for Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Hypermodels for Exploration.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Deep Exploration via Randomized Value Functions.
J. Mach. Learn. Res., 2019

Provably Efficient Reinforcement Learning with Aggregated States.
CoRR, 2019

Comments on the Du-Kakade-Wang-Yang Lower Bounds.
CoRR, 2019

Information-Theoretic Confidence Bounds for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Performance of Thompson Sampling on Logistic Bandits.
Proceedings of the Conference on Learning Theory, 2019

2018
Learning to Optimize via Information-Directed Sampling.
Oper. Res., 2018

A Tutorial on Thompson Sampling.
Found. Trends Mach. Learn., 2018

An Information-Theoretic Analysis of Thompson Sampling for Large Action Spaces.
CoRR, 2018

An Information-Theoretic Analysis for Thompson Sampling with Many Actions.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Scalable Coordinated Exploration in Concurrent Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Coordinated Exploration in Concurrent Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization.
Math. Oper. Res., 2017

Learning to Price with Reference Effects.
CoRR, 2017

On Optimistic versus Randomized Exploration in Reinforcement Learning.
CoRR, 2017

Gaussian-Dirichlet Posterior Dominance in Sequential Learning.
CoRR, 2017

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling.
CoRR, 2017

A Tutorial on Thompson Sampling.
CoRR, 2017

Ensemble Sampling.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Conservative Contextual Linear Bandits.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Proceedings of the 34th International Conference on Machine Learning, 2017

2016
An Information-Theoretic Analysis of Thompson Sampling.
J. Mach. Learn. Res., 2016

On Lower Bounds for Regret in Reinforcement Learning.
CoRR, 2016

Posterior Sampling for Reinforcement Learning Without Episodes.
CoRR, 2016

Conservative Contextual Linear Bandits.
CoRR, 2016

Deep Exploration via Bootstrapped DQN.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Generalization and Exploration via Randomized Value Functions.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2015
Adaptive Execution: Exploration and Learning of Price Impact.
Oper. Res., 2015

Bootstrapped Thompson Sampling and Deep Exploration.
CoRR, 2015

2014
Learning to Optimize via Posterior Sampling.
Math. Oper. Res., 2014

Directed Principal Component Analysis.
Oper. Res., 2014

Generalization and Exploration via Randomized Value Functions.
CoRR, 2014

Near-optimal Regret Bounds for Reinforcement Learning in Factored MDPs.
CoRR, 2014

Model-based Reinforcement Learning and the Eluder Dimension.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Near-optimal Reinforcement Learning in Factored MDPs.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2013
Learning a factor model via regularized PCA.
Mach. Learn., 2013

A Tractable POMDP for a Class of Sequencing Problems.
CoRR, 2013

Efficient Exploration and Value Function Generalization in Deterministic Systems.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Eluder Dimension and the Sample Complexity of Optimistic Exploration.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

(More) Efficient Reinforcement Learning via Posterior Sampling.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

2012
Intermediated Blind Portfolio Auctions.
Manag. Sci., 2012

Directed Time Series Regression for Control.
CoRR, 2012

A Hybrid Method for Distance Metric Learning.
CoRR, 2012

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

2011
Industry dynamics: Foundations for models with an infinite number of firms.
J. Econ. Theory, 2011

Resource Allocation via Message Passing.
INFORMS J. Comput., 2011

2010
Convergence of min-sum message-passing for convex optimization.
IEEE Trans. Inf. Theory, 2010

Universal reinforcement learning.
IEEE Trans. Inf. Theory, 2010

Manipulation Robustness of Collaborative Filtering.
Manag. Sci., 2010

Computational Methods for Oblivious Equilibrium.
Oper. Res., 2010

Investment and Market Structure in Industries with Congestion.
Oper. Res., 2010

Dynamic Pricing with a Prior on Market Response.
Oper. Res., 2010

On Regression-Based Stopping Times.
Discret. Event Dyn. Syst., 2010

2009
Convergence of min-sum message passing for quadratic optimization.
IEEE Trans. Inf. Theory, 2009

Manipulation Robustness of Collaborative Filtering Systems.
CoRR, 2009

Manipulation-resistant collaborative filtering systems.
Proceedings of the 2009 ACM Conference on Recommender Systems, 2009

Directed Regression.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

2008
Capacity of the Trapdoor Channel With Feedback.
IEEE Trans. Inf. Theory, 2008

Reputation markets.
Proceedings of the ACM SIGCOMM 2008 Workshop on Economics of Networked Systems, 2008

2007
A short proof of optimality for the MIN cache replacement algorithm.
Inf. Process. Lett., 2007

Capacity and Zero-Error Capacity of the Chemical Channel with Feedback.
Proceedings of the IEEE International Symposium on Information Theory, 2007

2006
Consensus Propagation.
IEEE Trans. Inf. Theory, 2006

Approximation algorithms for dynamic resource allocation.
Oper. Res. Lett., 2006

Performance Loss Bounds for Approximate Value Iteration with State Aggregation.
Math. Oper. Res., 2006

A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees.
Math. Oper. Res., 2006

A Nonparametric Approach to Multiproduct Pricing.
Oper. Res., 2006

A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning.
Discret. Event Dyn. Syst., 2006

Convergence of the Min-Sum Message Passing Algorithm for Quadratic Optimization.
CoRR, 2006

2005
Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

TD(0) Leads to Better Policies than Approximate Value Iteration.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

A universal scheme for learning.
Proceedings of the 2005 IEEE International Symposium on Information Theory, 2005

2004
On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming.
Math. Oper. Res., 2004

Making Eigenvector-Based Reputation Systems Robust to Collusion.
Proceedings of the Algorithms and Models for the Web-Graph: Third International Workshop, 2004

Solitaire: Man Versus Machine.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

2003
The Linear Programming Approach to Approximate Dynamic Programming.
Oper. Res., 2003

Decentralized decision-making in a large team with local information.
Games Econ. Behav., 2003

Self-learning control of finite Markov chains: A.S. Poznyak, K. Najim, E. Gómez-Ramírez, Marcel Dekker, New York, 2000, $150, pp 298, ISBN 0-8247-9249-X.
Autom., 2003

Distributed Optimization in Adaptive Networks.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

On constraint sampling in the linear programming approach to approximate dynamic programming.
Proceedings of the 42nd IEEE Conference on Decision and Control, 2003

2002
On Average Versus Discounted Reward Temporal-Difference Learning.
Mach. Learn., 2002

Approximate Linear Programming for Average-Cost Dynamic Programming.
Proceedings of the Advances in Neural Information Processing Systems 15, 2002

2001
Regression methods for pricing complex American-style options.
IEEE Trans. Neural Networks, 2001

An analysis of belief propagation on the turbo decoding graph with Gaussian densities.
IEEE Trans. Inf. Theory, 2001

A Tractable POMDP for Dynamic Sequencing with Applications to Personalized Internet Content Provision.
Proceedings of the UAI '01: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 2001

Approximate Dynamic Programming via Linear Programming.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, 2001

2000
Fixed Points of Approximate Value Iteration and Temporal-Difference Learning.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

The optimal harvesting of environmental bads.
Proceedings of the 39th IEEE Conference on Decision and Control, 2000

Approximate value iteration with randomized policies.
Proceedings of the 39th IEEE Conference on Decision and Control, 2000

1999
Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives.
IEEE Trans. Autom. Control., 1999

Average cost temporal-difference learning.
Autom., 1999

An Analysis of Turbo Decoding with Gaussian Densities.
Proceedings of the Advances in Neural Information Processing Systems 12, [NIPS Conference, Denver, Colorado, USA, November 29, 1999

1998
Learning and value function approximation in complex decision processes.
PhD thesis, 1998

1997
An analysis of temporal-difference learning with function approximation.
IEEE Trans. Autom. Control., 1997

1996
Feature-Based Methods for Large Scale Dynamic Programming.
Mach. Learn., 1996

Approximate Solutions to Optimal Stopping Problems.
Proceedings of the Advances in Neural Information Processing Systems 9, 1996

Analysis of Temporal-Difference Learning with Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 9, 1996

1995
Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions.
Proceedings of the Advances in Neural Information Processing Systems 8, 1995

