Lihong Li

Affiliations:
  • Amazon, Seattle, WA, USA
  • Google, Kirkland, WA, USA (former)
  • Microsoft Research, Redmond, WA, USA (former)
  • Yahoo! Research, Santa Clara, CA, USA (former)
  • Rutgers University, Piscataway, NJ, USA (former)
  • University of Alberta, Edmonton, AB, Canada (former)


According to our database, Lihong Li authored at least 132 papers between 2003 and 2022.

Bibliography

2022
A Reinforcement Learning Approach to Estimating Long-term Treatment Effects.
CoRR, 2022

Estimating Long-term Effects from Experimental Data.
Proceedings of the RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18, 2022

Understanding Domain Randomization for Sim-to-real Transfer.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Guest editorial: special issue on reinforcement learning for real life.
Mach. Learn., 2021

A Map of Bandits for E-commerce.
CoRR, 2021

On the Optimality of Batch Policy Optimization Algorithms.
Proceedings of the 38th International Conference on Machine Learning, 2021

Near-Optimal Representation Learning for Linear Bandits and Linear RL.
Proceedings of the 38th International Conference on Machine Learning, 2021

Neural Thompson Sampling.
Proceedings of the 9th International Conference on Learning Representations, 2021

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL.
Proceedings of the 9th International Conference on Learning Representations, 2021

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing.
CoRR, 2020

Off-Policy Evaluation via the Regularized Lagrangian.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Escaping the Gravitational Pull of Softmax.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

CoinDICE: Off-Policy Confidence Interval Estimation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Neural Contextual Bandits with UCB-based Exploration.
Proceedings of the 37th International Conference on Machine Learning, 2020

Batch Stationary Distribution Estimation.
Proceedings of the 37th International Conference on Machine Learning, 2020

GenDICE: Generalized Offline Estimation of Stationary Values.
Proceedings of the 8th International Conference on Learning Representations, 2020

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation.
Proceedings of the 8th International Conference on Learning Representations, 2020

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Randomized Exploration in Generalized Linear Bandits.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Neural Approaches to Conversational AI.
Found. Trends Inf. Retr., 2019

A perspective on off-policy evaluation in reinforcement learning.
Frontiers Comput. Sci., 2019

AlgaeDICE: Policy Gradient from Arbitrary Experience.
CoRR, 2019

Neural Contextual Bandits with Upper Confidence Bound-Based Exploration.
CoRR, 2019

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Kernel Loss for Solving the Bellman Equation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Policy Certificates: Towards Accountable Reinforcement Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

Neural Logic Machines.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Scalable Bilinear π Learning Using State and Action Features.
CoRR, 2018

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adversarial Attacks on Stochastic Bandits.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Scalable Bilinear Learning Using State and Action Features.
Proceedings of the 35th International Conference on Machine Learning, 2018

Boosting the Actor with Dual Critic.
Proceedings of the 6th International Conference on Learning Representations, 2018

Data Poisoning Attacks in Contextual Bandits.
Proceedings of the Decision and Game Theory for Security - 9th International Conference, 2018

Subgoal Discovery for Hierarchical Dialogue Policy Learning.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Smoothed Dual Embedding Control.
CoRR, 2017

Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning.
CoRR, 2017

Provable Optimal Algorithms for Generalized Linear Contextual Bandits.
CoRR, 2017

Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems.
CoRR, 2017

End-to-End Task-Completion Neural Dialogue Systems.
CoRR, 2017

Scaffolding Networks for Teaching and Learning to Comprehend.
CoRR, 2017

Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

End-to-End Task-Completion Neural Dialogue Systems.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Provably Optimal Algorithms for Generalized Linear Contextual Bandits.
Proceedings of the 34th International Conference on Machine Learning, 2017

Stochastic Variance Reduction Methods for Policy Evaluation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Neuro-Symbolic Program Synthesis.
Proceedings of the 5th International Conference on Learning Representations, 2017

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Online Evaluation for Information Retrieval.
Found. Trends Inf. Retr., 2016

Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking.
CoRR, 2016

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear.
CoRR, 2016

A User Simulator for Task-Completion Dialogues.
CoRR, 2016

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting and Tracking Popular Discussion Threads.
CoRR, 2016

End-to-End Reinforcement Learning of Dialogue Agents for Information Access.
CoRR, 2016

Click-based Hot Fixes for Underperforming Torso Queries.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Active Learning with Oracle Epiphany.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives.
Proceedings of the 29th Conference on Learning Theory, 2016

On the Prior Sensitivity of Thompson Sampling.
Proceedings of the Algorithmic Learning Theory - 27th International Conference, 2016

Deep Reinforcement Learning with a Natural Language Action Space.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems.
CoRR, 2015

Recurrent Reinforcement Learning: A Hybrid Approach.
CoRR, 2015

Doubly Robust Off-policy Evaluation for Reinforcement Learning.
CoRR, 2015

Deep Reinforcement Learning with an Unbounded Action Space.
CoRR, 2015

Doubly Robust Policy Evaluation and Optimization.
CoRR, 2015

The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning.
CoRR, 2015

Contextual Bandits with Global Constraints and Objective.
CoRR, 2015

Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Toward Predicting the Outcome of an A/B Experiment for Search Relevance.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Offline Evaluation and Optimization for Interactive Systems.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Toward Minimax Off-policy Value Estimation.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Exploiting User Preference for Online Learning in Web Content Optimization Systems.
ACM Trans. Intell. Syst. Technol., 2014

On Minimax Optimal Offline Policy Evaluation.
CoRR, 2014

Counterfactual Estimation and Optimization of Click Metrics for Search Engines.
CoRR, 2014

Temporal supervised learning for inferring a dialog policy from example conversations.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

PAC-inspired Option Discovery in Lifelong Reinforcement Learning.
Proceedings of the 31st International Conference on Machine Learning, 2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Efficient Online Bootstrapping for Large Scale Learning.
CoRR, 2013

Generalized Thompson Sampling for Contextual Bandits.
CoRR, 2013

Sample Complexity of Multi-task Reinforcement Learning.
Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013

2012
An Online Learning Framework for Refining Recency Search Results with User Click Feedback.
ACM Trans. Inf. Syst., 2012

Bandits with Generalized Linear Models.
Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, 2012

Open Problem: Regret Bounds for Thompson Sampling.
Proceedings of the COLT 2012, 2012

Cloud control: voluntary admission control for intranet traffic management.
Inf. Syst. E Bus. Manag., 2012

Joint relevance and freshness learning from clickthroughs for news search.
Proceedings of the 21st World Wide Web Conference 2012, 2012

Attention and Selection in Online Choice Tasks.
Proceedings of the User Modeling, Adaptation, and Personalization, 2012

Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits.
Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

Sample Complexity Bounds of Exploration.
Proceedings of the Reinforcement Learning, 2012

2011
Knows what it knows: a framework for self-aware learning.
Mach. Learn., 2011

Contextual Bandits with Linear Payoff Functions.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Contextual Bandit Algorithms with Supervised Learning Guarantees.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Linear-Time Estimators for Propensity Scores.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Refining Recency Search Results with User Click Feedback.
CoRR, 2011

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms.
Proceedings of the Fourth International Conference on Web Search and Web Data Mining, 2011

An Empirical Evaluation of Thompson Sampling.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Unbiased online active learning in data streams.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

Doubly Robust Policy Evaluation and Learning.
Proceedings of the 28th International Conference on Machine Learning, 2011

2010
An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms.
CoRR, 2010

An Optimal High Probability Algorithm for the Contextual Bandit Problem.
CoRR, 2010

Reducing reinforcement learning to KWIK online regression.
Ann. Math. Artif. Intell., 2010

Maintaining Equilibria During Exploration in Sponsored Search Auctions.
Algorithmica, 2010

A contextual-bandit approach to personalized news article recommendation.
Proceedings of the 19th International Conference on World Wide Web, 2010

Parallelized Stochastic Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Learning from Logged Implicit Exploration Data.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Online learning for recency search ranking using real-time user feedback.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

2009
Reinforcement Learning in Finite MDPs: PAC Analysis.
J. Mach. Learn. Res., 2009

Sparse Online Learning via Truncated Gradient.
J. Mach. Learn. Res., 2009

Provably Efficient Learning with Typed Parametric Models.
J. Mach. Learn. Res., 2009

Learning and planning in environments with delayed feedback.
Auton. Agents Multi Agent Syst., 2009

A Bayesian Sampling Approach to Exploration in Reinforcement Learning.
Proceedings of the UAI 2009, 2009

Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection.
Proceedings of the INTERSPEECH 2009, 2009

Workshop summary: Results of the 2009 reinforcement learning competition.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Online exploration in least-squares policy iteration.
Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), 2009

2008
CORL: A Continuous-state Offset-dynamics Reinforcement Learner.
Proceedings of the UAI 2008, 2008

Efficient Value-Function Approximation via Online Linear Regression.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning.
Proceedings of the Machine Learning, 2008

Knows what it knows: a framework for self-aware learning.
Proceedings of the Machine Learning, 2008

A worst-case comparison between temporal difference and residual gradient with linear function approximation.
Proceedings of the Machine Learning, 2008

2007
Focus of Attention in Reinforcement Learning.
J. Univers. Comput. Sci., 2007

Analyzing feature generation for value-function approximation.
Proceedings of the Machine Learning, 2007

Planning and Learning in Environments with Delayed Feedback.
Proceedings of the Machine Learning: ECML 2007, 2007

2006
Incremental Model-based Learners With Formal Learning-Time Guarantees.
Proceedings of the UAI '06, 2006

Towards a Unified Theory of State Abstraction for MDPs.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2006

PAC model-free reinforcement learning.
Proceedings of the Machine Learning, 2006

2005
Lazy Approximation for Solving Continuous Finite-Horizon MDPs.
Proceedings of the Proceedings, 2005

2004
Batch Reinforcement Learning with State Importance.
Proceedings of the Machine Learning: ECML 2004, 2004

2003
Lookahead Pathologies for Single Agent Search.
Proceedings of the IJCAI-03, 2003

Towards Automated Creation of Image Interpretation Systems.
Proceedings of the AI 2003: Advances in Artificial Intelligence, 2003

