Lihong Li

According to our database, Lihong Li authored at least 83 papers between 2003 and 2019.

Bibliography

2019
A perspective on off-policy evaluation in reinforcement learning.
Frontiers Comput. Sci., 2019

Policy Certificates: Towards Accountable Reinforcement Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

Neural Logic Machines.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Neural Approaches to Conversational AI.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adversarial Attacks on Stochastic Bandits.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Scalable Bilinear Learning Using State and Action Features.
Proceedings of the 35th International Conference on Machine Learning, 2018

Boosting the Actor with Dual Critic.
Proceedings of the 6th International Conference on Learning Representations, 2018

Data Poisoning Attacks in Contextual Bandits.
Proceedings of the Decision and Game Theory for Security - 9th International Conference, 2018

Subgoal Discovery for Hierarchical Dialogue Policy Learning.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Neural Approaches to Conversational AI.
Proceedings of ACL 2018, Melbourne, Australia, July 15-20, 2018, Tutorial Abstracts, 2018

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

End-to-End Task-Completion Neural Dialogue Systems.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Provably Optimal Algorithms for Generalized Linear Contextual Bandits.
Proceedings of the 34th International Conference on Machine Learning, 2017

Stochastic Variance Reduction Methods for Policy Evaluation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Neuro-Symbolic Program Synthesis.
Proceedings of the 5th International Conference on Learning Representations, 2017

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Online Evaluation for Information Retrieval.
Foundations and Trends in Information Retrieval, 2016

Click-based Hot Fixes for Underperforming Torso Queries.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Active Learning with Oracle Epiphany.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives.
Proceedings of the 29th Conference on Learning Theory, 2016

On the Prior Sensitivity of Thompson Sampling.
Proceedings of the Algorithmic Learning Theory - 27th International Conference, 2016

Deep Reinforcement Learning with a Natural Language Action Space.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Toward Predicting the Outcome of an A/B Experiment for Search Relevance.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Offline Evaluation and Optimization for Interactive Systems.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Toward Minimax Off-policy Value Estimation.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Exploiting User Preference for Online Learning in Web Content Optimization Systems.
ACM TIST, 2014

Temporal supervised learning for inferring a dialog policy from example conversations.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

PAC-inspired Option Discovery in Lifelong Reinforcement Learning.
Proceedings of the 31st International Conference on Machine Learning, 2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Sample Complexity of Multi-task Reinforcement Learning.
Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013

2012
An Online Learning Framework for Refining Recency Search Results with User Click Feedback.
ACM Trans. Inf. Syst., 2012

Bandits with Generalized Linear Models.
Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, 2012

Open Problem: Regret Bounds for Thompson Sampling.
Proceedings of the COLT 2012, 2012

Cloud control: voluntary admission control for intranet traffic management.
Inf. Syst. E-Business Management, 2012

Joint relevance and freshness learning from clickthroughs for news search.
Proceedings of the 21st World Wide Web Conference 2012, 2012

Attention and Selection in Online Choice Tasks.
Proceedings of the User Modeling, Adaptation, and Personalization, 2012

Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits.
Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

2011
Knows what it knows: a framework for self-aware learning.
Machine Learning, 2011

Contextual Bandits with Linear Payoff Functions.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Contextual Bandit Algorithms with Supervised Learning Guarantees.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Linear-Time Estimators for Propensity Scores.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms.
Proceedings of the Fourth International Conference on Web Search and Web Data Mining, 2011

An Empirical Evaluation of Thompson Sampling.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Unbiased online active learning in data streams.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

Doubly Robust Policy Evaluation and Learning.
Proceedings of the 28th International Conference on Machine Learning, 2011

2010
Reducing reinforcement learning to KWIK online regression.
Ann. Math. Artif. Intell., 2010

Maintaining Equilibria During Exploration in Sponsored Search Auctions.
Algorithmica, 2010

A contextual-bandit approach to personalized news article recommendation.
Proceedings of the 19th International Conference on World Wide Web, 2010

Parallelized Stochastic Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Learning from Logged Implicit Exploration Data.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Online learning for recency search ranking using real-time user feedback.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

2009
Reinforcement Learning in Finite MDPs: PAC Analysis.
J. Mach. Learn. Res., 2009

Provably Efficient Learning with Typed Parametric Models.
J. Mach. Learn. Res., 2009

Learning and planning in environments with delayed feedback.
Autonomous Agents and Multi-Agent Systems, 2009

A Bayesian Sampling Approach to Exploration in Reinforcement Learning.
Proceedings of the UAI 2009, 2009

Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection.
Proceedings of the INTERSPEECH 2009, 2009

Workshop summary: Results of the 2009 reinforcement learning competition.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Online exploration in least-squares policy iteration.
Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), 2009

2008
CORL: A Continuous-state Offset-dynamics Reinforcement Learner.
Proceedings of the UAI 2008, 2008

Sparse Online Learning via Truncated Gradient.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Efficient Value-Function Approximation via Online Linear Regression.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning.
Proceedings of the Machine Learning, 2008

Knows what it knows: a framework for self-aware learning.
Proceedings of the Machine Learning, 2008

A worst-case comparison between temporal difference and residual gradient with linear function approximation.
Proceedings of the Machine Learning, 2008

2007
Focus of Attention in Reinforcement Learning.
J. UCS, 2007

Maintaining Equilibria During Exploration in Sponsored Search Auctions.
Proceedings of the Internet and Network Economics, Third International Workshop, 2007

Analyzing feature generation for value-function approximation.
Proceedings of the Machine Learning, 2007

Planning and Learning in Environments with Delayed Feedback.
Proceedings of the Machine Learning: ECML 2007, 2007

2006
Incremental Model-based Learners With Formal Learning-Time Guarantees.
Proceedings of the UAI '06, 2006

Towards a Unified Theory of State Abstraction for MDPs.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2006

PAC model-free reinforcement learning.
Proceedings of the Machine Learning, 2006

2005
Lazy Approximation for Solving Continuous Finite-Horizon MDPs.
Proceedings of the Proceedings, 2005

2004
Batch Reinforcement Learning with State Importance.
Proceedings of the Machine Learning: ECML 2004, 2004

2003
Lookahead Pathologies for Single Agent Search.
Proceedings of the IJCAI-03, 2003

Towards Automated Creation of Image Interpretation Systems.
Proceedings of the AI 2003: Advances in Artificial Intelligence, 2003

