On Linear Programming for Constrained and Unconstrained Average-Cost Markov Decision Processes with Countable Action Spaces and Strictly Unbounded Costs.

Huizhen Yu

Math. Oper. Res., 2022

2020

Average Cost Optimality Inequality for Markov Decision Processes with Borel Spaces and Universally Measurable Policies.

Huizhen Yu

SIAM J. Control. Optim., 2020

On the Minimum Pair Approach for Average Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs.

Huizhen Yu

SIAM J. Control. Optim., 2020

Research on the Structural Impact of the Disappearance of China's Demographic Dividend on the Education Industry.

Huizhen Yu

Proceedings of the ICETM 2020: 3rd International Conference on Education Technology Management, 2020

2018

Two geometric input transformation methods for fast online reinforcement learning with neural nets.

CoRR, 2018

2017

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning.

Huizhen Yu

CoRR, 2017

Multi-step Off-policy Learning Without Importance Sampling Ratios.

Ashique Rupam Mahmood

Huizhen Yu

Richard S. Sutton

CoRR, 2017

On Generalized Bellman Equations and Temporal-Difference Learning.

Huizhen Yu

Ashique Rupam Mahmood

Richard S. Sutton

Proceedings of the Advances in Artificial Intelligence, 2017

2016

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize.

Huizhen Yu

J. Mach. Learn. Res., 2016

Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms.

Huizhen Yu

CoRR, 2016

2015

On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes.

Huizhen Yu

SIAM J. Control. Optim., 2015

A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies.

Huizhen Yu

Dimitri P. Bertsekas

Math. Oper. Res., 2015

Emphatic Temporal-Difference Learning.

Ashique Rupam Mahmood

Huizhen Yu

Martha White

Richard S. Sutton

CoRR, 2015

On Convergence of Emphatic Temporal-Difference Learning.

Huizhen Yu

Proceedings of The 28th Conference on Learning Theory, 2015

2013

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems.

Huizhen Yu

Dimitri P. Bertsekas

Math. Oper. Res., 2013

Q-learning and policy iteration algorithms for stochastic shortest path problems.

Huizhen Yu

Dimitri P. Bertsekas

Ann. Oper. Res., 2013

2012

Least Squares Temporal Difference Methods: An Analysis under General Conditions.

Huizhen Yu

SIAM J. Control. Optim., 2012

2011

A Unifying Polyhedral Approximation Framework for Convex Optimization.

Dimitri P. Bertsekas

Huizhen Yu

SIAM J. Optim., 2011

2010

Error Bounds for Approximations from Projected Linear Equations.

Huizhen Yu

Dimitri P. Bertsekas

Math. Oper. Res., 2010

Convergence of Least Squares Temporal Difference Methods Under General Conditions.

Huizhen Yu

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Q-learning and enhanced policy iteration in discounted dynamic programming.

Dimitri P. Bertsekas

Huizhen Yu

Proceedings of the 49th IEEE Conference on Decision and Control, 2010

Distributed asynchronous policy iteration in dynamic programming.

Dimitri P. Bertsekas

Huizhen Yu

Proceedings of the 48th Annual Allerton Conference on Communication, 2010

2009

Convergence Results for Some Temporal Difference Methods Based on Least Squares.

Huizhen Yu

Dimitri P. Bertsekas

IEEE Trans. Autom. Control., 2009

Basis function adaptation methods for cost approximation in MDP.

Huizhen Yu

Dimitri P. Bertsekas

Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009

2008

On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP.

Huizhen Yu

Dimitri P. Bertsekas

Math. Oper. Res., 2008

Efficient Discriminative Training Method for Structured Predictions.

Huizhen Yu

Dimitri P. Bertsekas

Juho Rousu

Proceedings of the 6th Workshop on Mining and Learning with Graphs, 2008

New error bounds for approximations from projected linear equations.

Huizhen Yu

Dimitri P. Bertsekas

Proceedings of the 46th Annual Allerton Conference on Communication, 2008

2006

Approximate solution methods for POMDP and POSMDP.

Huizhen Yu

PhD thesis, 2006

2005

A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies.

Huizhen Yu

Proceedings of the UAI '05, 2005

2004

Discretized Approximations for POMDP with Average Cost.

Huizhen Yu

Dimitri P. Bertsekas

Proceedings of the UAI '04, 2004

2001

Combining Configurational and Statistical Approaches in Image Retrieval.

Huizhen Yu

W. Eric L. Grimson

Proceedings of the Advances in Multimedia Information Processing, 2001
