Hengshuai Yao

Orcid: 0000-0003-1258-1845

According to our database, Hengshuai Yao authored at least 49 papers between 2006 and 2023.

Bibliography

2023
Careful at Estimation and Bold at Exploration.
CoRR, 2023

Baird Counterexample Is Solved: with an example of How to Debug a Two-time-scale Algorithm.
CoRR, 2023

A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness.
CoRR, 2023

The Sufficiency of Off-Policyness and Soft Clipping: PPO Is Still Insufficient according to an Off-Policy Measure.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
The Vanishing Decision Boundary Complexity and the Strong First Component.
CoRR, 2022

Class Interference of Deep Neural Networks.
CoRR, 2022

Sigmoidally Preconditioned Off-policy Learning: a new exploration method for reinforcement learning.
CoRR, 2022

Learning to Accelerate by the Methods of Step-size Planning.
CoRR, 2022

Understanding and mitigating the limitations of prioritized experience replay.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

2021
A Multi-Component Framework for the Analysis and Design of Explainable Artificial Intelligence.
Mach. Learn. Knowl. Extr., 2021

Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions.
CoRR, 2021

Towards safe, explainable, and regulated autonomous driving.
CoRR, 2021

Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations.
CoRR, 2021

Exploring Neural Architecture Search Space via Deep Deterministic Sampling.
IEEE Access, 2021

Breaking the Deadly Triad with a Target Network.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Variance-Reduced Off-Policy Memory-Efficient Policy Search.
CoRR, 2020

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities.
CoRR, 2020

Towards a practical measure of interference for reinforcement learning.
CoRR, 2020

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Inputs.
CoRR, 2020

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Mapless Navigation among Dynamics with Social-safety-awareness: a reinforcement learning approach from 2D laser scans.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
One-Shot Weakly Supervised Video Object Segmentation.
CoRR, 2019

Provably Convergent Off-Policy Actor-Critic with Function Approximation.
CoRR, 2019

Is Fast Adaptation All You Need?
CoRR, 2019

Distributional Reinforcement Learning for Efficient Exploration.
CoRR, 2019

Reinforcing Classical Planning for Adversary Driving Scenarios.
CoRR, 2019

Deep Reinforcement Learning with Decorrelation.
CoRR, 2019

Hill Climbing on Value Estimates for Search-control in Dyna.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Distributional Reinforcement Learning for Efficient Exploration.
Proceedings of the 36th International Conference on Machine Learning, 2019

M-estimation in Low-Rank Matrix Factorization: A General Framework.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Exploration in the Face of Parametric and Intrinsic Uncertainties.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

QUOTA: The Quantile Option Architecture for Reinforcement Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.
CoRR, 2018

QUOTA: The Quantile Option Architecture for Reinforcement Learning.
CoRR, 2018

Negative Log Likelihood Ratio Loss for Deep Neural Network Classification.
CoRR, 2018

Practical Issues of Action-Conditioned Next Image Prediction.
Proceedings of the 21st International Conference on Intelligent Transportation Systems, 2018

2014
Learning to predict trending queries: classification-based.
Proceedings of the 23rd International World Wide Web Conference, 2014

Universal Option Models.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Pseudo-MDPs and factored linear action models.
Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2014

2013
Reinforcement Ranking.
CoRR, 2013

2012
Discovering and Leveraging the Most Valuable Links for Ranking.
CoRR, 2012

Approximate Policy Iteration with Linear Action Models.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2009
Multi-Step Dyna Planning for Policy Evaluation and Control.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

2008
Minimal Residual Approaches for Policy Evaluation in Large Sparse Markov Chains.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Preconditioned temporal difference learning.
Proceedings of the Machine Learning, 2008

2006
Historical Temporal Difference Learning: Some Initial Results.
Proceedings of the Interdisciplinary and Multidisciplinary Research in Computer Science, 2006

