We stand with Ukraine

Hengshuai Yao

ORCID: 0000-0003-1258-1845

According to our database, Hengshuai Yao authored at least 53 papers between 2006 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Universal Stabilization for Maximum Entropy Optimization in Reinforcement Learning.

IEEE Trans. Neural Networks Learn. Syst., April, 2026

GAIN: Multiplicative Modulation for Domain Adaptation.

CoRR, April, 2026

Why Attend to Everything? Focus is the Key.

with Yasin Abbasi-Yadkori and others

CoRR, April, 2026

Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection.

CoRR, March, 2026

2024

Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions.

with Shahin Atakishiyev, Mohammad Salameh, and one other co-author

IEEE Access, 2024

2023

Careful at Estimation and Bold at Exploration.

CoRR, 2023

Baird Counterexample Is Solved: with an example of How to Debug a Two-time-scale Algorithm.

CoRR, 2023

A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness.

CoRR, 2023

The Sufficiency of Off-Policyness and Soft Clipping: PPO Is Still Insufficient according to an Off-Policy Measure.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

The Vanishing Decision Boundary Complexity and the Strong First Component.

CoRR, 2022

Class Interference of Deep Neural Networks.

CoRR, 2022

Sigmoidally Preconditioned Off-policy Learning: a new exploration method for reinforcement learning.

CoRR, 2022

Learning to Accelerate by the Methods of Step-size Planning.

CoRR, 2022

Understanding and mitigating the limitations of prioritized experience replay.

with Amir-massoud Farahmand and others

Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2022

2021

A Multi-Component Framework for the Analysis and Design of Explainable Artificial Intelligence.

with Shahin Atakishiyev, Housam Khalifa Bashier Babiker, Nawshad Farruque, Osmar R. Zaïane, Mohammad H. Motallebi, Talat Iqba Syed, and others

Mach. Learn. Knowl. Extr., 2021

Towards safe, explainable, and regulated autonomous driving.

with Shahin Atakishiyev, Mohammad Salameh, and one other co-author

CoRR, 2021

Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations.

CoRR, 2021

Exploring Neural Architecture Search Space via Deep Deterministic Sampling.

with Mohammad Salameh, Seyed Saeed Changiz Rezaei, and others

IEEE Access, 2021

Breaking the Deadly Triad with a Target Network.

with Shangtong Zhang and Shimon Whiteson

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Variance-Reduced Off-Policy Memory-Efficient Policy Search.

with Mohammad Ghavamzadeh and others

CoRR, 2020

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities.

with Amir-massoud Farahmand and others

CoRR, 2020

Towards a practical measure of interference for reinforcement learning.

CoRR, 2020

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Inputs.

with Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, and Martin Jägersand

CoRR, 2020

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings.

with Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, and Martin Jägersand

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Mapless Navigation among Dynamics with Social-safety-awareness: a reinforcement learning approach from 2D laser scans.

with Martin Jägersand and others

Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation.

with Shangtong Zhang, Shimon Whiteson, and one other co-author

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

One-Shot Weakly Supervised Video Object Segmentation.

with Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, and Martin Jägersand

CoRR, 2019

Provably Convergent Off-Policy Actor-Critic with Function Approximation.

with Shangtong Zhang, Shimon Whiteson, and one other co-author

CoRR, 2019

Is Fast Adaptation All You Need?

CoRR, 2019

Distributional Reinforcement Learning for Efficient Exploration.

with Borislav Mavrin, Shangtong Zhang, and others

CoRR, 2019

Reinforcing Classical Planning for Adversary Driving Scenarios.

CoRR, 2019

Deep Reinforcement Learning with Decorrelation.

with Borislav Mavrin and one other co-author

CoRR, 2019

Hill Climbing on Value Estimates for Search-control in Dyna.

with Amir-massoud Farahmand and others

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Distributional Reinforcement Learning for Efficient Exploration.

with Borislav Mavrin and others

Proceedings of the 36th International Conference on Machine Learning, 2019

M-estimation in Low-Rank Matrix Factorization: A General Framework.

Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Exploration in the Face of Parametric and Intrinsic Uncertainties.

with Borislav Mavrin, Shangtong Zhang, and one other co-author

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

QUOTA: The Quantile Option Architecture for Reinforcement Learning.

with Shangtong Zhang

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.

with Shangtong Zhang

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search.

with Shangtong Zhang and one other co-author

CoRR, 2018

QUOTA: The Quantile Option Architecture for Reinforcement Learning.

with Shangtong Zhang, Borislav Mavrin, and others

CoRR, 2018

Negative Log Likelihood Ratio Loss for Deep Neural Network Classification.

CoRR, 2018

Practical Issues of Action-Conditioned Next Image Prediction.

with Masoud S. Nosrati, Peyman Yadmellat, and others

Proceedings of the 21st International Conference on Intelligent Transportation Systems, 2018

2014

Learning to predict trending queries: classification - based.

Proceedings of the 23rd International World Wide Web Conference, 2014

Universal Option Models.

with Csaba Szepesvári, Richard S. Sutton, Shalabh Bhatnagar, and one other co-author

Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Pseudo-MDPs and factored linear action models.

with Csaba Szepesvári, Bernardo Ávila Pires, and one other co-author

Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2014

2013

Reinforcement Ranking.

with Dale Schuurmans

CoRR, 2013

2012

Discovering and Leveraging the Most Valuable Links for Ranking.

CoRR, 2012

Approximate Policy Iteration with Linear Action Models.

with Csaba Szepesvári

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2009

Multi-Step Dyna Planning for Policy Evaluation and Control.

with Richard S. Sutton, Shalabh Bhatnagar, Csaba Szepesvári, and one other co-author

Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009 (meeting held 7-10 December 2009), 2009

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS.

with Shalabh Bhatnagar and Csaba Szepesvári

Proceedings of the 48th IEEE Conference on Decision and Control, 2009

2008

Minimal Residual Approaches for Policy Evaluation in Large Sparse Markov Chains.

Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Preconditioned temporal difference learning.

Proceedings of the 25th International Conference on Machine Learning, 2008

2006

Historical Temporal Difference Learning: Some Initial Results.

Proceedings of the Interdisciplinary and Multidisciplinary Research in Computer Science, 2006
