Heyang Zhao

ORCID: 0009-0004-6912-0040

According to our database, Heyang Zhao authored at least 22 papers between 2021 and 2026.

Bibliography

2026
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability.
CoRR, May, 2026

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization.
CoRR, May, 2026

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits.
CoRR, March, 2026

2025
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling.
CoRR, October, 2025

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability.
CoRR, February, 2025

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Logarithmic Regret for Online KL-Regularized Reinforcement Learning.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A 0.16mm² 450MHz-BW 72dB-SNDR Continuous-time Pipeline ADC with APF+HPF and APF+FIR Hybrid Delay Alignment Techniques.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2025

CGCA-KAN: Correction-Guided Cluster-Aware Attention and KAN Enhanced Architecture for Medical Image Segmentation.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2025

2024
A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Feel-Good Thompson Sampling for Contextual Dueling Bandits.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Variance-aware Regret Bounds for Stochastic Contextual Dueling Bandits.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

S-lemma On Three Non-homogeneous Quadratic Functions And Its Applications In Signal Recovery.
Proceedings of the 10th International Conference on Communication and Information Processing, 2024

2023
Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits.
Proceedings of the International Conference on Machine Learning, 2023

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes.
Proceedings of the International Conference on Machine Learning, 2023

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022
Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds.
CoRR, 2022

VinciDecoder: Automatically Interpreting Provenance Graphs into Textual Forensic Reports with Application to OpenStack.
Proceedings of the Secure IT Systems, 2022

ProvTalk: Towards Interpretable Multi-level Provenance Analysis in Networking Functions Virtualization (NFV).
Proceedings of the 29th Annual Network and Distributed System Security Symposium, 2022

2021
Linear Contextual Bandits with Adversarial Corruptions.
CoRR, 2021

