Tianhao Wu

Affiliations:

University of California, Berkeley, CA, USA (PhD 2021)
Peking University, School of Mathematical Sciences, Beijing, China

According to our database¹, Tianhao Wu authored at least 17 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Sample Complexity and Representation Ability of Test-time Scaling Paradigms.

CoRR, June, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback.

CoRR, January, 2025

R.I.P.: Better Models by Survival of the Fittest Prompts.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Thinking LLMs: General Instruction Following with Thought Generation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

EmbedLLM: Learning Compact Representations of Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RouteLLM: Learning to Route LLMs from Preference Data.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

RouteLLM: Learning to Route LLMs with Preference Data.

CoRR, 2024

2023

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment.

CoRR, 2023

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Statistical Inference on Multi-armed Bandits with Delayed Feedback.

Lei Shi, Jingshen Wang, Tianhao Wu

Proceedings of the International Conference on Machine Learning, 2023

2022

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee.

Proceedings of the International Conference on Machine Learning, 2022

A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

A Unified Framework for Conservative Exploration.

CoRR, 2021

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
