Huazheng Wang

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Mitigating Object Hallucination in Large Vision-Language Models via Visual Attention Direct Preference Optimization.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Divide, Optimize, Merge: Scalable Fine-Grained Generative Optimization for LLM Agents.

Malte Højmark-Bertelsen

Marie Normann Gadeberg

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

The Threat of PROMPTS in Large Language Models: A System and User Prompt Perspective.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Efficient and Robust Reinforcement Learning from Human Feedback.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

FCOM: A Federated Collaborative Online Monitoring Framework via Representation Learning.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Adversarial Attacks on Online Learning to Rank with Stochastic Click Models.

Zichen Wang

Rishab Balasubramanian

Trans. Mach. Learn. Res., 2024

Memory-Augmented Agent Training for Business Document Understanding.

Jiale Liu

Yifan Zeng

Malte Højmark-Bertelsen

Marie Normann Gadeberg

Jose Efraim Aguilar Escamilla

CoRR, 2024

RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning.

Yujie Zhao

Weyl Lu

Jose E. Aguilar Escamilla

CoRR, 2024

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling.

CoRR, 2024

Contractual Reinforcement Learning: Pulling Arms with Invisible Hands.

CoRR, 2024

Hard Work Does Not Always Pay Off: Poisoning Attacks on Neural Architecture Search.

CoRR, 2024

AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks.

CoRR, 2024

Pure Exploration in Asynchronous Federated Bandits.

Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2024

RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning.

Yujie Zhao

Weyl Lu

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MDR: Model-Specific Demonstration Retrieval at Inference Time for In-Context Learning.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Video Anomaly Detection via Progressive Learning of Multiple Proxy Tasks.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Conversational Dueling Bandits in Generalized Linear Models.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Adversarial Attacks on Combinatorial Multi-Armed Bandits.

Rishab Balasubramanian

Proceedings of the Forty-first International Conference on Machine Learning, 2024

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SSS: Editing Factual Knowledge in Language Models towards Semantic Sparse Space.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits.

Zhiwei Wang

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Multi-Agent Join.

CoRR, 2023

Aligning Agent Policy with Externalities: Reward Design via Bilevel RL.

CoRR, 2023

Online Modeling and Monitoring of Dependent Processes under Resource Constraints.

Tanapol Kosolwattana

Ying Lin

CoRR, 2023

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems.

CoRR, 2023

Machine Learning for Synthetic Data Generation: a Review.

Yingzhou Lu

Wenqi Wei

CoRR, 2023

Incentivizing Exploration in Linear Contextual Bandits under Information Gap.

Proceedings of the 17th ACM Conference on Recommender Systems, 2023

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective.

Rishab Balasubramanian

Mengdi Wang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP.

Proceedings of the International Conference on Machine Learning, 2023

Learning Kernelized Contextual Bandits in a Distributed and Asynchronous Environment.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?

Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

2022

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization.

CoRR, 2022

Dynamic Global Sensitivity for Differentially Private Contextual Bandits.

David Zhao

Proceedings of the RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18, 2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Communication Efficient Distributed Learning for Kernelized Contextual Bandits.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

When Are Linear Stochastic Bandits Attackable?

Haifeng Xu

Proceedings of the International Conference on Machine Learning, 2022

2021

Unbiased Learning to Rank: Online or Offline?

ACM Trans. Inf. Syst., 2021

Incentivizing Exploration in Linear Bandits under Information Gap.

CoRR, 2021

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer.

Proceedings of the WWW '21: The Web Conference 2021, 2021

Interactive Information Retrieval with Bandit Feedback.

Yiling Jia

Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

2020

A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem.

CoRR, 2020

Global and Local Differential Privacy for Collaborative Bandits.

Proceedings of the RecSys 2020: Fourteenth ACM Conference on Recommender Systems, 2020

Learning by Exploration: New Challenges in Real-World Environments.
