Adith Swaminathan

Nathan Kallus

CoRR, May, 2026

Understanding the Challenges in Iterative Generative Optimization with LLMs.

CoRR, March, 2026

Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs.

CoRR, February, 2026

A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Provably Learning from Language Feedback.

CoRR, June, 2025

Lost in Transmission: When and Why LLMs Fail to Reason Globally.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024

Combining Open-box Simulation and Importance Sampling for Tuning Large-Scale Recommenders.

CoRR, 2024

Trace is the New AutoDiff - Unlocking Efficient Optimization of Computational Workflows.

Allen Nie

CoRR, 2024

The Importance of Directional Feedback for LLM-based Optimizers.

CoRR, 2024

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks.

CoRR, 2024

On Overcoming Miscalibrated Conversational Priors in LLM-based ChatBots.

Proceedings of the Uncertainty in Artificial Intelligence, 2024

How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs.

Allen Nie

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

LLF-Bench: Benchmark for Interactive Learning from Language Feedback.

CoRR, 2023

Interactive Robot Learning from Verbal Correction.

CoRR, 2023

Hindsight Learning for MDPs with Exogenous Inputs.

Sean R. Sinclair

Felipe Vieira Frujeri

Hugo de Oliveira Barbalho

Luke Marshall

Proceedings of the International Conference on Machine Learning, 2023

2022

Hindsight Learning for MDPs with Exogenous Inputs.

CoRR, 2022

2021

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL.

CoRR, 2021

Recommendations as Treatments.

AI Mag., 2021

Heuristic-Guided Reinforcement Learning.

Andrey Kolobov

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020

Provably Good Batch Reinforcement Learning Without Great Exploration.

CoRR, 2020

Improved Image Wasserstein Attacks and Defenses.

CoRR, 2020

Active Learning for ML Enhanced Database Systems.

Proceedings of the 2020 International Conference on Management of Data, 2020

REVEAL 2020: Bandit and Reinforcement Learning from User Interactions.

Proceedings of the RecSys 2020: Fourteenth ACM Conference on Recommender Systems, 2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning Calibratable Policies using Programmatic Style-Consistency.

Proceedings of the 37th International Conference on Machine Learning, 2020

Working Memory Graphs.

Proceedings of the 37th International Conference on Machine Learning, 2020

Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Off-Policy Policy Gradient with State Distribution Correction.

CoRR, 2019

Multi-Preference Actor Critic.

Ishan Durugkar

Patrick MacAlpine

CoRR, 2019

NAIL: A General Interactive Fiction Agent.

CoRR, 2019

Off-Policy Policy Gradient with Stationary Distribution Correction.

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

REVEAL 2019: closing the loop with the real world: reinforcement and robust estimators for recommendation.

Proceedings of the 13th ACM Conference on Recommender Systems, 2019

A Distillation Approach to Data Efficient Individual Treatment Effect Estimation.

Maggie Makar

Emre Kiciman

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

REVEAL 2018: offline evaluation for recommender systems.

Proceedings of the 12th ACM Conference on Recommender Systems, 2018

Deep Learning with Logged Bandit Feedback.

Maarten de Rijke

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Counterfactual evaluation and learning from logged user feedback.

PhD thesis, 2017

Unbiased Learning-to-Rank with Biased Feedback.

Tobias Schnabel

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017

Off-policy evaluation for slate recommendation.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed.

CoRR, 2016

Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement.

Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Unbiased Comparative Evaluation of Ranking Functions.

Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval, 2016

Recommendations as Treatments: Debiasing Learning and Evaluation.

Proceedings of the 33rd International Conference on Machine Learning, 2016

2015

Batch learning from logged bandit feedback through counterfactual risk minimization.

J. Mach. Learn. Res., 2015

Counterfactual Risk Minimization.

Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Unbiased Ranking Evaluation on a Budget.

Tobias Schnabel

Proceedings of the 24th International Conference on World Wide Web Companion, 2015

The Self-Normalized Estimator for Counterfactual Learning.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback.

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

Mining Videos from the Web for Electronic Textbooks.

Krishnaram Kenthapadi