We stand with Ukraine

Philip S. Thomas

Orcid: 0000-0002-9904-1800

Affiliations:

University of Massachusetts Amherst, Department of Computer Science

According to our database, Philip S. Thomas authored at least 75 papers between 2009 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Links

Online presence:

on psthomas.com

Bibliography

2024

ICU-Sepsis: A Benchmark MDP Built from Real Medical Data.

Kartik Choudhary

,

,

Philip S. Thomas

CoRR, 2024

Position: Benchmarking is Limited in Reinforcement Learning Research.

Scott M. Jordan

,

,

Bruno Castro da Silva

,

,

Philip S. Thomas

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics.

,

Blossom Metevier

,

,

Philip S. Thomas

Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024

From Past to Future: Rethinking Eligibility Traces.

,

Scott M. Jordan

,

Shreyas Chaudhari

,

,

Philip S. Thomas

,

Bruno Castro da Silva

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Learning Fair Representations with High-Confidence Guarantees.

,

,

Philip S. Thomas

CoRR, 2023

Coagent Networks: Generalized and Scaled.

James E. Kostas

,

Scott M. Jordan

,

,

Georgios Theocharous

,

,

,

Bruno Castro da Silva

,

Philip S. Thomas

CoRR, 2023

Optimization using Parallel Gradient Evaluations on Multiple Parameters.

,

,

Venkata Gandikota

,

Philip S. Thomas

,

CoRR, 2023

Behavior Alignment via Reward Function Optimization.

,

,

Scott M. Jordan

,

Philip S. Thomas

,

Bruno C. da Silva

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Seldonian Toolkit: Building Software with Safe and Fair Machine Learning.

,

James E. Kostas

,

Bruno Castro da Silva

,

Philip S. Thomas

,

Proceedings of the 45th IEEE/ACM International Conference on Software Engineering: ICSE 2023 Companion Proceedings, 2023

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments.

,

,

Philip S. Thomas

,

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Enforcing Delayed-Impact Fairness Guarantees.

,

Blossom Metevier

,

,

Philip S. Thomas

,

Bruno Castro da Silva

CoRR, 2022

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL.

,

Philip S. Thomas

,

Shlomo Zilberstein

CoRR, 2022

Off-Policy Evaluation for Action-Dependent Non-stationary Environments.

,

,

Nathaniel D. Bastian

,

Bruno C. da Silva

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Mechanizing Soundness of Off-Policy Evaluation.

,

J. Eliot B. Moss

,

Michael Norrish

,

Philip S. Thomas

Proceedings of the 13th International Conference on Interactive Theorem Proving, 2022

Fairness Guarantees under Demographic Shift.

Stephen Giguere

,

Blossom Metevier

,

Bruno Castro da Silva

,

,

Philip S. Thomas

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Edge-Compatible Reinforcement Learning for Recommendations.

James E. Kostas

,

Philip S. Thomas

,

Georgios Theocharous

CoRR, 2021

Large-scale Interactive Conversational Recommendation System using Actor-Critic Framework.

Ali Montazeralghaem

,

,

Philip S. Thomas

Proceedings of the RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021, 2021

SOPE: Spectrum of Off-Policy Estimators.

Christina J. Yuan

,

,

Stephen Giguere

,

Philip S. Thomas

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs.

,

Philip S. Thomas

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Structural Credit Assignment in Neural Networks using Reinforcement Learning.

,

,

Matthew Schlegel

,

James E. Kostas

,

Philip S. Thomas

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Universal Off-Policy Evaluation.

,

,

Bruno C. da Silva

,

Erik G. Learned-Miller

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Practical Mean Bounds for Small Samples.

,

Philip S. Thomas

,

Erik G. Learned-Miller

Proceedings of the 38th International Conference on Machine Learning, 2021

Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods.

,

Philip S. Thomas

,

Bruno C. da Silva

Proceedings of the 38th International Conference on Machine Learning, 2021

High Confidence Generalization for Reinforcement Learning.

James E. Kostas

,

,

Scott M. Jordan

,

Georgios Theocharous

,

Philip S. Thomas

Proceedings of the 38th International Conference on Machine Learning, 2021

High-Confidence Off-Policy (or Counterfactual) Variance Estimation.

,

,

Philip S. Thomas

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Reinforcement Learning for Strategic Recommendations.

Georgios Theocharous

,

,

Philip S. Thomas

,

CoRR, 2020

Learning Reusable Options for Multi-Task Reinforcement Learning.

Francisco M. Garcia

,

,

Philip S. Thomas

CoRR, 2020

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms.

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Safe Policy Improvement for Non-Stationary MDPs.

,

Scott M. Jordan

,

Georgios Theocharous

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Asynchronous Coagent Networks.

James E. Kostas

,

,

Philip S. Thomas

Proceedings of the 37th International Conference on Machine Learning, 2020

Evaluating the Performance of Reinforcement Learning Algorithms.

Scott M. Jordan

,

,

,

,

Philip S. Thomas

Proceedings of the 37th International Conference on Machine Learning, 2020

Optimizing for the Future in Non-Stationary MDPs.

,

Georgios Theocharous

,

,

,

Sridhar Mahadevan

,

Philip S. Thomas

Proceedings of the 37th International Conference on Machine Learning, 2020

Is the Policy Gradient a Gradient?

,

Philip S. Thomas

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Lifelong Learning with a Changing Action Set.

,

Georgios Theocharous

,

,

Philip S. Thomas

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Reinforcement Learning When All Actions Are Not Always Available.

,

Georgios Theocharous

,

Blossom Metevier

,

Philip S. Thomas

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Reinforcement learning with spiking coagents.

,

Abhishek Sharma

,

Sasikiran Yelamarthi

,

,

Philip S. Thomas

,

CoRR, 2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality.

Philip S. Thomas

,

Scott M. Jordan

,

,

,

James E. Kostas

CoRR, 2019

A New Confidence Interval for the Mean of a Bounded Random Variable.

Erik G. Learned-Miller

,

Philip S. Thomas

CoRR, 2019

Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock.

James E. Kostas

,

,

Philip S. Thomas

CoRR, 2019

Privacy Preserving Off-Policy Evaluation.

,

Philip S. Thomas

,

CoRR, 2019

Offline Contextual Bandits with High Probability Fairness Guarantees.

Blossom Metevier

,

Stephen Giguere

,

,

,

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning.

Francisco M. Garcia

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Concentration Inequalities for Conditional Value at Risk.

Philip S. Thomas

,

Erik G. Learned-Miller

Proceedings of the 36th International Conference on Machine Learning, 2019

Learning Action Representations for Reinforcement Learning.

,

Georgios Theocharous

,

James E. Kostas

,

Scott M. Jordan

,

Philip S. Thomas

Proceedings of the 36th International Conference on Machine Learning, 2019

A Compression-Inspired Framework for Macro Discovery.

Francisco M. Garcia

,

Bruno C. da Silva

,

Philip S. Thomas

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

Natural Option Critic.

,

Philip S. Thomas

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Importance Sampling for Fair Policy Selection.

,

Philip S. Thomas

,

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Decoupling Gradient-Like Learning Rules from Representations.

Philip S. Thomas

,

,

Proceedings of the 35th International Conference on Machine Learning, 2018

2017

On Ensuring that Intelligent Machines Are Well-Behaved.

Philip S. Thomas

,

Bruno Castro da Silva

,

Andrew G. Barto

,

CoRR, 2017

Decoupling Learning Rules from Representations.

Philip S. Thomas

,

,

CoRR, 2017

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines.

Philip S. Thomas

,

CoRR, 2017

Using Options for Long-Horizon Off-Policy Evaluation.

Zhaohan Daniel Guo

,

Philip S. Thomas

,

CoRR, 2017

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation.

,

Philip S. Thomas

,

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Data-Efficient Policy Evaluation Through Behavior Policy Search.

Josiah P. Hanna

,

Philip S. Thomas

,

,

Proceedings of the 34th International Conference on Machine Learning, 2017

Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing.

Philip S. Thomas

,

Georgios Theocharous

,

Mohammad Ghavamzadeh

,

,

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Importance Sampling with Unequal Support.

Philip S. Thomas

,

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement.

Kathleen M. Jagodnik

,

Philip S. Thomas

,

Antonie J. van den Bogert

,

Michael S. Branicky

,

Robert F. Kirsch

IEEE Trans. Hum. Mach. Syst., 2016

Energetic Natural Gradient Descent.

Philip S. Thomas

,

Bruno Castro da Silva

,

,

Proceedings of the 33rd International Conference on Machine Learning, 2016

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning.

Philip S. Thomas

,

Proceedings of the 33rd International Conference on Machine Learning, 2016

Increasing the Action Gap: New Operators for Reinforcement Learning.

Marc G. Bellemare

,

Georg Ostrovski

,

,

Philip S. Thomas

,

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

A Notation for Markov Decision Processes.

Philip S. Thomas

CoRR, 2015

Ad Recommendation Systems for Life-Time Value Optimization.

Georgios Theocharous

,

Philip S. Thomas

,

Mohammad Ghavamzadeh

Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Policy Evaluation Using the Ω-Return.

Philip S. Thomas

,

,

Georgios Theocharous

,

George Dimitri Konidaris

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees.

Georgios Theocharous

,

Philip S. Thomas

,

Mohammad Ghavamzadeh

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

High Confidence Policy Improvement.

Philip S. Thomas

,

Georgios Theocharous

,

Mohammad Ghavamzadeh

Proceedings of the 32nd International Conference on Machine Learning, 2015

High-Confidence Off-Policy Evaluation.

Philip S. Thomas

,

Georgios Theocharous

,

Mohammad Ghavamzadeh

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces.

Sridhar Mahadevan

,

,

Philip S. Thomas

,

,

Stephen Giguere

,

,

,

CoRR, 2014

Natural Temporal Difference Learning.

,

Philip S. Thomas

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013

Projected Natural Actor-Critic.

Philip S. Thomas

,

,

Stephen Giguere

,

Sridhar Mahadevan

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, 2013

2012

Motor primitive discovery.

Philip S. Thomas

,

Andrew G. Barto

Proceedings of the 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics, 2012

2011

Policy Gradient Coagent Networks.

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning.

George Dimitri Konidaris

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Conjugate Markov Decision Processes.

Philip S. Thomas

,

Andrew G. Barto

Proceedings of the 28th International Conference on Machine Learning, 2011

Value Function Approximation in Reinforcement Learning Using the Fourier Basis.

George Dimitri Konidaris

,

Sarah Osentoski

,

Philip S. Thomas

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2009

Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm.

Philip S. Thomas

,

Antonie J. van den Bogert

,

Kathleen M. Jagodnik

,

Michael S. Branicky

Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence, 2009
