Philip S. Thomas

Affiliations:
  • University of Massachusetts Amherst, Department of Computer Science


According to our database, Philip S. Thomas authored at least 72 papers between 2009 and 2024.

Bibliography

2024
From Past to Future: Rethinking Eligibility Traces.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Learning Fair Representations with High-Confidence Guarantees.
CoRR, 2023

Coagent Networks: Generalized and Scaled.
CoRR, 2023

Optimization using Parallel Gradient Evaluations on Multiple Parameters.
CoRR, 2023

Behavior Alignment via Reward Function Optimization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Seldonian Toolkit: Building Software with Safe and Fair Machine Learning.
Proceedings of the 45th IEEE/ACM International Conference on Software Engineering: ICSE 2023 Companion Proceedings, 2023

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Enforcing Delayed-Impact Fairness Guarantees.
CoRR, 2022

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL.
CoRR, 2022

Off-Policy Evaluation for Action-Dependent Non-stationary Environments.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Mechanizing Soundness of Off-Policy Evaluation.
Proceedings of the 13th International Conference on Interactive Theorem Proving, 2022

Fairness Guarantees under Demographic Shift.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Edge-Compatible Reinforcement Learning for Recommendations.
CoRR, 2021

Large-scale Interactive Conversational Recommendation System using Actor-Critic Framework.
Proceedings of the RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021, 2021

SOPE: Spectrum of Off-Policy Estimators.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Structural Credit Assignment in Neural Networks using Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Universal Off-Policy Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Practical Mean Bounds for Small Samples.
Proceedings of the 38th International Conference on Machine Learning, 2021

Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods.
Proceedings of the 38th International Conference on Machine Learning, 2021

High Confidence Generalization for Reinforcement Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

High-Confidence Off-Policy (or Counterfactual) Variance Estimation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Reinforcement Learning for Strategic Recommendations.
CoRR, 2020

Learning Reusable Options for Multi-Task Reinforcement Learning.
CoRR, 2020

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Safe Policy Improvement for Non-Stationary MDPs.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Asynchronous Coagent Networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

Evaluating the Performance of Reinforcement Learning Algorithms.
Proceedings of the 37th International Conference on Machine Learning, 2020

Optimizing for the Future in Non-Stationary MDPs.
Proceedings of the 37th International Conference on Machine Learning, 2020

Is the Policy Gradient a Gradient?
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Lifelong Learning with a Changing Action Set.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Reinforcement Learning When All Actions Are Not Always Available.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Reinforcement learning with spiking coagents.
CoRR, 2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality.
CoRR, 2019

A New Confidence Interval for the Mean of a Bounded Random Variable.
CoRR, 2019

Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock.
CoRR, 2019

Privacy Preserving Off-Policy Evaluation.
CoRR, 2019

Offline Contextual Bandits with High Probability Fairness Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Concentration Inequalities for Conditional Value at Risk.
Proceedings of the 36th International Conference on Machine Learning, 2019

Learning Action Representations for Reinforcement Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

A Compression-Inspired Framework for Macro Discovery.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

Natural Option Critic.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Decoupling Gradient-Like Learning Rules from Representations.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
On Ensuring that Intelligent Machines Are Well-Behaved.
CoRR, 2017

Decoupling Learning Rules from Representations.
CoRR, 2017

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines.
CoRR, 2017

Using Options for Long-Horizon Off-Policy Evaluation.
CoRR, 2017

Importance Sampling for Fair Policy Selection.
Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, 2017

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Data-Efficient Policy Evaluation Through Behavior Policy Search.
Proceedings of the 34th International Conference on Machine Learning, 2017

Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Importance Sampling with Unequal Support.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement.
IEEE Trans. Hum. Mach. Syst., 2016

Energetic Natural Gradient Descent.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Increasing the Action Gap: New Operators for Reinforcement Learning.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
A Notation for Markov Decision Processes.
CoRR, 2015

Ad Recommendation Systems for Life-Time Value Optimization.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Policy Evaluation Using the Ω-Return.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

High Confidence Policy Improvement.
Proceedings of the 32nd International Conference on Machine Learning, 2015

High-Confidence Off-Policy Evaluation.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces.
CoRR, 2014

Natural Temporal Difference Learning.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Projected Natural Actor-Critic.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

2012
Motor primitive discovery.
Proceedings of the 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics, 2012

2011
Policy Gradient Coagent Networks.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

TD_γ: Re-evaluating Complex Backups in Temporal Difference Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Conjugate Markov Decision Processes.
Proceedings of the 28th International Conference on Machine Learning, 2011

Value Function Approximation in Reinforcement Learning Using the Fourier Basis.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2009
Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm.
Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence, 2009

