Alberto Maria Metelli

ORCID: 0000-0002-3424-5212

According to our database, Alberto Maria Metelli authored at least 103 papers between 2017 and 2025.

Bibliography

2025
Search or split: policy gradient with adaptive policy space.
Mach. Learn., August, 2025

Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds.
CoRR, August, 2025

Gym4ReaL: A Suite for Benchmarking Real-World Reinforcement Learning.
CoRR, July, 2025

Reusing Trajectories in Policy Gradients Enables Fast Convergence.
CoRR, June, 2025

Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes.
CoRR, June, 2025

Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits.
CoRR, May, 2025

Thompson Sampling-like Algorithms for Stochastic Rising Bandits.
CoRR, May, 2025

A Refined Analysis of UCBVI.
CoRR, February, 2025

Achieving Õ(√T) Regret in Average-Reward POMDPs with Known Observation Models.
CoRR, January, 2025

Reward Compatibility: A Framework for Inverse RL.
CoRR, January, 2025

On the Partial Identifiability in Reward Learning: Choosing the Best Reward.
CoRR, January, 2025

Generalizing the Regret: an Analysis of Lower and Upper Bounds.
J. Artif. Intell. Res., 2025

AReS: A patient simulator to facilitate testing of automated anesthesia.
Comput. Methods Programs Biomed., 2025

Factored-reward bandits with intermediate observations: Regret minimization and best arm identification.
Artif. Intell., 2025

Open Problem: Regret Minimization in Heavy-Tailed Bandits with Unknown Distributional Parameters.
Proceedings of the Thirty Eighth Annual Conference on Learning Theory, 2025

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

Efficient Exploitation of Hierarchical Structure in Sparse Reward Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

2024
Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds.
Mach. Learn., September, 2024

Interpretable linear dimensionality reduction based on bias-variance analysis.
Data Min. Knowl. Discov., July, 2024

Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs.
IEEE Trans. Intell. Transp. Syst., May, 2024

Switching Latent Bandits.
Trans. Mach. Learn. Res., 2024

Rising Rested Bandits: Lower Bounds and Efficient Algorithms.
CoRR, 2024

Statistical Analysis of Policy Space Compression Problem.
CoRR, 2024

Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting.
CoRR, 2024

Learning Utilities from Demonstrations in Markov Decision Processes.
CoRR, 2024

Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting.
CoRR, 2024

Sliding-Window Thompson Sampling for Non-Stationary Settings.
CoRR, 2024

State and Action Factorization in Power Grids.
CoRR, 2024

Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards.
CoRR, 2024

How to Scale Inverse RL to Large State Spaces? A Provably Efficient Approach.
CoRR, 2024

Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes.
CoRR, 2024

Information Capacity Regret Bounds for Bandits with Mediator Feedback.
CoRR, 2024

Inverse Reinforcement Learning with Sub-optimal Experts.
CoRR, 2024

Policy Gradient with Active Importance Sampling.
RLJ, 2024

A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning.
RLJ, 2024

Interpretable Target-Feature Aggregation for Multi-task Learning Based on Bias-Variance Analysis.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2024

Optimal Multi-Fidelity Best-Arm Identification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The Power of Hybrid Learning in Industrial Robotics: Efficient Grasping Strategies with Supervised-Driven Reinforcement Learning.
Proceedings of the International Joint Conference on Neural Networks, 2024

Causal Feature Selection via Transfer Entropy.
Proceedings of the International Joint Conference on Neural Networks, 2024

Online Learning with Off-Policy Feedback in Adversarial MDPs.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Best Arm Identification for Stochastic Rising Bandits.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Factored-Reward Bandits with Intermediate Observations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Learning Optimal Deterministic Policies with Stochastic Policy Gradients.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

No-Regret Reinforcement Learning in Smooth MDPs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Graph-Triggered Rising Bandits.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs.
Proceedings of the Thirty Seventh Annual Conference on Learning Theory, 2024

(ε, u)-Adaptive Regret Minimization in Heavy-Tailed Bandits.
Proceedings of the Thirty Seventh Annual Conference on Learning Theory, 2024

Transfer Learning for Dynamical Systems Models via Autoencoders and GANs.
Proceedings of the American Control Conference, 2024

Dissimilarity Bandits.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

Autoregressive Bandits.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

Parameterized Projected Bellman Operator.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Recent Advancements in Inverse Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
IWDA: Importance Weighting for Drift Adaptation in Streaming Supervised Learning Problems.
IEEE Trans. Neural Networks Learn. Syst., October, 2023

ARLO: A framework for Automated Reinforcement Learning.
Expert Syst. Appl., August, 2023

An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-MDP.
Trans. Mach. Learn. Res., 2023

Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits.
CoRR, 2023

Pure Exploration under Mediators' Feedback.
CoRR, 2023

Nonlinear Feature Aggregation: Two Algorithms driven by Theory.
CoRR, 2023

An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes.
CoRR, 2023

On the Relation between Policy Improvement and Off-Policy Minimum-Variance Policy Evaluation.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice.
Proceedings of the IEEE Information Theory Workshop, 2023

Truncating Trajectories in Monte Carlo Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

Dynamical Linear Bandits.
Proceedings of the International Conference on Machine Learning, 2023

Towards Theoretical Understanding of Inverse Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

A Tale of Sampling and Estimation in Discounted Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Simultaneously Updating All Persistence Values in Reinforcement Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Tight Performance Guarantees of Imitator Policies with Continuous Actions.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Exploiting environment configurability in reinforcement learning.
Frontiers in Artificial Intelligence and Applications 361, IOS Press, ISBN: 978-1-64368-363-8, 2022

Policy space identification in configurable environments.
Mach. Learn., 2022

A unified view of configurable Markov Decision Processes: Solution concepts, value functions, and operators.
Intelligenza Artificiale, 2022

Multi-Fidelity Best-Arm Identification.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Storehouse: a Reinforcement Learning Environment for Optimizing Warehouse Management.
Proceedings of the International Joint Conference on Neural Networks, 2022

Stochastic Rising Bandits.
Proceedings of the International Conference on Machine Learning, 2022

Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

Trust Region Meta Learning for Policy Optimization.
Proceedings of the ECML/PKDD Workshop on Meta-Knowledge Transfer, 2022

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems.
Mach. Learn., 2021

Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach.
J. Mach. Learn. Res., 2021

Learning in Non-Cooperative Configurable Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provably Efficient Learning of Transferable Rewards.
Proceedings of the 38th International Conference on Machine Learning, 2021

Policy Optimization as Online Learning with Mediator Feedback.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Combining reinforcement learning with rule-based controllers for transparent and general decision-making in autonomous driving.
Robotics Auton. Syst., 2020

Importance Sampling Techniques for Policy Optimization.
J. Mach. Learn. Res., 2020

On the use of the policy gradient and Hessian in inverse reinforcement learning.
Intelligenza Artificiale, 2020

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Truly Batch Model-Free Inverse Reinforcement Learning about Multiple Intentions.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Gradient-Aware Model-Based Policy Search.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Feature Selection via Mutual Information: New Theoretical Insights.
Proceedings of the International Joint Conference on Neural Networks, 2019

Optimistic Policy Optimization via Multiple Importance Sampling.
Proceedings of the 36th International Conference on Machine Learning, 2019

Reinforcement Learning in Configurable Continuous Environments.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
Policy Optimization via Importance Sampling.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Configurable Markov Decision Processes.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Compatible Reward Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
