Prashanth L. A.

Orcid: 0000-0003-0362-6730

Affiliations:

University of Maryland
INRIA Lille - Nord Europe
Indian Institute of Science, Department of Computer Science and Automation

According to our database, Prashanth L. A. authored at least 76 papers between 2008 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Links

Online presence:

on orcid.org
on isr.umd.edu

Bibliography

2026

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs.

[DOI]

,

Prashanth L. A.

,

,

CoRR, May, 2026

Generalized Random Direction Newton Algorithms for Stochastic Optimization.

[DOI]

,

Prashanth L. A.

,

Shalabh Bhatnagar

,

CoRR, February, 2026

Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk.

[DOI]

,

Shrey Rakeshkumar Patel

,

,

Prashanth L. A.

,

CoRR, February, 2026

Policy Newton Methods for Distortion Riskmetrics.

[DOI]

,

Mizhaan Prajit Maniyar

,

Prashanth L. A.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Generalized Simultaneous Perturbation-Based Gradient Search With Reduced Estimator Bias.

[DOI]

,

Shalabh Bhatnagar

,

Prashanth L. A.

IEEE Trans. Autom. Control., July, 2025

Learning to optimize convex risk measures: The cases of utility-based shortfall risk and optimized certainty equivalent risk.

[DOI]

,

Prashanth L. A.

,

CoRR, June, 2025

Preference-centric Bandits: Optimality of Mixtures and Regret-efficient Algorithms.

[DOI]

,

Arpan Mukherjee

,

Prashanth L. A.

,

Karthikeyan Shanmugam

,

CoRR, April, 2025

Online Estimation and Optimization of Utility-Based Shortfall Risk.

[DOI]

Vishwajit Hegde

,

Arvind S. Menon

,

Prashanth L. A.

,

Krishna P. Jagannathan

Math. Oper. Res., 2025

Gradient-Based Algorithms for Zeroth-Order Optimization.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

Found. Trends Optim., 2025

Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms.

[DOI]

,

Arpan Mukherjee

,

Prashanth L. A.

,

Karthikeyan Shanmugam

,

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

2024

Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP.

[DOI]

Tejaram Sangadi

,

Prashanth L. A.

,

Krishna P. Jagannathan

CoRR, 2024

Concentration Bounds for Optimized Certainty Equivalent Risk Estimation.

[DOI]

,

Prashanth L. A.

,

Krishna P. Jagannathan

CoRR, 2024

Truncated Cauchy random perturbations for smoothed functional-based stochastic optimization.

[DOI]

,

Prashanth L. A.

,

Shalabh Bhatnagar

Autom., 2024

Risk Estimation in a Markov Cost Process: Lower and Upper Bounds.

[DOI]

,

Prashanth L. A.

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Policy Evaluation for Variance in Average Reward Reinforcement Learning.

[DOI]

Shubhada Agrawal

,

Prashanth L. A.

,

Siva Theja Maguluri

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Optimization of Utility-based Shortfall Risk: A Non-asymptotic Viewpoint.

[DOI]

,

Prashanth L. A.

,

Proceedings of the 63rd IEEE Conference on Decision and Control, 2024

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning.

[DOI]

Mizhaan Prajit Maniyar

,

Prashanth L. A.

,

,

Shalabh Bhatnagar

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Nonasymptotic Bounds for Stochastic Optimization With Biased Noisy Gradient Oracles.

[DOI]

,

Prashanth L. A.

IEEE Trans. Autom. Control., March, 2023

VaR and CVaR Estimation in a Markov Cost Process: Lower and Upper Bounds.

[DOI]

,

Prashanth L. A.

,

CoRR, 2023

A policy gradient approach for optimization of smooth risk measures.

[DOI]

,

Prashanth L. A.

Proceedings of the Uncertainty in Artificial Intelligence, 2023

Generalized Simultaneous Perturbation Stochastic Approximation with Reduced Estimator Bias.

[DOI]

Shalabh Bhatnagar

,

Prashanth L. A.

Proceedings of the 57th Annual Conference on Information Sciences and Systems, 2023

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation.

[DOI]

,

Prashanth L. A.

,

Dheeraj Nagaraj

,

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

A Wasserstein Distance Approach for Concentration of Empirical Risk Estimates.

[DOI]

Prashanth L. A.

,

J. Mach. Learn. Res., 2022

Risk-Sensitive Reinforcement Learning via Policy Gradient Search.

[DOI]

Prashanth L. A.

,

Found. Trends Mach. Learn., 2022

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization.

[DOI]

,

Prashanth L. A.

,

Shalabh Bhatnagar

CoRR, 2022

Adaptive Estimation of Random Vectors with Bandit Feedback.

[DOI]

,

Prashanth L. A.

,

CoRR, 2022

Approximate gradient ascent methods for distortion risk measures.

[DOI]

,

Prashanth L. A.

CoRR, 2022

A Survey of Risk-Aware Multi-Armed Bandits.

[DOI]

Vincent Y. F. Tan

,

Prashanth L. A.

,

Krishna P. Jagannathan

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

2021

Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint.

[DOI]

,

Prashanth L. A.

Syst. Control. Lett., 2021

Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling.

[DOI]

Prashanth L. A.

,

Nathaniel Korda

,

Mach. Learn., 2021

Online Estimation and Optimization of Utility-Based Shortfall Risk.

[DOI]

Arvind S. Menon

,

Prashanth L. A.

,

Krishna P. Jagannathan

CoRR, 2021

Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis.

[DOI]

,

Prashanth L. A.

CoRR, 2021

Smoothed functional-based gradient algorithms for off-policy reinforcement learning.

[DOI]

,

Prashanth L. A.

CoRR, 2021

Estimation of Spectral Risk Measures.

[DOI]

Ajay Kumar Pandey

,

Prashanth L. A.

,

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Random Directions Stochastic Approximation With Deterministic Perturbations.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

,

,

,

Steven I. Marcus

IEEE Trans. Autom. Control., 2020

Non-Asymptotic Bounds for Zeroth-Order Stochastic Optimization.

[DOI]

,

Prashanth L. A.

CoRR, 2020

Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions.

[DOI]

Prashanth L. A.

,

Krishna P. Jagannathan

,

Ravi Kumar Kolla

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

Concentration bounds for empirical conditional value-at-risk: The unbounded case.

[DOI]

Ravi Kumar Kolla

,

Prashanth L. A.

,

,

Krishna P. Jagannathan

Oper. Res. Lett., 2019

Improved Concentration Bounds for Conditional Value-at-Risk and Cumulative Prospect Theory using Wasserstein distance.

[DOI]

,

Prashanth L. A.

CoRR, 2019

Risk-aware Multi-armed Bandits Using Conditional Value-at-Risk.

[DOI]

Ravi Kumar Kolla

,

Prashanth L. A.

,

Krishna P. Jagannathan

CoRR, 2019

Concentration of risk measures: A Wasserstein distance approach.

[DOI]

,

Prashanth L. A.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Correlated bandits or: How to minimize mean-squared error online.

[DOI]

Vinay Praneeth Boda

,

Prashanth L. A.

Proceedings of the 36th International Conference on Machine Learning, 2019

2018

Stochastic Optimization in a Cumulative Prospect Theory Framework.

[DOI]

,

Prashanth L. A.

,

,

Steven I. Marcus

,

Csaba Szepesvári

IEEE Trans. Autom. Control., 2018

Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint.

[DOI]

Prashanth L. A.

,

CoRR, 2018

2017

Adaptive System Optimization Using Random Directions Stochastic Approximation.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

,

,

Steven I. Marcus

IEEE Trans. Autom. Control., 2017

Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected.

[DOI]

,

Prashanth L. A.

,

,

Steven I. Marcus

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

A constrained optimization perspective on actor-critic algorithms and application to network routing.

[DOI]

Prashanth L. A.

,

,

Shalabh Bhatnagar

,

Prakash Chandra

Syst. Control. Lett., 2016

Variance-constrained actor-critic algorithms for discounted and average reward MDPs.

[DOI]

Prashanth L. A.

,

Mohammad Ghavamzadeh

Mach. Learn., 2016

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control.

[DOI]

Prashanth L. A.

,

,

,

Steven I. Marcus

,

Csaba Szepesvári

Proceedings of the 33rd International Conference on Machine Learning, 2016

Improved Hessian estimation for adaptive random directions stochastic approximation.

[DOI]

Sai Koti Reddy Danda

,

Prashanth L. A.

,

Shalabh Bhatnagar

Proceedings of the 55th IEEE Conference on Decision and Control, 2016

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles.

[DOI]

,

Prashanth L. A.

,

András György

,

Csaba Szepesvári

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015

Simultaneous perturbation methods for adaptive labor staffing in service systems.

[DOI]

Prashanth L. A.

,

,

,

Shalabh Bhatnagar

,

Simul., 2015

Simultaneous Perturbation Newton Algorithms for Simulation Optimization.

[DOI]

Shalabh Bhatnagar

,

Prashanth L. A.

J. Optim. Theory Appl., 2015

Cumulative Prospect Theory Meets Reinforcement Learning: Estimation and Control.

[DOI]

Prashanth L. A.

,

,

,

Steven I. Marcus

CoRR, 2015

Adaptive system optimization using (simultaneous) random directions stochastic approximation.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

CoRR, 2015

On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence.

[DOI]

Nathaniel Korda

,

Prashanth L. A.

Proceedings of the 32nd International Conference on Machine Learning, 2015

Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games.

[DOI]

,

Prashanth L. A.

,

Shalabh Bhatnagar

Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015

Fast Gradient Descent for Drifting Least Squares Regression, with Application to Bandits.

[DOI]

Nathaniel Korda

,

Prashanth L. A.

,

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks.

[DOI]

Prashanth L. A.

,

Abhranil Chatterjee

,

Shalabh Bhatnagar

Wirel. Networks, 2014

Algorithms for Nash Equilibria in General-Sum Stochastic Games.

[DOI]

,

Prashanth L. A.

,

Shalabh Bhatnagar

CoRR, 2014

Actor-Critic Algorithms for Risk-Sensitive Reinforcement Learning.

[DOI]

Prashanth L. A.

,

Mohammad Ghavamzadeh

CoRR, 2014

Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control.

[DOI]

Prashanth L. A.

,

Nathaniel Korda

,

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Adaptive sleep-wake control using reinforcement learning in sensor networks.

[DOI]

Prashanth L. A.

,

Abhranil Chatterjee

,

Shalabh Bhatnagar

Proceedings of the Sixth International Conference on Communication Systems and Networks, 2014

Simultaneous perturbation algorithms for batch off-policy search.

[DOI]

Raphael Fonteneau

,

Prashanth L. A.

Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

Policy Gradients for CVaR-Constrained MDPs.

[DOI]

Prashanth L. A.

Proceedings of the Algorithmic Learning Theory - 25th International Conference, 2014

2013

Analysis of stochastic approximation for efficient least squares regression and LSTD.

[DOI]

Prashanth L. A.

,

Nathaniel Korda

,

CoRR, 2013

Online gradient descent for least squares regression: Non-asymptotic bounds and application to bandits.

[DOI]

Nathaniel Korda

,

Prashanth L. A.

,

CoRR, 2013

Reinforcement Learning for Sleep-Wake Scheduling in Sensor Networks.

[DOI]

Prashanth Lakshmanrao Ananthapadmanabharao

,

Abhranil Chatterjee

,

Shalabh Bhatnagar

CoRR, 2013

Actor-Critic Algorithms for Risk-Sensitive MDPs.

[DOI]

Prashanth L. A.

,

Mohammad Ghavamzadeh

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Mechanisms for hostile agents with capacity constraints.

[DOI]

Prashanth Lakshmanrao Ananthapadmanabharao

,

Horabailu Laxminarayana Prasad

,

,

Shalabh Bhatnagar

Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2013

2012

Threshold Tuning Using Stochastic Optimization for Graded Signal Control.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

IEEE Trans. Veh. Technol., 2012

2011

Reinforcement Learning With Function Approximation for Traffic Signal Control.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

IEEE Trans. Intell. Transp. Syst., 2011

Reinforcement learning with average cost for adaptive control of traffic lights at intersections.

[DOI]

Prashanth L. A.

,

Shalabh Bhatnagar

Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems, 2011

Stochastic Optimization for Adaptive Labor Staffing in Service Systems.

[DOI]

Prashanth L. A.

,

,

,

Shalabh Bhatnagar

,

Gargi Banerjee Dasgupta

Proceedings of the Service-Oriented Computing - 9th International Conference, 2011

2008

OFDM-MAC algorithms and their impact on TCP performance in next generation mobile networks.

[DOI]

Prashanth L. A.

,

Proceedings of the Third International Conference on COMmunication System softWAre and MiddlewaRE (COMSWARE 2008), 2008

MAC Design for Heterogeneous Application Support in OFDM Based Wireless Systems.

[DOI]

Prashanth L. A.

,

Sajal Kumar Das

,

Proceedings of the 5th IEEE Consumer Communications and Networking Conference, 2008
