# Shalabh Bhatnagar

According to our database, Shalabh Bhatnagar authored at least 172 papers between 1995 and 2020.

## Bibliography

2020

Successive Over-Relaxation $Q$-Learning.

IEEE Control Systems Letters, 2020

2019

Stability of Stochastic Approximations With "Controlled Markov" Noise and Temporal Difference Learning.

IEEE Trans. Automat. Contr., 2019

An Online Sample-Based Method for Mode Estimation Using ODE Analysis of Stochastic Approximation Algorithms.

IEEE Control Systems Letters, 2019

Generalized Speedy Q-learning.

CoRR, 2019

Solution of Two-Player Zero-Sum Game by Successive Relaxation.

CoRR, 2019

Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots.

CoRR, 2019

Reinforcement Learning in Non-Stationary Environments.

CoRR, 2019

Second Order Value Iteration in Reinforcement Learning.

CoRR, 2019

Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning.

CoRR, 2019

Successive Over Relaxation Q-Learning.

CoRR, 2019

An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms.

CoRR, 2019

Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch.

CoRR, 2019

Efficient Adaptive Resource Provisioning for Cloud Applications using Reinforcement Learning.

Proceedings of the IEEE 4th International Workshops on Foundations and Applications of Self* Systems, 2019

Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives.

Proceedings of the International Conference on Robotics and Automation, 2019

Efficient Budget Allocation and Task Assignment in Crowdsourcing.

Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2019

Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning.

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

2018

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks.

IEEE Wireless Commun. Letters, 2018

A stochastic approximation approach to active queue management.

Telecommunication Systems, 2018

Analysis of Gradient Descent Methods With Nondiminishing Bounded Errors.

IEEE Trans. Automat. Contr., 2018

A Linearly Relaxed Approximate Linear Program for Markov Decision Processes.

IEEE Trans. Automat. Contr., 2018

Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning.

Math. Oper. Res., 2018

An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method.

Machine Learning, 2018

An incremental off-policy search in a model-free Markov decision process using a single sample path.

Machine Learning, 2018

Gradient-Based Adaptive Stochastic Search for Simulation Optimization Over Continuous Space.

INFORMS Journal on Computing, 2018

Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge.

CoRR, 2018

Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives.

CoRR, 2018

Random directions stochastic approximation with deterministic perturbations.

CoRR, 2018

An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method.

CoRR, 2018

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees.

CoRR, 2018

An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path.

CoRR, 2018

A unified decision making framework for supply and demand management in microgrid networks.

Proceedings of the 2018 IEEE International Conference on Communications, 2018

Generalized Deterministic Perturbations For Stochastic Gradient Search.

Proceedings of the 57th IEEE Conference on Decision and Control, 2018

2017

Adaptive mean queue size and its rate of change: queue management with random dropping.

Telecommunication Systems, 2017

Adaptive System Optimization Using Random Directions Stochastic Approximation.

IEEE Trans. Automat. Contr., 2017

A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions.

Math. Oper. Res., 2017

RLWS: A Reinforcement Learning based GPU Warp Scheduler.

CoRR, 2017

A unified decision making framework for supply and demand management in microgrid networks.

CoRR, 2017

Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations with Noise.

CoRR, 2017

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks.

CoRR, 2017

Multi-Agent Q-Learning for Minimizing Demand-Supply Power Deficit in Microgrids.

CoRR, 2017

Analysis of stochastic approximation schemes with set-valued maps in the absence of a stability guarantee and their stabilization.

CoRR, 2017

A Linearly Relaxed Approximate Linear Program for Markov Decision Processes.

CoRR, 2017

Deterministic Perturbations For Simultaneous Perturbation Methods Using Circulant Matrices.

CoRR, 2017

Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization.

Comp. Opt. and Appl., 2017

A stability criterion for two timescale stochastic approximation schemes.

Automatica, 2017

An Incremental Fast Policy Search Using a Single Sample Path.

Proceedings of the Pattern Recognition and Machine Intelligence, 2017

Bounds for off-policy prediction in reinforcement learning.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

A model based search method for prediction in model-free Markov decision process.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach.

Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), 2017

2016

Actor-Critic Algorithms with Online Feature Adaptation.

ACM Trans. Model. Comput. Simul., 2016

A constrained optimization perspective on actor-critic algorithms and application to network routing.

Systems & Control Letters, 2016

Multiscale Q-learning with linear function approximation.

Discrete Event Dynamic Systems, 2016

Stochastic Recursive Inclusions in two timescales with non-additive iterate dependent Markov noise.

CoRR, 2016

Stochastic Recursive Inclusions with Non-Additive Iterate-Dependent Markov Noise.

CoRR, 2016

Gradient-based learning algorithms with constant-error estimators: stability and convergence.

CoRR, 2016

Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach.

CoRR, 2016

Adaptive Mean Queue Size and Its Rate of Change: Queue Management with Random Dropping.

CoRR, 2016

On a convergent off-policy temporal difference learning algorithm in on-line learning environment.

CoRR, 2016

A note on the function approximation error bound for risk-sensitive reinforcement learning.

CoRR, 2016

A Cross Entropy based Stochastic Approximation Algorithm for Reinforcement Learning with Linear Function Approximation.

CoRR, 2016

A randomized algorithm for continuous optimization.

Proceedings of the Winter Simulation Conference, WSC 2016, 2016

Scalable focussed entity resolution.

Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Shaping Proto-Value Functions Using Rewards.

Proceedings of the ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, 2016

Revisiting the Cross Entropy Method with Applications in Stochastic Global Optimization and Reinforcement Learning.

Proceedings of the ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, 2016

Improved Hessian estimation for adaptive random directions stochastic approximation.

Proceedings of the 55th IEEE Conference on Decision and Control, 2016

2015

Energy Sharing for Multiple Sensor Nodes With Finite Buffers.

IEEE Trans. Communications, 2015

Simultaneous perturbation methods for adaptive labor staffing in service systems.

Simulation, 2015

Necessary and sufficient conditions for optimality in constrained general sum stochastic games.

Systems & Control Letters, 2015

Simultaneous Perturbation Newton Algorithms for Simulation Optimization.

J. Optimization Theory and Applications, 2015

A bi-convex optimization problem to compute Nash equilibrium in n-player games and an algorithm.

CoRR, 2015

Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning.

CoRR, 2015

Stochastic recursive inclusions with two timescales.

CoRR, 2015

A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions.

CoRR, 2015

A Study of Gradient Descent Schemes for General-Sum Stochastic Games.

CoRR, 2015

Energy Sharing for Multiple Sensor Nodes with Finite Buffers.

CoRR, 2015

Shaping Proto-Value Functions via Rewards.

CoRR, 2015

Two Timescale Stochastic Approximation with Controlled Markov noise.

CoRR, 2015

A constrained optimization perspective on actor critic algorithms and application to network routing.

CoRR, 2015

Adaptive system optimization using (simultaneous) random directions stochastic approximation.

CoRR, 2015

A Stochastic Approximation Algorithm for Quantile Estimation.

Proceedings of the Neural Information Processing - 22nd International Conference, 2015

Decentralized learning for traffic signal control.

Proceedings of the 7th International Conference on Communication Systems and Networks, 2015

Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games.

Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015

A Generalized Reduced Linear Program for Markov Decision Processes.

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks.

Wireless Networks, 2014

Smoothed Functional Algorithms for Stochastic Optimization Using *q*-Gaussian Distributions.

ACM Trans. Model. Comput. Simul., 2014

A simulation-based algorithm for optimal pricing policy under demand uncertainty.

ITOR, 2014

Algorithms for Nash Equilibria in General-Sum Stochastic Games.

CoRR, 2014

A Generalized Reduced Linear Program for Markov Decision Processes.

CoRR, 2014

Approximate dynamic programming with $(\min, +)$ linear function approximation for Markov decision processes.

CoRR, 2014

Approximate Dynamic Programming based on Projection onto the (min, +) subsemimodule.

CoRR, 2014

Newton-based stochastic optimization using q-Gaussian smoothed functional algorithms.

Automatica, 2014

Simulation optimization via gradient-based stochastic search.

Proceedings of the 2014 Winter Simulation Conference, 2014

Universal Option Models.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Multi-agent reinforcement learning for traffic signal control.

Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems, 2014

A Markov Decision Process Framework for Predictable Job Completion Times on Crowdsourcing Platforms.

Proceedings of the Second AAAI Conference on Human Computation and Crowdsourcing, 2014

Adaptive sleep-wake control using reinforcement learning in sensor networks.

Proceedings of the Sixth International Conference on Communication Systems and Networks, 2014

Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes.

Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

An actor critic algorithm based on Grassmanian search.

Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

2013

Q-Learning Based Energy Management Policies for a Single Sensor Node with Finite Buffer.

IEEE Wireless Commun. Letters, 2013

Feature Search in the Grassmanian in Online Reinforcement Learning.

J. Sel. Topics Signal Processing, 2013

Simultaneous Perturbation Methods for Adaptive Labor Staffing in Service Systems.

CoRR, 2013

Newton based Stochastic Optimization using q-Gaussian Smoothed Functional Algorithms.

CoRR, 2013

Reinforcement Learning for Sleep-Wake Scheduling in Sensor Networks.

CoRR, 2013

Mechanisms for hostile agents with capacity constraints.

Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2013

2012

Threshold Tuning Using Stochastic Optimization for Graded Signal Control.

IEEE Trans. Vehicular Technology, 2012

An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes.

J. Optimization Theory and Applications, 2012

Smoothed Functional Algorithms for Stochastic Optimization using q-Gaussian Distributions.

CoRR, 2012

q-Gaussian based Smoothed Functional Algorithm for Stochastic Optimization.

CoRR, 2012

Optimal multi-layered congestion based pricing schemes for enhanced QoS.

Computer Networks, 2012

General-sum stochastic games: Verifiability conditions for Nash equilibria.

Automatica, 2012

q-Gaussian based Smoothed Functional algorithms for stochastic optimization.

Proceedings of the 2012 IEEE International Symposium on Information Theory, 2012

A novel Q-learning algorithm with function approximation for constrained Markov decision processes.

Proceedings of the 50th Annual Allerton Conference on Communication, 2012

2011

Stochastic approximation algorithms for constrained optimization via simulation.

ACM Trans. Model. Comput. Simul., 2011

Reinforcement Learning With Function Approximation for Traffic Signal Control.

IEEE Trans. Intelligent Transportation Systems, 2011

An Optimized SDE Model for Slotted Aloha.

IEEE Trans. Communications, 2011

Stochastic Algorithms for Discrete Parameter Simulation Optimization.

IEEE Trans. Automation Science and Engineering, 2011

The Borkar-Meyn theorem for asynchronous stochastic approximations.

Systems & Control Letters, 2011

Reinforcement learning with average cost for adaptive control of traffic lights at intersections.

Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems, 2011

Stochastic Optimization for Adaptive Labor Staffing in Service Systems.

Proceedings of the Service-Oriented Computing - 9th International Conference, 2011

Smoothed Functional and Quasi-Newton Algorithms for Routing in Multi-stage Queueing Network with Constraints.

Proceedings of the Distributed Computing and Internet Technology, 2011

2010

An efficient algorithm for scheduling in bluetooth piconets and scatternets.

Wireless Networks, 2010

Optimized Policies for the Retransmission Probabilities in Slotted Aloha.

Simulation, 2010

An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes.

Systems & Control Letters, 2010

Toward Off-Policy Learning Control with Function Approximation.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

2009

Pattern Synthesis for Nonparametric Pattern Recognition.

Proceedings of the Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes), 2009

Optimal parameter trajectory estimation in parameterized SDEs: An algorithmic procedure.

ACM Trans. Model. Comput. Simul., 2009

A probabilistic constrained nonlinear optimization framework to optimize RED parameters.

Perform. Eval., 2009

A proof of convergence of the B-RED and P-RED algorithms for random early detection.

IEEE Communications Letters, 2009

Natural actor-critic algorithms.

Automatica, 2009

Multi-Step Dyna Planning for Policy Evaluation and Control.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation.

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS.

Proceedings of the 48th IEEE Conference on Decision and Control, 2009

2008

Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes.

Simulation, 2008

An efficient ad recommendation system for TV programs.

Multimedia Syst., 2008

New algorithms of the Q-learning type.

Automatica, 2008

Ant Colony Optimization Algorithms for Shortest Path Problems.

Proceedings of the Network Control and Optimization, Second Euro-NF Workshop, 2008

SPSA based feature relevance estimation for video retrieval.

Proceedings of the International Workshop on Multimedia Signal Processing, 2008

2007

Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization.

ACM Trans. Model. Comput. Simul., 2007

Gelfand-Yaglom-Perez theorem for generalized relative entropy functionals.

Inf. Sci., 2007

Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes.

Discrete Event Dynamic Systems, 2007

Incremental Natural Actor-Critic Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

An Optimal Weighted-Average Congestion Based Pricing Scheme for Enhanced QoS.

Proceedings of the Distributed Computing and Internet Technology, 2007

An Efficient and Optimized Bluetooth Scheduling Algorithm for Piconets.

Proceedings of the Distributed Computing and Internet Technology, 2007

Fuzzy Clustering Based Ad Recommendation for TV Programs.

Proceedings of the Interactive TV: a Shared Experience, 5th European Conference, 2007

Link route pricing for enhanced QoS.

Proceedings of the 46th IEEE Conference on Decision and Control, 2007

Discrete parameter simulation optimization algorithms with applications to admission control with dependent service times.

Proceedings of the 46th IEEE Conference on Decision and Control, 2007

Network flow-control using asynchronous stochastic approximation.

Proceedings of the 46th IEEE Conference on Decision and Control, 2007

2006

Partition based pattern synthesis technique with efficient algorithms for nearest neighbor classification.

Pattern Recognition Letters, 2006

A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events.

J. Mach. Learn. Res., 2006

On Measure Theoretic definitions of Generalized Information Measures and Maximum Entropy Prescriptions.

CoRR, 2006

Actor-critic algorithms for hierarchical Markov decision processes.

Automatica, 2006

SPSA algorithms with measurement reuse.

Proceedings of the Winter Simulation Conference WSC 2006, 2006

2005

Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization.

ACM Trans. Model. Comput. Simul., 2005

Optimal Threshold Policies for Admission Control in Communication Networks via Discrete Parameter Stochastic Approximation.

Telecommunication Systems, 2005

A Discrete Parameter Stochastic Approximation Algorithm for Simulation Optimization.

Simulation, 2005

Overlap pattern synthesis with an efficient nearest neighbor classifier.

Pattern Recognition, 2005

Uniqueness of Nonextensive entropy under Renyi's Recipe.

CoRR, 2005

Properties of Kullback-Leibler cross-entropy minimization in nonextensive framework.

Proceedings of the 2005 IEEE International Symposium on Information Theory, 2005

Solution of Mdps Using Simulation-Based Value Iteration.

Proceedings of the Artificial Intelligence Applications and Innovations - IFIP TC12 WG12.5, 2005

Information theoretic justification of Boltzmann selection and its generalization to Tsallis case.

Proceedings of the IEEE Congress on Evolutionary Computation, 2005

2004

A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes.

IEEE Trans. Automat. Contr., 2004

Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification.

Information Fusion, 2004

Cauchy Annealing Schedule: An Annealing Schedule for Boltzmann Selection Scheme in Evolutionary Algorithms.

CoRR, 2004

Generalized Evolutionary Algorithm based on Tsallis Statistics.

CoRR, 2004

A Pattern Synthesis Technique with an Efficient Nearest Neighbor Classifier for Binary Pattern Recognition.

Proceedings of the 17th International Conference on Pattern Recognition, 2004

Cauchy annealing schedule: an annealing schedule for Boltzmann selection scheme in evolutionary algorithms.

Proceedings of the IEEE Congress on Evolutionary Computation, 2004

2003

Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences.

ACM Trans. Model. Comput. Simul., 2003

Multiscale Chaotic SPSA and Smoothed Functional Algorithms for Simulation Optimization.

Simulation, 2003

2002

A time aggregation approach to Markov decision processes.

Automatica, 2002

2001

Optimal structured feedback policies for ABR flow control using two-timescale SPSA.

IEEE/ACM Trans. Netw., 2001

1995

A Convex Analytic Framework for Ergodic Control of Semi-Markov Processes.

Math. Oper. Res., 1995