Csaba Szepesvári

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Characterizing the Representer Theorem.

Proceedings of the 30th International Conference on Machine Learning, 2013

Cost-sensitive Multiclass Classification Risk Bounds.

Bernardo Ávila Pires

Proceedings of the 30th International Conference on Machine Learning, 2013

Online Learning under Delayed Feedback.

Pooria Joulani

Proceedings of the 30th International Conference on Machine Learning, 2013

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning.

Proceedings of the 30th International Conference on Machine Learning, 2013

2012

The adversarial stochastic shortest path problem with unknown transition probabilities.

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits.

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions

CoRR, 2012

A Randomized Strategy for Learning to Combine Many Features

CoRR, 2012

The grand challenge of computer Go: Monte Carlo tree search and extensions.

Commun. ACM, 2012

Deep Representations and Codes for Image Auto-Annotation.

Ryan Kiros

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Analysis of Kernel Mean Matching under Covariate Shift.

Yaoliang Yu

Proceedings of the 29th International Conference on Machine Learning, 2012

Statistical linear estimation with penalized estimators: an application to reinforcement learning.

Bernardo Ávila Pires

Proceedings of the 29th International Conference on Machine Learning, 2012

An adaptive algorithm for finite stochastic partial monitoring.

Navid Zolghadr

Proceedings of the 29th International Conference on Machine Learning, 2012

Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments.

Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

Preface.

Marc Peter Deisenroth

Jan Peters

Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

Partial Monitoring with Side Information.

Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Approximate Policy Iteration with Linear Action Models.

Hengshuai Yao

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2011

Model selection in reinforcement learning.

Mach. Learn., 2011

Agnostic KWIK learning and efficient approximate reinforcement learning.

István Szita

Proceedings of the COLT 2011, 2011

X-Armed Bandits.

J. Mach. Learn. Res., 2011

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments.

Proceedings of the COLT 2011, 2011

Regret Bounds for the Adaptive Control of Linear Quadratic Systems.

Proceedings of the COLT 2011, 2011

Non-trivial two-armed partial-monitoring games are bandits

CoRR, 2011

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

CoRR, 2011

PAC-Bayesian Policy Evaluation for Reinforcement Learning.

Mahdi Milani Fard

Joelle Pineau

Proceedings of the UAI 2011, 2011

Improved Algorithms for Linear Stochastic Bandits.

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Sequential learning for optimal monitoring of multi-channel wireless networks.

Pallavi Arora

Rong Zheng

Proceedings of the INFOCOM 2011. 30th IEEE International Conference on Computer Communications, 2011

Invited Talk: Towards Robust Reinforcement Learning Algorithms.

Proceedings of the Recent Advances in Reinforcement Learning - 9th European Workshop, 2011

Editors' Introduction.

Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

2010

Algorithms for Reinforcement Learning

Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, ISBN: 978-3-031-01551-9, 2010

Active learning in heteroscedastic noise.

Varun Grover

Theor. Comput. Sci., 2010

A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping.

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

REGO: Rank-based Estimation of Renyi Information using Euclidean Graph Optimization.

Barnabás Póczos

Sergey Kirshner

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Models of active learning in group-structured state spaces.

Sandra Zilles

Inf. Comput., 2010

X-Armed Bandits

CoRR, 2010

Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs.

Barnabás Póczos

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Online Markov Decision Processes under Bandit Feedback.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Parametric Bandits: The Generalized Linear Case.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Error Propagation for Approximate Policy and Value Iteration.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Extending rapidly-exploring random trees for asymptotically optimal anytime motion planning.

Joseph Modayil

Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

Model-based reinforcement learning with nearly tight exploration complexity bounds.

István Szita

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Toward Off-Policy Learning Control with Function Approximation.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Budgeted Distribution Learning of Belief Net Parameters.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

The Online Loop-free Stochastic Shortest-Path Problem.

Proceedings of the COLT 2010, 2010

Toward a Classification of Finite Partial-Monitoring Games.

Proceedings of the Algorithmic Learning Theory, 21st International Conference, 2010

2009

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.

Theor. Comput. Sci., 2009

Training parsers by inverse reinforcement learning.

Mach. Learn., 2009

Learning Exercise Policies for American Options.

Yuxi Li

Dale Schuurmans

Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009

A General Projection Property for Distribution Families.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Multi-Step Dyna Planning for Policy Evaluation and Control.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Model-based and model-free reinforcement learning for visual servoing.

Azad Shademan

Martin Jägersand

Proceedings of the 2009 IEEE International Conference on Robotics and Automation, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation.

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Learning when to stop thinking and do something!

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Learning to segment from a few well-selected training images.

Alireza Farhangfar

Russell Greiner

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Workshop summary: On-line learning with limited feedback.

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS.

Hengshuai Yao

Shalabh Bhatnagar

Proceedings of the 48th IEEE Conference on Decision and Control, 2009

Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems.

Shie Mannor

Proceedings of the American Control Conference, 2009

2008

Finite-Time Bounds for Fitted Value Iteration.

J. Mach. Learn. Res., 2008

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping.

Proceedings of the UAI 2008, 2008

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstraction.

Proceedings of the UAI 2008, 2008

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation.

Richard S. Sutton

Hamid Reza Maei

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Regularized Policy Iteration.

Shie Mannor

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Online Optimization in X-Armed Bandits.

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Empirical Bernstein stopping.

Volodymyr Mnih

Proceedings of the Machine Learning, 2008

Regularized Fitted Q-Iteration: Application to Planning.

Shie Mannor

Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Active Learning of Group-Structured Environments.

Sandra Zilles

Proceedings of the Algorithmic Learning Theory, 19th International Conference, 2008

Active Learning in Multi-armed Bandits.

Varun Grover

Proceedings of the Algorithmic Learning Theory, 19th International Conference, 2008

2007

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods.

Proceedings of the UAI 2007, 2007

Fitted Q-iteration in continuous action-space MDPs.

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Continuous Time Associative Bandit Problems.

Proceedings of the IJCAI 2007, 2007

Sequence Prediction Exploiting Similarity Information.

István Bíró

Zoltán Szamonek

Proceedings of the IJCAI 2007, 2007

Manifold-adaptive dimension estimation.

Proceedings of the Machine Learning, 2007

Improved Rates for the Stochastic Continuum-Armed Bandit Problem.

Peter Auer

Ronald Ortner

Proceedings of the Learning Theory, 20th Annual Conference on Learning Theory, 2007

Tuning Bandit Algorithms in Stochastic Environments.

Proceedings of the Algorithmic Learning Theory, 18th International Conference, 2007

2006

Universal parameter optimisation in games based on SPSA.

Levente Kocsis

Mach. Learn., 2006

Local Importance Sampling: A Novel Technique to Enhance Particle Filtering.

J. Multim., 2006

Bandit Based Monte-Carlo Planning.

Levente Kocsis

Proceedings of the Machine Learning: ECML 2006, 2006

Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path.

Proceedings of the Learning Theory, 19th Annual Conference on Learning Theory, 2006

RSPSA: Enhanced Parameter Optimization in Games.

Levente Kocsis

Mark H. M. Winands

Proceedings of the Advances in Computer Games, 11th International Conference, 2006

2005

Finite time bounds for sampling based fitted value iteration.

Proceedings of the Machine Learning, 2005

X-mHMM: An Efficient Algorithm for Training Mixtures of HMMs When the Number of Mixtures Is Unknown.

Zoltán Szamonek

Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 2005

Log-optimal currency portfolios and control Lyapunov exponents.

Proceedings of the 44th IEEE Conference on Decision and Control and 8th European Control Conference, 2005

2004

Interpolation-based Q-learning.

William D. Smart

Proceedings of the Machine Learning, 2004

Margin Maximizing Discriminant Analysis.

András Kocsor

Kornél Kovács

Proceedings of the Machine Learning: ECML 2004, 2004

Enhancing Particle Filters Using Local Likelihood Sampling.

Proceedings of the Computer Vision, 2004

Kernel Machine Based Feature Extraction Algorithms for Regression Problems.

András Kocsor

Kornél Kovács

Proceedings of the 16th European Conference on Artificial Intelligence, 2004

Shortest Path Discovery Problems: A Framework, Algorithms and Experimental Results.

Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004

2003

Sequential Importance Sampling for Visual Tracking Reconsidered.

Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003

2002

An Asymptotic Scaling Analysis of LQ Performance for an Approximate Adaptive Control Design.

Mark French

Eric Rogers

Math. Control. Signals Syst., 2002

LQ performance bounds for adaptive output feedback controllers for functionally uncertain nonlinear systems.

Mark French

Eric Rogers

Autom., 2002

2001

Ockham's Razor Modeling of the Matrisome Channels of the Basal Ganglia Thalamocortical Loops.

György Hévízi

Int. J. Neural Syst., 2001

Efficient approximate planning in continuous space Markovian Decision Problems.

AI Commun., 2001

2000

Uncertainty, performance, and model dependency in approximate adaptive nonlinear control.

Mark French

Eric Rogers

IEEE Trans. Autom. Control., 2000

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms.

Mach. Learn., 2000

Modular Reinforcement Learning: A Case Study in a Robot Domain.

Acta Cybern., 2000

FlexVoice: A Parametric Approach to High-Quality Speech Synthesis.

Proceedings of the Text, Speech and Dialogue - Third International Workshop, 2000

1999

Parallel and robust skeletonization built on self-organizing elements.

Neural Networks, 1999

A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms.

Michael L. Littman

Neural Comput., 1999

The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments.

Nucleic Acids Res., 1999

1998

An integrated architecture for motion-control and path-planning.

J. Field Robotics, 1998

Module-Based Reinforcement Learning: Experiments with a Real Robot.

Auton. Robots, 1998

Non-Markovian Policies in Sequential Decision Problems.

Acta Cybern., 1998

Performance-Evaluation for Automated Detection of Microcalcifications in Mammograms Using Three Different Film-Digitizers.

Proceedings of the Digital Mammography, 1998

Automated Detection and Classification of Micro-Calcifications in Mammograms Using Artificial Neural Nets.

Proceedings of the Digital Mammography, 1998

Multi-criteria Reinforcement Learning.

Zoltán Gábor

Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

1997

Neurocontroller using dynamic state feedback for compensatory control.

Szabolcs Cimmer

Neural Networks, 1997

The Asymptotic Convergence-Rate of Q-learning.

Proceedings of the Advances in Neural Information Processing Systems 10, 1997

Module Based Reinforcement Learning: An Application to a Real Robot.

Proceedings of the Learning Robots, 6th European Workshop, 1997

Learning and Exploitation Do Not Conflict Under Minimax Optimality.

Proceedings of the Machine Learning: ECML-97, 1997

1996

Approximate geometry representations and sensory fusion.

Neurocomputing, 1996

Self-Organizing Multi-Resolution Grid for Motion Planning and Control.

Int. J. Neural Syst., 1996

A Generalized Reinforcement-Learning Model: Convergence and Applications.

Michael L. Littman

Proceedings of the Machine Learning, 1996

Inverse Dynamics Controllers for Robust Control: Consequences for Neurocontrollers.

Proceedings of the Artificial Neural Networks, 1996

1994

Topology Learning Solved by Extended Objects: A Neural Network Model.

László Balázs

Neural Comput., 1994

1993

Behavior of an Adaptive Self-organizing Autonomous Agent Working with Cues and Competing Concepts.
