# Shai Shalev-Shwartz

According to our database, Shai Shalev-Shwartz authored at least 152 papers between 2002 and 2018.

## Bibliography

2018

A Provably Correct Algorithm for Deep Learning that Actually Works.

CoRR, 2018

2017

Near-Optimal Algorithms for Online Matrix Prediction.

SIAM J. Comput., 2017

Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization.

Journal of Machine Learning Research, 2017

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data.

CoRR, 2017

On a Formal Model of Safe and Scalable Self-driving Cars.

CoRR, 2017

Weight Sharing is Crucial to Succesful Optimization.

CoRR, 2017

Failures of Deep Learning.

CoRR, 2017

Decoupling "when to update" from "how to update".

CoRR, 2017

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems.

CoRR, 2017

Decoupling "when to update" from "how to update".

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Failures of Gradient-Based Deep Learning.

Proceedings of the 34th International Conference on Machine Learning, 2017

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems.

Proceedings of the 30th Conference on Learning Theory, 2017

Effective Semisupervised Learning on Manifolds.

Proceedings of the 30th Conference on Learning Theory, 2017

2016

Perceptron Algorithm.

Encyclopedia of Algorithms, 2016

Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization.

Math. Program., 2016

Subspace Learning with Partial Information.

Journal of Machine Learning Research, 2016

On Lower and Upper Bounds in Smooth and Strongly Convex Optimization.

Journal of Machine Learning Research, 2016

Learning a Metric Embedding for Face Recognition using the Multibatch Method.

CoRR, 2016

Minimizing the Maximal Loss: How and Why?

CoRR, 2016

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving.

CoRR, 2016

On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training.

CoRR, 2016

Long-term Planning by Short-term Prediction.

CoRR, 2016

SDCA without Duality, Regularization, and Individual Convexity.

CoRR, 2016

Faster Low-rank Approximation using Adaptive Gap-based Preconditioning.

CoRR, 2016

Tightening the Sample Complexity of Empirical Risk Minimization via Preconditioned Stability.

CoRR, 2016

Solving Ridge Regression using Sketched Preconditioned SVRG.

CoRR, 2016

Distribution Free Learning with Local Queries.

CoRR, 2016

Learning a Metric Embedding for Face Recognition using the Multibatch Method.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Minimizing the Maximal Loss: How and Why.

Proceedings of the 33rd International Conference on Machine Learning, 2016

SDCA without Duality, Regularization, and Individual Convexity.

Proceedings of the 33rd International Conference on Machine Learning, 2016

On Graduated Optimization for Stochastic Non-Convex Problems.

Proceedings of the 33rd International Conference on Machine Learning, 2016

Solving Ridge Regression using Sketched Preconditioned SVRG.

Proceedings of the 33rd International Conference on Machine Learning, 2016

Complexity Theoretic Limitations on Learning DNF's.

Proceedings of the 29th Conference on Learning Theory, 2016

2015

Learning sparse low-threshold linear classifiers.

Journal of Machine Learning Research, 2015

Multiclass learnability and the ERM principle.

Journal of Machine Learning Research, 2015

SDCA without Duality.

CoRR, 2015

Beyond Convexity: Stochastic Quasi-Convex Optimization.

CoRR, 2015

On Graduated Optimization for Stochastic Non-Convex Problems.

CoRR, 2015

Faster SGD Using Sketched Conditioning.

CoRR, 2015

Strongly Adaptive Online Learning.

CoRR, 2015

On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems.

CoRR, 2015

Beyond Convexity: Stochastic Quasi-Convex Optimization.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Strongly Adaptive Online Learning.

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

Matrix completion with the trace norm: learning, bounding, and transducing.

Journal of Machine Learning Research, 2014

SelfieBoost: A Boosting Algorithm for Deep Learning.

CoRR, 2014

On the Computational Efficiency of Training Neural Networks.

CoRR, 2014

The Sample Complexity of Subspace Learning with Partial Information.

CoRR, 2014

Optimal Learners for Multiclass Problems.

CoRR, 2014

Complexity theoretic limitations on learning DNF's.

CoRR, 2014

From average case complexity to improper learning complexity.

Proceedings of the Symposium on Theory of Computing, 2014

On the Computational Efficiency of Training Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

K-means recovers ICA filters when independent components are sparse.

Proceedings of the 31st International Conference on Machine Learning, 2014

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization.

Proceedings of the 31st International Conference on Machine Learning, 2014

Optimal learners for multiclass problems.

Proceedings of The 27th Conference on Learning Theory, 2014

The Complexity of Learning Halfspaces using Generalized Linear Methods.

Proceedings of The 27th Conference on Learning Theory, 2014

2013

Stochastic dual coordinate ascent methods for regularized loss.

Journal of Machine Learning Research, 2013

Efficient active learning of halfspaces: an aggressive approach.

Journal of Machine Learning Research, 2013

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

CoRR, 2013

A Provably Efficient Algorithm for Training Deep Networks

CoRR, 2013

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization.

CoRR, 2013

Multiclass learnability and the ERM principle.

CoRR, 2013

From average case complexity to improper learning complexity.

CoRR, 2013

More data speeds up training time in learning halfspaces over sparse vectors.

CoRR, 2013

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent.

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

More data speeds up training time in learning halfspaces over sparse vectors.

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Vanishing Component Analysis.

Proceedings of the 30th International Conference on Machine Learning, 2013

Efficient Active Learning of Halfspaces: an Aggressive Approach.

Proceedings of the 30th International Conference on Machine Learning, 2013

Learning Optimally Sparse Support Vector Machines.

Proceedings of the 30th International Conference on Machine Learning, 2013

2012

Using More Data to Speed-up Training Time.

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Regularization Techniques for Learning with Matrices.

Journal of Machine Learning Research, 2012

Near-Optimal Algorithms for Online Matrix Prediction.

Proceedings of the COLT 2012, 2012

Online Learning and Online Convex Optimization.

Foundations and Trends in Machine Learning, 2012

Learning Sparse Low-Threshold Linear Classifiers

CoRR, 2012

Proximal Stochastic Dual Coordinate Ascent

CoRR, 2012

The error rate of learning halfspaces using Kernel-SVMs

CoRR, 2012

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

CoRR, 2012

Efficient Pool-Based Active Learning of Halfspaces

CoRR, 2012

Learning the Experts for Online Sequence Prediction

CoRR, 2012

Multiclass Learning Approaches: A Theoretical Comparison with Implications

CoRR, 2012

The Kernelized Stochastic Batch Perceptron

CoRR, 2012

Near-Optimal Algorithms for Online Matrix Prediction

CoRR, 2012

Multiclass Learning Approaches: A Theoretical Comparison with Implications.

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs.

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Domain Adaptation--Can Quantity compensate for Quality?

Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2012

Learning the Experts for Online Sequence Prediction.

Proceedings of the 29th International Conference on Machine Learning, 2012

The Kernelized Stochastic Batch Perceptron.

Proceedings of the 29th International Conference on Machine Learning, 2012

Learnability beyond Uniform Convergence.

Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

2011

Online Learning of Noisy Data.

IEEE Trans. Information Theory, 2011

Learning Kernel-Based Halfspaces with the 0-1 Loss.

SIAM J. Comput., 2011

Pegasos: primal estimated sub-gradient solver for SVM.

Math. Program., 2011

Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing.

Proceedings of the COLT 2011, 2011

Journal of Machine Learning Research, 2011

Multiclass Learnability and the ERM principle.

Proceedings of the COLT 2011, 2011

Efficient Learning with Partially Observed Attributes.

Journal of Machine Learning Research, 2011

Active Learning Halfspaces under Margin Assumptions

CoRR, 2011

ShareBoost: Efficient Multiclass Learning with Feature Sharing

CoRR, 2011

Large-Scale Convex Minimization with a Low-Rank Constraint

CoRR, 2011

Using More Data to Speed-up Training Time

CoRR, 2011

ShareBoost: Efficient multiclass learning with feature sharing.

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Learning Linear and Kernel Predictors with the 0-1 Loss Function.

Proceedings of the IJCAI 2011, 2011

Access to Unlabeled Data can Speed up Prediction Time.

Proceedings of the 28th International Conference on Machine Learning, 2011

Large-Scale Convex Minimization with a Low-Rank Constraint.

Proceedings of the 28th International Conference on Machine Learning, 2011

Quantity Makes Quality: Learning with Partial Views.

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010

Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints.

SIAM Journal on Optimization, 2010

On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms.

Machine Learning, 2010

Learnability, Stability and Uniform Convergence.

Journal of Machine Learning Research, 2010

Learning Kernel-Based Halfspaces with the Zero-One Loss

CoRR, 2010

Online Learning of Noisy Data with Kernels

CoRR, 2010

Efficient Learning with Partially Observed Attributes

CoRR, 2010

Efficient Learning with Partially Observed Attributes.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Learning Kernel-Based Halfspaces with the Zero-One Loss.

Proceedings of the COLT 2010, 2010

Composite Objective Mirror Descent.

Proceedings of the COLT 2010, 2010

Online Learning of Noisy Data with Kernels.

Proceedings of the COLT 2010, 2010

2009

Individual sequence prediction using memory-efficient context trees.

IEEE Trans. Information Theory, 2009

Applications of strong convexity--strong smoothness duality to learning with matrices

CoRR, 2009

Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Learnability and Stability in the General Learning Setting.

Proceedings of the COLT 2009, 2009

Stochastic Convex Optimization.

Proceedings of the COLT 2009, 2009

The Complexity of Improperly Learning Large Margin Halfspaces.

Proceedings of the COLT 2009, 2009

Agnostic Online Learning.

Proceedings of the COLT 2009, 2009

2008

Perceptron Algorithm.

Proceedings of the Encyclopedia of Algorithms, 2008

The Forgetron: A Kernel-Based Perceptron on a Budget.

SIAM J. Comput., 2008

Ranking Categorical Features Using Generalization Properties.

Journal of Machine Learning Research, 2008

Online Learning of Complex Prediction Problems Using Simultaneous Projections.

Journal of Machine Learning Research, 2008

Fast Rates for Regularized Objectives.

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Mind the Duality Gap: Logarithmic regret algorithms for online optimization.

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

SVM optimization: inverse dependence on training set size.

Proceedings of the Machine Learning, 2008

Efficient bandit algorithms for online multiclass prediction.

Proceedings of the Machine Learning, 2008

Proceedings of the Machine Learning, 2008

On the Equivalence of Weak Learnability and Linear Separability: New Relaxations and Efficient Boosting Algorithms.

Proceedings of the 21st Annual Conference on Learning Theory, 2008

2007

A Large Margin Algorithm for Speech-to-Phoneme and Music-to-Score Alignment.

IEEE Trans. Audio, Speech & Language Processing, 2007

A primal-dual perspective of online learning algorithms.

Machine Learning, 2007

A Unified Algorithmic Approach for Efficient Online Label Ranking.

Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007

Pegasos: Primal Estimated sub-GrAdient SOlver for SVM.

Proceedings of the Machine Learning, 2007

Prediction by Categorical Features: Generalization Properties and Application to Feature Ranking.

Proceedings of the Learning Theory, 20th Annual Conference on Learning Theory, 2007

2006

Efficient Learning of Label Ranking by Soft Projections onto Polyhedra.

Journal of Machine Learning Research, 2006

Online Passive-Aggressive Algorithms.

Journal of Machine Learning Research, 2006

Convex Repeated Games and Fenchel Duality.

Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Online Classification for Complex Problems Using Simultaneous Projections.

Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Discriminative kernel-based phoneme sequence recognition.

Proceedings of the INTERSPEECH 2006, 2006

Online multiclass learning by interclass hypothesis sharing.

Proceedings of the Machine Learning, 2006

Online Learning Meets Optimization in the Dual.

Proceedings of the Learning Theory, 19th Annual Conference on Learning Theory, 2006

2005

Smooth ε-Insensitive Regression by Loss Symmetrization.

Journal of Machine Learning Research, 2005

The Forgetron: A Kernel-Based Perceptron on a Fixed Budget.

Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Phoneme alignment based on discriminative learning.

Proceedings of the INTERSPEECH 2005, 2005

A New Perspective on an Old Perceptron Algorithm.

Proceedings of the Learning Theory, 18th Annual Conference on Learning Theory, 2005

2004

The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees.

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Learning to Align Polyphonic Music.

Proceedings of the ISMIR 2004, 2004

Online and batch learning of pseudo-metrics.

Proceedings of the Machine Learning, 2004

2003

Online Passive-Aggressive Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Smooth ε-Insensitive Regression by Loss Symmetrization.

Proceedings of the Computational Learning Theory and Kernel Machines, 2003

2002

Robust temporal and spectral modeling for query by melody.

Proceedings of the SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002