Shai Shalev-Shwartz

Affiliations:
  • The Hebrew University of Jerusalem, School of Computer Science and Engineering, Israel


According to our database, Shai Shalev-Shwartz authored at least 140 papers between 2002 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of two.

Bibliography

2024
Jamba: A Hybrid Transformer-Mamba Language Model.
CoRR, 2024

2023
Managing AI Risks in an Era of Rapid Progress.
CoRR, 2023

SubTuning: Efficient Finetuning for Multi-Task Learning.
CoRR, 2023

2022
When Hardness of Approximation Meets Hardness of Learning.
J. Mach. Learn. Res., 2022

MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.
CoRR, 2022

Standing on the Shoulders of Giant Frozen Language Models.
CoRR, 2022

Knowledge Distillation: Bad Models Can Be Good Role Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Learning of CNNs using Patch Based Features.
Proceedings of the International Conference on Machine Learning, 2022

2021
Computational Separation Between Convolutional and Fully-Connected Networks.
Proceedings of the 9th International Conference on Learning Representations, 2021

The Connection Between Approximation, Depth Separation and Learnability in Neural Networks.
Proceedings of the Conference on Learning Theory, 2021

2020
On the Ethics of Building AI in a Responsible Manner.
CoRR, 2020

The Implications of Local Correlation on Learning Some Deep Functions.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need.
Proceedings of the 37th International Conference on Machine Learning, 2020

The Implicit Bias of Depth: How Incremental Learning Drives Generalization.
Proceedings of the 8th International Conference on Learning Representations, 2020

Distribution Free Learning with Local Queries.
Proceedings of the Algorithmic Learning Theory, 2020

SenseBERT: Driving Some Sense into BERT.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Learning Boolean Circuits with Neural Networks.
CoRR, 2019

SenseBERT: Driving Some Sense into BERT.
CoRR, 2019

Discriminative Active Learning.
CoRR, 2019

Decoupling Gating from Linearity.
CoRR, 2019

Vision Zero: on a Provable Method for Eliminating Roadway Accidents without Compromising Traffic Throughput.
CoRR, 2019

Is Deeper Better only when Shallow is Good?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
A Provably Correct Algorithm for Deep Learning that Actually Works.
CoRR, 2018

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization.
J. Mach. Learn. Res., 2017

On a Formal Model of Safe and Scalable Self-driving Cars.
CoRR, 2017

Weight Sharing is Crucial to Succesful Optimization.
CoRR, 2017

Failures of Deep Learning.
CoRR, 2017

Decoupling "when to update" from "how to update".
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Failures of Gradient-Based Deep Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems.
Proceedings of the 30th Conference on Learning Theory, 2017

Effective Semisupervised Learning on Manifolds.
Proceedings of the 30th Conference on Learning Theory, 2017

2016
Perceptron Algorithm.
Encyclopedia of Algorithms, 2016

Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization.
Math. Program., 2016

Subspace Learning with Partial Information.
J. Mach. Learn. Res., 2016

On Lower and Upper Bounds in Smooth and Strongly Convex Optimization.
J. Mach. Learn. Res., 2016

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving.
CoRR, 2016

On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training.
CoRR, 2016

Long-term Planning by Short-term Prediction.
CoRR, 2016

Faster Low-rank Approximation using Adaptive Gap-based Preconditioning.
CoRR, 2016

Tightening the Sample Complexity of Empirical Risk Minimization via Preconditioned Stability.
CoRR, 2016

Learning a Metric Embedding for Face Recognition using the Multibatch Method.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Minimizing the Maximal Loss: How and Why.
Proceedings of the 33rd International Conference on Machine Learning, 2016

SDCA without Duality, Regularization, and Individual Convexity.
Proceedings of the 33rd International Conference on Machine Learning, 2016

On Graduated Optimization for Stochastic Non-Convex Problems.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Solving Ridge Regression using Sketched Preconditioned SVRG.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Complexity Theoretic Limitations on Learning DNF's.
Proceedings of the 29th Conference on Learning Theory, 2016

2015
Learning sparse low-threshold linear classifiers.
J. Mach. Learn. Res., 2015

Multiclass learnability and the ERM principle.
J. Mach. Learn. Res., 2015

SDCA without Duality.
CoRR, 2015

Faster SGD Using Sketched Conditioning.
CoRR, 2015

On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems.
CoRR, 2015

Beyond Convexity: Stochastic Quasi-Convex Optimization.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Strongly Adaptive Online Learning.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Matrix completion with the trace norm: learning, bounding, and transducing.
J. Mach. Learn. Res., 2014

SelfieBoost: A Boosting Algorithm for Deep Learning.
CoRR, 2014

The Sample Complexity of Subspace Learning with Partial Information.
CoRR, 2014

From average case complexity to improper learning complexity.
Proceedings of the Symposium on Theory of Computing, 2014

On the Computational Efficiency of Training Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

K-means recovers ICA filters when independent components are sparse.
Proceedings of the 31st International Conference on Machine Learning, 2014

Optimal learners for multiclass problems.
Proceedings of The 27th Conference on Learning Theory, 2014

The Complexity of Learning Halfspaces using Generalized Linear Methods.
Proceedings of The 27th Conference on Learning Theory, 2014

Understanding Machine Learning - From Theory to Algorithms.
Cambridge University Press, ISBN: 978-1-10-705713-5, 2014

2013
Stochastic dual coordinate ascent methods for regularized loss.
J. Mach. Learn. Res., 2013

Efficient active learning of halfspaces: an aggressive approach.
J. Mach. Learn. Res., 2013

A Provably Efficient Algorithm for Training Deep Networks.
CoRR, 2013

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

More data speeds up training time in learning halfspaces over sparse vectors.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Vanishing Component Analysis.
Proceedings of the 30th International Conference on Machine Learning, 2013

Learning Optimally Sparse Support Vector Machines.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
Using More Data to Speed-up Training Time.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Regularization Techniques for Learning with Matrices.
J. Mach. Learn. Res., 2012

Near-Optimal Algorithms for Online Matrix Prediction.
Proceedings of the COLT 2012, 2012

Online Learning and Online Convex Optimization.
Found. Trends Mach. Learn., 2012

Proximal Stochastic Dual Coordinate Ascent.
CoRR, 2012

The error rate of learning halfspaces using Kernel-SVMs.
CoRR, 2012

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.
CoRR, 2012

Efficient Pool-Based Active Learning of Halfspaces.
CoRR, 2012

Multiclass Learning Approaches: A Theoretical Comparison with Implications.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Domain Adaptation--Can Quantity compensate for Quality?
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2012

Learning the Experts for Online Sequence Prediction.
Proceedings of the 29th International Conference on Machine Learning, 2012

The Kernelized Stochastic Batch Perceptron.
Proceedings of the 29th International Conference on Machine Learning, 2012

Learnability beyond Uniform Convergence.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

2011
Online Learning of Noisy Data.
IEEE Trans. Inf. Theory, 2011

Learning Kernel-Based Halfspaces with the 0-1 Loss.
SIAM J. Comput., 2011

Pegasos: primal estimated sub-gradient solver for SVM.
Math. Program., 2011

Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing.
Proceedings of the COLT 2011, 2011

Stochastic Methods for ℓ1-regularized Loss Minimization.
J. Mach. Learn. Res., 2011

Efficient Learning with Partially Observed Attributes.
J. Mach. Learn. Res., 2011

Active Learning Halfspaces under Margin Assumptions.
CoRR, 2011

ShareBoost: Efficient multiclass learning with feature sharing.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Learning Linear and Kernel Predictors with the 0-1 Loss Function.
Proceedings of the IJCAI 2011, 2011

Access to Unlabeled Data can Speed up Prediction Time.
Proceedings of the 28th International Conference on Machine Learning, 2011

Large-Scale Convex Minimization with a Low-Rank Constraint.
Proceedings of the 28th International Conference on Machine Learning, 2011

Quantity Makes Quality: Learning with Partial Views.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010
Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints.
SIAM J. Optim., 2010

On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms.
Mach. Learn., 2010

Learnability, Stability and Uniform Convergence.
J. Mach. Learn. Res., 2010

Learning Kernel-Based Halfspaces with the Zero-One Loss.
Proceedings of the COLT 2010, 2010

Composite Objective Mirror Descent.
Proceedings of the COLT 2010, 2010

Online Learning of Noisy Data with Kernels.
Proceedings of the COLT 2010, 2010

2009
Individual sequence prediction using memory-efficient context trees.
IEEE Trans. Inf. Theory, 2009

Applications of strong convexity--strong smoothness duality to learning with matrices.
CoRR, 2009

Learnability and Stability in the General Learning Setting.
Proceedings of the COLT 2009, 2009

Stochastic Convex Optimization.
Proceedings of the COLT 2009, 2009

The Complexity of Improperly Learning Large Margin Halfspaces.
Proceedings of the COLT 2009, 2009

Agnostic Online Learning.
Proceedings of the COLT 2009, 2009

2008
Perceptron Algorithm.
Proceedings of the Encyclopedia of Algorithms - 2008 Edition, 2008

The Forgetron: A Kernel-Based Perceptron on a Budget.
SIAM J. Comput., 2008

Ranking Categorical Features Using Generalization Properties.
J. Mach. Learn. Res., 2008

Online Learning of Complex Prediction Problems Using Simultaneous Projections.
J. Mach. Learn. Res., 2008

Fast Rates for Regularized Objectives.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Mind the Duality Gap: Logarithmic regret algorithms for online optimization.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

SVM optimization: inverse dependence on training set size.
Proceedings of the Machine Learning, 2008

Efficient bandit algorithms for online multiclass prediction.
Proceedings of the Machine Learning, 2008

Efficient projections onto the ℓ1-ball for learning in high dimensions.
Proceedings of the Machine Learning, 2008

2007
Online learning: theory, algorithms and applications (למידה מקוונת.).
PhD thesis, 2007

A Large Margin Algorithm for Speech-to-Phoneme and Music-to-Score Alignment.
IEEE Trans. Speech Audio Process., 2007

A primal-dual perspective of online learning algorithms.
Mach. Learn., 2007

A Unified Algorithmic Approach for Efficient Online Label Ranking.
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007

Pegasos: Primal Estimated sub-GrAdient SOlver for SVM.
Proceedings of the Machine Learning, 2007

Prediction by Categorical Features: Generalization Properties and Application to Feature Ranking.
Proceedings of the Learning Theory, 20th Annual Conference on Learning Theory, 2007

2006
Efficient Learning of Label Ranking by Soft Projections onto Polyhedra.
J. Mach. Learn. Res., 2006

Online Passive-Aggressive Algorithms.
J. Mach. Learn. Res., 2006

Convex Repeated Games and Fenchel Duality.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Online Classification for Complex Problems Using Simultaneous Projections.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Discriminative kernel-based phoneme sequence recognition.
Proceedings of the INTERSPEECH 2006, 2006

Online multiclass learning by interclass hypothesis sharing.
Proceedings of the Machine Learning, 2006

Online Learning Meets Optimization in the Dual.
Proceedings of the Learning Theory, 19th Annual Conference on Learning Theory, 2006

2005
Smooth ε-Insensitive Regression by Loss Symmetrization.
J. Mach. Learn. Res., 2005

The Forgetron: A Kernel-Based Perceptron on a Fixed Budget.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Phoneme alignment based on discriminative learning.
Proceedings of the INTERSPEECH 2005, 2005

A New Perspective on an Old Perceptron Algorithm.
Proceedings of the Learning Theory, 18th Annual Conference on Learning Theory, 2005

2004
The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Learning to Align Polyphonic Music.
Proceedings of the ISMIR 2004, 2004

Online and batch learning of pseudo-metrics.
Proceedings of the Machine Learning, 2004

2003
Online Passive-Aggressive Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Smooth ε-Insensitive Regression by Loss Symmetrization.
Proceedings of the Computational Learning Theory and Kernel Machines, 2003

2002
Robust temporal and spectral modeling for query By melody.
Proceedings of the SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002

