Léon Bottou

CoRR, 2024

2023

Borges and AI.

Bernhard Schölkopf

CoRR, 2023

Birth of a Transformer: A Memory Viewpoint.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning useful representations for shifting tasks and distributions.

Jianyu Zhang

Proceedings of the International Conference on Machine Learning, 2023

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization.

Proceedings of the International Conference on Machine Learning, 2023

Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

A Simple Convergence Proof of Adam and Adagrad.

Trans. Mach. Learn. Res., 2022

A scaling calculus for the design and initialization of ReLU networks.

Neural Comput. Appl., 2022

Recycling diverse models for out-of-distribution generalization.

CoRR, 2022

The Effects of Regularization and Data Augmentation are Class Dependent.

Randall Balestriero

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rich Feature Construction for the Optimization-Generalization Dilemma.

Jianyu Zhang

David Lopez-Paz

Proceedings of the International Conference on Machine Learning, 2022

On Distributionally Robust Optimization and Data Rebalancing.

Agnieszka Slowik

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

On the Relation between Distributionally Robust Optimization and Data Curation (Student Abstract).

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

An Attract-Repel Decomposition of Undirected Networks.

Alexander Peysakhovich

CoRR, 2021

Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation.

Agnieszka Slowik

CoRR, 2021

Linear unit-tests for invariance discovery.

CoRR, 2021

2020

On the Convergence of Adam and Adagrad.

CoRR, 2020

Symplectic Recurrent Neural Networks.

Proceedings of the 8th International Conference on Learning Representations, 2020

Learning Representations Using Causal Invariance.

Proceedings of the Extraction et Gestion des Connaissances, 2020

2019

Music Source Separation in the Waveform Domain.

CoRR, 2019

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed.

CoRR, 2019

Invariant Risk Minimization.

CoRR, 2019

Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks.

CoRR, 2019

Cold Case: The Lost MNIST Digits.

Chhavi Yadav

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

AdaGrad stepsizes: sharp convergence over nonconvex landscapes.

Rachel A. Ward

Xiaoxia Wu

Carl-Johann Simon-Gabriel

Proceedings of the 36th International Conference on Machine Learning, 2019

First-Order Adversarial Vulnerability of Neural Networks and Input Dimension.

Proceedings of the 36th International Conference on Machine Learning, 2019

2018

Optimization Methods for Large-Scale Machine Learning.

Sundararajan Sellamanickam

Frank E. Curtis

Jorge Nocedal

SIAM Rev., 2018

An efficient distributed learning algorithm based on effective local functional approximations.

Dhruv Mahajan

Nikunj Agrawal

S. Sathiya Keerthi

J. Mach. Learn. Res., 2018

Controlling Covariate Shift using Equilibrium Normalization of Weights.

CoRR, 2018

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization.

Rachel A. Ward

Xiaoxia Wu

CoRR, 2018

WNGrad: Learn the Learning Rate in Gradient Descent.

Xiaoxia Wu

Rachel A. Ward

Carl-Johann Simon-Gabriel

CoRR, 2018

Adversarial Vulnerability of Neural Networks Increases With Input Dimension.

CoRR, 2018

SING: Symbol-to-Instrument Neural Generator.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Diagonal Rescaling For Neural Networks.

Jean Lafond

Nicolas Vasilache

CoRR, 2017

Wasserstein GAN.

Martín Arjovsky

Soumith Chintala

CoRR, 2017

Wasserstein Generative Adversarial Networks.

Martín Arjovsky

Soumith Chintala

Proceedings of the 34th International Conference on Machine Learning, 2017

Towards Principled Methods for Training Generative Adversarial Networks.

Martín Arjovsky

Proceedings of the 5th International Conference on Learning Representations, 2017

Discovering Causal Signals in Images.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Geometrical Insights for Implicit Generative Modeling.

Proceedings of the Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, 2017

2016

Singularity of the Hessian in Deep Learning.

Levent Sagun

CoRR, 2016

Unifying distillation and privileged information.

Proceedings of the 4th International Conference on Learning Representations, 2016

No Regret Bound for Extreme Bandits.

Robert Nishihara

David Lopez-Paz

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015

A Lower Bound for the Optimization of Finite Sums.

Alekh Agarwal

Proceedings of the 32nd International Conference on Machine Learning, 2015

Is object localization for free? - Weakly-supervised learning with convolutional neural networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

How big data changes statistical machine learning.

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014

From machine learning to machine reasoning - An essay.

Mach. Learn., 2014

Introduction to the special issue on learning semantics.

Mach. Learn., 2014

ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems.

Patrice Y. Simard

David Maxwell Chickering

Aparna Lakshmiratan

Denis Xavier Charles

Carlos Garcia Jurado Suarez

CoRR, 2014

Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics.

Douwe Kiela

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

Counterfactual reasoning and learning systems: the example of computational advertising.

Joaquin Quiñonero Candela

Jonas Peters

J. Mach. Learn. Res., 2013

A Parallel SGD method with Strong Convergence.

CoRR, 2013

A Functional Approximation Based Distributed Learning Algorithm.

CoRR, 2013

Para-active learning.

CoRR, 2013

In Hindsight: Doklady Akademii Nauk SSSR, 181(4), 1968.

Proceedings of the Empirical Inference - Festschrift in Honor of Vladimir N. Vapnik, 2013

2012

Efficient BackProp.

Proceedings of the Neural Networks: Tricks of the Trade - Second Edition, 2012

Stochastic Gradient Descent Tricks.

Proceedings of the Neural Networks: Tricks of the Trade - Second Edition, 2012

Counterfactual Reasoning and Learning Systems.

Joaquin Quiñonero Candela

Jonas Peters

CoRR, 2012

2011

Batch and online learning algorithms for nonconvex Neyman-Pearson classification.

Gilles Gasso

Aristidis Pappaioannou

Marina Spivak

ACM Trans. Intell. Syst. Technol., 2011

Nonconvex Online Support Vector Machines.

Seyda Ertekin

C. Lee Giles

IEEE Trans. Pattern Anal. Mach. Intell., 2011

Natural Language Processing (Almost) from Scratch.

J. Mach. Learn. Res., 2011

From Machine Learning to Machine Reasoning.

CoRR, 2011

2010

L'apprentissage statistique à grande échelle.

Olivier Bousquet

Monde des Util. Anal. Données, 2010

Guarantees for Approximate Incremental SVMs.

Nicolas Usunier

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Erratum: SGDQN is Less Careful than Expected.

J. Mach. Learn. Res., 2010

Large-Scale Machine Learning with Stochastic Gradient Descent.

Proceedings of the 19th International Conference on Computational Statistics, 2010

2009

SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent.

Patrick Gallinari

J. Mach. Learn. Res., 2009

2008

Sequence Labelling SVMs Trained in One Pass.

Nicolas Usunier

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2008

2007

The Need for Open Source Software in Machine Learning.

Carl Edward Rasmussen

J. Mach. Learn. Res., 2007

The Tradeoffs of Large Scale Learning.

Olivier Bousquet

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Learning using Large Datasets.

Olivier Bousquet

Proceedings of the Mining Massive Data Sets for Security, 2007

Solving multiclass support vector machines with LaRank.

Proceedings of the Machine Learning, 2007

Learning on the border: active learning in imbalanced data classification.

Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 2007

2006

Large Scale Transductive SVMs.

J. Mach. Learn. Res., 2006

Inference with the Universum.

Proceedings of the Machine Learning, 2006

Trading convexity for scalability.

Proceedings of the Machine Learning, 2006

2005

Toward Automatic Phenotyping of Developing Embryos From Videos.

IEEE Trans. Image Process., 2005

Fast Kernel Classifiers with Online and Active Learning.

J. Mach. Learn. Res., 2005

The Huller: A Simple and Efficient Online SVM.

Proceedings of the Machine Learning: ECML 2005, 2005

Online (and Offline) on an Even Tighter Budget.

Jason Weston

Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005

2004

Parallel Support Vector Machines: The Cascade SVM.

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Breaking SVM Complexity with Cross-Training.

Gökhan H. Bakir

Jason Weston

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting.

Fu Jie Huang

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

2003

Scalable video coding with managed drift.

Amy R. Reibman

Andrea Basso

IEEE Trans. Circuits Syst. Video Technol., 2003

Geometric Clustering Using the Information Bottleneck Method.

Susanne Still

William Bialek

Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Large Scale Online Learning.

Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Stochastic Learning.

Proceedings of the Advanced Lectures on Machine Learning, 2003

2002

Electronic Document Publishing Using DjVu.

Proceedings of the Document Analysis Systems V, 5th International Workshop, 2002

2001

DCT-based scalable video coding with drift.

Amy R. Reibman

Andrea Basso

Proceedings of the 2001 International Conference on Image Processing, 2001

Efficient Conversion of Digital Documents to Multilayer Raster Formats.

Patrick Haffner

Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR 2001), 2001

Managing Drift in DCT-Based Scalable Video Coding.

Amy R. Reibman

Proceedings of the Data Compression Conference, 2001

Masked Wavelets: Applications to Image Compression.

Steven Pigeon

Proceedings of the Data Compression Conference, 2001

2000

Vicinal Risk Minimization.

Proceedings of the Advances in Neural Information Processing Systems 13, 2000

1999

Object Recognition with Gradient-Based Learning.

Proceedings of the Shape, Contour and Grouping in Computer Vision, 1999

Color Documents on the Web with DJVU.

Proceedings of the 1999 International Conference on Image Processing, 1999

DjVu: Analyzing and Compressing Scanned Documents for Internet Distribution.

Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999

1998

Image and video coding-emerging standards and beyond.

IEEE Trans. Circuits Syst. Video Technol., 1998

Gradient-based learning applied to document recognition.

Proc. IEEE, 1998

High quality document image compression with "DjVu".

J. Electronic Imaging, 1998

Boxlets: A Fast Convolution Algorithm for Signal Processing and Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 11 [NIPS Conference, Denver, Colorado, USA, November 30], 1998

DjVu: a Compression Method for Distributing Scanned Documents in Color over the Internet.

Proceedings of the 6th Color and Imaging Conference, 1998

Lossy Compression of Partially Masked Still Images.

Steven Pigeon

Proceedings of the Data Compression Conference, 1998

The Z-Coder Adaptive Binary Coder.

Paul G. Howard

Proceedings of the Data Compression Conference, 1998

Browsing through High Quality Document Images with DjVu.

Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, 1998

1997

Reading checks with multilayer graph transformer networks.

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Global Training of Document Processing Systems Using Graph Transformer Networks.

Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), 1997

1996

Efficient BackProp.

Proceedings of the Neural Networks: Tricks of the Trade, 1996

1994

Convergence Properties of the K-Means Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Comparison of classifier methods: a case study in handwritten digit recognition.

Proceedings of the 12th IAPR International Conference on Pattern Recognition, 1994

1993

Local Algorithms for Pattern Recognition and Dependencies Estimation.

Vladimir Vapnik

Neural Comput., 1993

Signature Verification Using A "Siamese" Time Delay Neural Network.

Int. J. Pattern Recognit. Artif. Intell., 1993

1992

Local Learning Algorithms.

Vladimir Vapnik

Neural Comput., 1992

Computer aided cleaning of large databases for character recognition.

Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992

Capacity control in linear classifiers for pattern recognition.

Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992

1991

Structural Risk Minimization for Character Recognition.

Proceedings of the Advances in Neural Information Processing Systems 4, 1991

1990

Speaker-independent isolated digit recognition: Multilayer perceptrons vs. Dynamic time warping.

Françoise Fogelman-Soulié

Pascal Blanchet

Jean-Sylvain Liénard

Neural Networks, 1990

A Framework for the Cooperation of Learning Algorithms.

Patrick Gallinari

Proceedings of the Advances in Neural Information Processing Systems 3, 1990

1989

Experiments with time delay networks and dynamic time warping for speaker independent isolated digits recognition.
