Etai Littwin

According to our database, Etai Littwin authored at least 29 papers between 2015 and 2024.

Bibliography

2024
The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods.
Trans. Mach. Learn. Res., 2024

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks.
CoRR, 2024

What Algorithms can Transformers Learn? A Study in Length Generalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Vanishing Gradients in Reinforcement Finetuning of Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

When can transformers reason with abstract symbols?
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Tight conditions for when the NTK approximation is valid.
Trans. Mach. Learn. Res., 2023

Adaptivity and Modularity for Efficient Generalization Over Task Complexity.
CoRR, 2023

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit.
CoRR, 2023

The NTK approximation is valid for longer than you think.
CoRR, 2023

Transformers learn through gradual rank increase.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Stabilizing Transformer Training by Preventing Attention Entropy Collapse.
Proceedings of the International Conference on Machine Learning, 2023

Adaptive Optimization in the ∞-Width Limit.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon.
CoRR, 2022

Learning Representation from Neural Fisher Kernel with Low-rank Approximation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks.
CoRR, 2021

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks.
CoRR, 2021

On random kernels of residual architectures.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Tensor Programs IIb: Architectural Universality Of Neural Tangent Kernel Training Dynamics.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
On the Optimization Dynamics of Wide Hypernetworks.
CoRR, 2020

Residual Tangent Kernels.
CoRR, 2020

On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width.
CoRR, 2020

Collegial Ensembles.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

On Infinite-Width Hypernetworks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018
Regularizing by the Variance of the Activations' Sample-Variances.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2016
The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization.
CoRR, 2016

Complexity of multiverse networks and their multilayer generalization.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

The Multiverse Loss for Robust Transfer Learning.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Spherical embedding of inlier silhouette dissimilarities.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

