Tengyu Ma

Affiliations:
  • Stanford University, CA, USA


According to our database, Tengyu Ma authored at least 126 papers between 2011 and 2024.

Bibliography

2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.
CoRR, 2024

2023
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention.
CoRR, 2023

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time.
CoRR, 2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization.
CoRR, 2023

Large Language Models as Tool Makers.
CoRR, 2023

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training.
CoRR, 2023

Toward L∞-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields.
CoRR, 2023

Larger language models do in-context learning differently.
CoRR, 2023

Data Selection for Language Models via Importance Resampling.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models.
Proceedings of the International Conference on Machine Learning, 2023

How Does Sharpness-Aware Minimization Minimize Sharpness?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

A theoretical study of inductive biases in contrastive learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

What learning algorithm is in-context learning? Investigations with linear models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Symbol tuning improves in-context learning in language models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Toward L∞-Recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022
On the optimization landscape of tensor decompositions.
Math. Program., 2022

How Does Sharpness-Aware Minimization Minimize Sharpness?
CoRR, 2022

Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation.
Proceedings of the International Conference on Machine Learning, 2022

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification.
Proceedings of the International Conference on Machine Learning, 2022

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path.
Proceedings of the International Conference on Machine Learning, 2022

An Explanation of In-context Learning as Implicit Bayesian Inference.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Self-supervised Learning is More Robust to Dataset Imbalance.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution.
Proceedings of the Tenth International Conference on Learning Representations, 2022

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System.
CoRR, 2021

Why Do Local Methods Solve Nonconvex Problems?
CoRR, 2021

Entity and Evidence Guided Document-Level Relation Extraction.
Proceedings of the 6th Workshop on Representation Learning for NLP, 2021

Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Safe Reinforcement Learning by Imagining the Near Future.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Label Noise SGD Provably Prefers Flat Global Minimizers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Variance-reduced First-order Meta-learning for Natural Language Processing Tasks.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization.
Proceedings of the 38th International Conference on Machine Learning, 2021

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness.
Proceedings of the 9th International Conference on Learning Representations, 2021

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data.
Proceedings of the 9th International Conference on Learning Representations, 2021

Optimal Regularization can Mitigate Double Descent.
Proceedings of the 9th International Conference on Learning Representations, 2021

Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap.
Proceedings of the Conference on Learning Theory, 2021

Shape Matters: Understanding the Implicit Bias of the Noise Covariance.
Proceedings of the Conference on Learning Theory, 2021

Active Online Learning with Hidden Shifting Domains.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Meta-learning Transferable Representations with a Single Target Domain.
CoRR, 2020

Entity and Evidence Guided Relation Extraction for DocRED.
CoRR, 2020

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK.
CoRR, 2020

Simplifying Models with Unlabeled Output Data.
CoRR, 2020

Active Online Domain Adaptation.
CoRR, 2020

Robust and On-the-fly Dataset Denoising for Image Classification.
CoRR, 2020

Federated Accelerated Stochastic Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

MOPO: Model-based Offline Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Beyond Lazy Training for Over-parameterized Tensor Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model-based Adversarial Meta-Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Self-training Avoids Using Spurious Features Under Domain Shift.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Individual Calibration with Randomized Forecasting.
Proceedings of the 37th International Conference on Machine Learning, 2020

The Implicit and Explicit Regularization Effects of Dropout.
Proceedings of the 37th International Conference on Machine Learning, 2020

Understanding Self-Training for Gradual Domain Adaptation.
Proceedings of the 37th International Conference on Machine Learning, 2020

On the Expressivity of Neural Networks for Deep Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin.
Proceedings of the 8th International Conference on Learning Representations, 2020

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling.
Proceedings of the 8th International Conference on Learning Representations, 2020

Robust and On-the-Fly Dataset Denoising for Image Classification.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Over-Parametrized Two-Layer Neural Networks beyond NTK.
Proceedings of the Conference on Learning Theory, 2020

Why Do Local Methods Solve Nonconvex Problems?
Proceedings of the Beyond the Worst-Case Analysis of Algorithms, 2020

2019
Optimal Design of Process Flexibility for General Production Systems.
Oper. Res., 2019

Bootstrapping the Expressivity with Model-based Planning.
CoRR, 2019

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin.
CoRR, 2019

A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning.
CoRR, 2019

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Verified Uncertainty Calibration.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Fixup Initialization: Residual Learning Without Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees.
Proceedings of the 7th International Conference on Learning Representations, 2019

Approximability of Discriminators Implies Diversity in GANs.
Proceedings of the 7th International Conference on Learning Representations, 2019

On the Performance of Thompson Sampling on Logistic Bandits.
Proceedings of the Conference on Learning Theory, 2019

2018
Linear Algebraic Structure of Word Senses, with Applications to Polysemy.
Trans. Assoc. Comput. Linguistics, 2018

Gradient Descent Learns Linear Dynamical Systems.
J. Mach. Learn. Res., 2018

On the Margin Theory of Feedforward Neural Networks.
CoRR, 2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees.
CoRR, 2018

Generalization and equilibrium in generative adversarial nets (GANs) (invited talk).
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Learning One-hidden-layer Neural Networks with Landscape Design.
Proceedings of the 6th International Conference on Learning Representations, 2018

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations.
Proceedings of the Conference On Learning Theory, 2018

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement.
J. Mach. Learn. Res., 2017

Algorithmic Regularization in Over-parameterized Matrix Recovery.
CoRR, 2017

Provable learning of noisy-OR networks.
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Finding approximate local minima faster than gradient descent.
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Generalization and Equilibrium in Generative Adversarial Nets (GANs).
Proceedings of the 34th International Conference on Machine Learning, 2017

Identity Matters in Deep Learning.
Proceedings of the 5th International Conference on Learning Representations, 2017

A Simple but Tough-to-Beat Baseline for Sentence Embeddings.
Proceedings of the 5th International Conference on Learning Representations, 2017

On the Ability of Neural Nets to Express Distributions.
Proceedings of the 30th Conference on Learning Theory, 2017

2016
A Latent Variable Model Approach to PMI-based Word Embeddings.
Trans. Assoc. Comput. Linguistics, 2016

The Simulated Greedy Algorithm for Several Submodular Matroid Secretary Problems.
Theory Comput. Syst., 2016

Finding Approximate Local Minima for Nonconvex Optimization in Linear Time.
CoRR, 2016

Communication lower bounds for statistical estimation problems via a distributed data processing inequality.
Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, 2016

A Non-generative Framework and Convex Relaxations for Unsupervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Matrix Completion has No Spurious Local Minimum.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Provable Algorithms for Inference in Topic Models.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Polynomial-Time Tensor Decompositions with Sum-of-Squares.
Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science, 2016

2015
Distributed Stochastic Variance Reduced Gradient Methods.
CoRR, 2015

Why are deep nets reversible: A simple theory, with implications for training.
CoRR, 2015

Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings.
CoRR, 2015

Sum-of-Squares Lower Bounds for Sparse PCA.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Online Learning of Eigenvectors.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Simple, Efficient, and Neural Algorithms for Sparse Coding.
Proceedings of The 28th Conference on Learning Theory, 2015

Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms.
Proceedings of Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), 2015

2014
Lower Bound for High-Dimensional Statistical Learning Problem via Direct-Sum Theorem.
CoRR, 2014

More Algorithms for Provable Dictionary Learning.
CoRR, 2014

On Communication Cost of Distributed Statistical Estimation and Dimensionality.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Provable Bounds for Learning Some Deep Representations.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
On a conjecture of Butler and Graham.
Des. Codes Cryptogr., 2013

2011
A New Variation of Hat Guessing Games.
Proceedings of the Computing and Combinatorics - 17th Annual International Conference, 2011
