Kaifeng Lyu

According to our database, Kaifeng Lyu authored at least 20 papers between 2018 and 2024.

Bibliography

2024
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates.
CoRR, 2024

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval.
CoRR, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks.
CoRR, 2024

2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking.
CoRR, 2023

A Quadratic Synchronization Rule for Distributed Deep Learning.
CoRR, 2023

DistillSpec: Improving Speculative Decoding via Knowledge Distillation.
CoRR, 2023

The Marginal Value of Momentum for Small Learning Rate SGD.
CoRR, 2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.
Proceedings of the International Conference on Machine Learning, 2023

Why (and When) does Local SGD Generalize Better than SGD?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Fine-grained Complexity Meets IP = PSPACE.
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs.
CoRR, 2018

Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs.
Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, 2018

