Kaifeng Lyu

According to our database, Kaifeng Lyu authored at least 39 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

The Power of Power Law: Asymmetry Enables Compositional Reasoning.

CoRR, April, 2026

SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection.

CoRR, March, 2026

Fine-tuning MLLMs Without Forgetting Is Easier Than You Think.

CoRR, March, 2026

2025

Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice.

CoRR, December, 2025

PCMind-2.1-Kaiyuan-2B Technical Report.

CoRR, December, 2025

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining.

CoRR, November, 2025

Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression.

CoRR, November, 2025

When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs.

CoRR, November, 2025

Shift is Good: Mismatched Data Mixing Improves Test Performance.

CoRR, October, 2025

How Far Are We from Optimal Reasoning Efficiency?

CoRR, June, 2025

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

CoRR, March, 2025

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold.

Xinghan Li

Haodong Wen

Kaifeng Lyu

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Weak-to-Strong Generalization Even in Random Feature Networks, Provably.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval.

Kaiyue Wen

Xingyu Dang

Kaifeng Lyu

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Safety Alignment Should be Made More Than Just a Few Tokens Deep.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient stagewise pretraining via progressive subnetworks.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

AI-Assisted Generation of Difficult Math Questions.

CoRR, 2024

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DistillSpec: Improving Speculative Decoding via Knowledge Distillation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Marginal Value of Momentum for Small Learning Rate SGD.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Quadratic Synchronization Rule for Distributed Deep Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.

Proceedings of the International Conference on Machine Learning, 2023

Why (and When) does Local SGD Generalize Better than SGD?

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction.

Kaifeng Lyu

Zhiyuan Li

Sanjeev Arora

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning.

Zhiyuan Li

Yuping Luo

Kaifeng Lyu

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate.

Zhiyuan Li

Kaifeng Lyu

Sanjeev Arora

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks.

Kaifeng Lyu

Jian Li

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Fine-grained Complexity Meets IP = PSPACE.

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization.

Sanjeev Arora

Zhiyuan Li

Kaifeng Lyu

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs.

CoRR, 2018

Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs.

Ran Duan

Kaifeng Lyu

Yuanhang Xie

Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, 2018
