Kaifeng Lyu

According to our database, Kaifeng Lyu authored at least 39 papers between 2018 and 2026.

Bibliography

2026
The Power of Power Law: Asymmetry Enables Compositional Reasoning.
CoRR, April, 2026

SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection.
CoRR, March, 2026

Fine-tuning MLLMs Without Forgetting Is Easier Than You Think.
CoRR, March, 2026

2025
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice.
CoRR, December, 2025

PCMind-2.1-Kaiyuan-2B Technical Report.
CoRR, December, 2025

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining.
CoRR, November, 2025

Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression.
CoRR, November, 2025

When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs.
CoRR, November, 2025

Shift is Good: Mismatched Data Mixing Improves Test Performance.
CoRR, October, 2025

How Far Are We from Optimal Reasoning Efficiency?
CoRR, June, 2025

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
CoRR, March, 2025

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Weak-to-Strong Generalization Even in Random Feature Networks, Provably.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Safety Alignment Should be Made More Than Just a Few Tokens Deep.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Stagewise Pretraining via Progressive Subnetworks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
AI-Assisted Generation of Difficult Math Questions.
CoRR, 2024

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DistillSpec: Improving Speculative Decoding via Knowledge Distillation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Marginal Value of Momentum for Small Learning Rate SGD.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Quadratic Synchronization Rule for Distributed Deep Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.
Proceedings of the International Conference on Machine Learning, 2023

Why (and When) does Local SGD Generalize Better than SGD?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Fine-grained Complexity Meets IP = PSPACE.
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs.
CoRR, 2018

Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs.
Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, 2018

