Kaiyue Wen
According to our database, Kaiyue Wen authored at least 21 papers between 2022 and 2025.
Bibliography
2025
CoRR, July, 2025
CoRR, May, 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free.
CoRR, May, 2025
Task Generalization With AutoRegressive Compositional Structure: Can Learning From D Tasks Generalize to D^T Tasks?
CoRR, February, 2025
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
2024
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective.
CoRR, 2024
2023
IACR Cryptol. ePrint Arch., 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022