Lean Wang

According to our database, Lean Wang authored at least 14 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters.
CoRR, May, 2026

2025
mHC: Manifold-Constrained Hyper-Connections.
CoRR, December, 2025

Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws.
CoRR, September, 2025

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.
CoRR, February, 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.
Nat., 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Temporal Reasoning Transfer from Text to Video.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts.
CoRR, 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models.
CoRR, 2024

Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Towards Codable Text Watermarking for Large Language Models.
CoRR, 2023

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Gradient Knowledge Distillation for Pre-trained Language Models.
CoRR, 2022
