Lean Wang

According to our database, Lean Wang authored at least 12 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of 4.
  • Erdős number of 4.


Bibliography

2025
Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws.
CoRR, September 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos.
CoRR, April 2025

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.
CoRR, February 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.
Nature, 2025

Temporal Reasoning Transfer from Text to Video.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts.
CoRR, 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models.
CoRR, 2024

Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Towards Codable Text Watermarking for Large Language Models.
CoRR, 2023

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Gradient Knowledge Distillation for Pre-trained Language Models.
CoRR, 2022
