Weigao Sun

ORCID: 0000-0003-2551-924X

According to our database, Weigao Sun authored at least 27 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Comba: Improving Bilinear RNNs with Closed-loop Control.
CoRR, June, 2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond.
CoRR, March, 2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts.
CoRR, March, 2025

Liger: Linearizing Large Language Models to Gated Recurrent Structures.
CoRR, March, 2025

MoM: Linear Sequence Modeling with Mixture-of-Memories.
CoRR, February, 2025

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid.
CoRR, February, 2025

MiniMax-01: Scaling Foundation Models with Lightning Attention.
CoRR, January, 2025

LASP: Linear Attention Sequence Parallelism.
Trans. Mach. Learn. Res., 2025

CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, 2025

2024
MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes.
IEEE Robotics Autom. Lett., January, 2024

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training.
CoRR, 2024

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective.
CoRR, 2024

HGRN2: Gated Linear RNNs with State Expansion.
CoRR, 2024

Linear Attention Sequence Parallelism.
CoRR, 2024

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models.
CoRR, 2024

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

CO2: Efficient Distributed Training with Full Communication-Computation Overlap.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Scaling Laws for Linear Complexity Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Scaling TransNormer to 175 Billion Parameters.
CoRR, 2023

2021
The Relationship Between CT Signs of Computed Tomography and Risk Classification of Gastric Stromal Tumors.
J. Medical Imaging Health Informatics, 2021

2020
Data-Driven Probabilistic Optimal Power Flow With Nonparametric Bayesian Modeling and Inference.
IEEE Trans. Smart Grid, 2020

A Fast Optimal Power Flow Algorithm Using Powerball Method.
IEEE Trans. Ind. Informatics, 2020

pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

2019
Probabilistic Optimal Power Flow With Correlated Wind Power Uncertainty via Markov Chain Quasi-Monte-Carlo Sampling.
IEEE Trans. Ind. Informatics, 2019

A Nonparametric Bayesian Approach for Probabilistic Representation of Power Uncertainties.
Proceedings of the 2019 IEEE International Conference on Communications, 2019

2018
Probabilistic Optimal Power Flow Considering Correlation of Wind Farms via Markov Chain Quasi-Monte Carlo Sampling.
CoRR, 2018
