Lei Wang

Orcid: 0009-0006-2313-5348

Affiliations:
  • Microsoft Research, Beijing, China
  • Peking University, Beijing, China


According to our database, Lei Wang authored at least 14 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.
CoRR, June, 2025

TileLang: A Composable Tiled Programming Model for AI Systems.
CoRR, April, 2025

AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms.
CoRR, February, 2025

PipeThreader: Software-Defined Pipelining for Efficient DNN Execution.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

2024
LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
CoRR, 2024

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

PIMSYN: Synthesizing Processing-in-Memory CNN Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

PrimePar: Efficient Spatial-temporal Tensor Partitioning for Large Transformer Model Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

PIMCOMP: A Universal Compilation Framework for Crossbar-based PIM DNN Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

