Lei Wang

Orcid: 0009-0006-2313-5348

Affiliations:

Microsoft Research, Beijing, China
Peking University, Beijing, China

According to our database, Lei Wang authored at least 14 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.

CoRR, June, 2025

TileLang: A Composable Tiled Programming Model for AI Systems.

CoRR, April, 2025

AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms.

CoRR, February, 2025

PipeThreader: Software-Defined Pipelining for Efficient DNN Execution.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.

Proceedings of the Twentieth European Conference on Computer Systems, 2025

2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.

CoRR, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.

CoRR, 2024

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

PIMSYN: Synthesizing Processing-in-Memory CNN Accelerators.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

PrimePar: Efficient Spatial-temporal Tensor Partitioning for Large Transformer Model Training.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

PIMCOMP: A Universal Compilation Framework for Crossbar-based PIM DNN Accelerators.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
