Size Zheng

ORCID: 0000-0002-9471-1780

Affiliations:

ByteDance, China
Peking University, Beijing, China (former)

According to our database, Size Zheng authored at least 30 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism.

CoRR, November, 2025

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding.

CoRR, June, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

CoRR, May, 2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.

CoRR, April, 2025

MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness.

CoRR, March, 2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.

CoRR, March, 2025

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts.

CoRR, February, 2025

Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

DyREM: Dynamically Mitigating Quantum Readout Error with Embedded Accelerator.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

QRAMsim: Efficiently Simulating, Analyzing, and Optimizing Large-Scale Quantum Random Access Memory.

Proceedings of the Advanced Parallel Processing Technologies, 2025

2024

Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

SpREM: Exploiting Hamming Sparsity for Fast Quantum Readout Error Mitigation.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

ARES: A Mapping Framework of DNNs Towards Diverse PIMs with General Abstractions.

Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC.

Size Zheng, Siyuan Chen, Yun Liang

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Rubick: A Synthesis Framework for Spatial Architectures via Dataflow Decomposition.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.

IEEE Trans. Parallel Distributed Syst., 2022

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.

Christopher J. Hughes, Pradeep Dubey

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
