Chen Zhang

ORCID: 0000-0003-2762-2726

Affiliations:

Shanghai Jiao Tong University, China
Alibaba DAMO Academy, Shanghai, China (former)
Microsoft Research Asia (former)
Peking University, Center for Energy-Efficient Computing and Applications (CECA), Beijing, China (former)

According to our database, Chen Zhang authored at least 73 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of four.

Bibliography

2026

HiRe: A Hierarchical Reconfigurable Architecture for Large-Scale Multichiplet DNN Accelerators.

IEEE Trans. Very Large Scale Integr. Syst., June, 2026

MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems.

CoRR, May, 2026

Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems.

CoRR, May, 2026

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs.

CoRR, May, 2026

TOM: A Ternary Read-only Memory Accelerator for LLM-powered Edge Intelligence.

CoRR, February, 2026

M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.

CoRR, January, 2026

Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

M²XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models.

Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

2025

Theseus: Exploring Efficient Wafer-Scale Chip Design for Large Language Models.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2025

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive.

CoRR, August, 2025

Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2025

ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis.

CoRR, July, 2025

Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization.

CoRR, June, 2025

Scaling Laws for Speculative Decoding.

CoRR, May, 2025

DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization.

CoRR, March, 2025

DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators.

CoRR, March, 2025

DSTC: Dual-Side Sparse Tensor Core for DNNs Acceleration on Modern GPU Architectures.

IEEE Trans. Computers, February, 2025

Data and System Perspectives of Sustainable Artificial Intelligence.

CoRR, January, 2025

DWCLF-Net: A weighted contrastive learning feature fusion network for temporal scar image sequence classification.

Biomed. Signal Process. Control., 2025

Jenga: Effective Memory Management for Serving LLM with Heterogeneity.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property.

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

H²-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

TB-STC: Transposable Block-wise N:M Structured Sparse Tensor Core.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

OutlierCIM: Outlier-Aware Digital CIM-Based LLM Accelerator with Hybrid-Strategy Quantization and Unified FP-INT Computation.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

SynGPU: Synergizing CUDA and Bit-Serial Tensor Cores for Vision Transformer Acceleration on GPU.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

MHDiff: Memory- and Hardware-Efficient Diffusion Acceleration via Focal Pixel Aware Quantization.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

IntelliGen: Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization.

Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

2024

Graph-Centric Performance Analysis for Large-Scale Parallel Applications.

IEEE Trans. Parallel Distributed Syst., July, 2024

MAGPY: Compiling Eager Mode DNN Programs by Monitoring Execution States.

Proceedings of the 2024 USENIX Annual Technical Conference, 2024

MixQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction.

Proceedings of the International Conference for High Performance Computing, 2024

Oltron: Algorithm-Hardware Co-design for Outlier-Aware Quantization of LLMs with Inter-/Intra-Layer Adaptation.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR.

CoRR, 2023

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Cambricon-R: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.

Proceedings of the ACM Turing Award Celebration Conference - China 2023, 2023

2022

Critique of "MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization" by SCC Team From Tsinghua University.

IEEE Trans. Parallel Distributed Syst., 2022

UniQ: A Unified Programming Model for Efficient Quantum Circuit Simulation.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

PerFlow: a domain specific framework for automatic performance analysis of parallel applications.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs.

Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Calibration of the Multiple Choice Machine Reading Comprehension.

Proceedings of the International Joint Conference on Neural Networks, 2022

Efficiently emulating high-bitwidth computation with low-bitwidth hardware.

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

2021

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Tsinghua University.

IEEE Trans. Parallel Distributed Syst., 2021

A Fast Lock for Explicit Message Passing Architectures.

IEEE Trans. Computers, 2021

Eden: A Unified Environment Framework for Booming Reinforcement Learning Algorithms.

CoRR, 2021

Boosting Mobile CNN Inference through Semantic Memory.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Dual-side Sparse Tensor Core.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

HyQuas: hybrid partitioner based quantum circuit simulation system on GPU.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020

Deeper Insights into Weight Sharing in Neural Architecture Search.

CoRR, 2020

SCYLLA: QoE-aware Continuous Mobile Vision with FPGA-based Dynamic Deep Neural Network Reconfiguration.

Proceedings of the 39th IEEE Conference on Computer Communications, 2020

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression.

Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019

Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Live Video Analytics with FPGA-based Smart Cameras.

Proceedings of the 2019 Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way.

CoRR, 2018

2017

Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.

Proceedings of the Advanced Parallel Processing Technologies, 2017

2016

Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster.

Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks.

Proceedings of the 35th International Conference on Computer-Aided Design, 2016

2015

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks.

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

2014

An efficient design and implementation of LSM-tree based key-value store on open-channel SSD.

Proceedings of the Ninth Eurosys Conference 2014, 2014

2013

Automatic multidimensional memory partitioning for FPGA-based accelerators (abstract only).

Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Memory partitioning for multidimensional arrays in high-level synthesis.

Proceedings of the 50th Annual Design Automation Conference 2013, 2013
