Chen Zhang

Orcid: 0000-0003-2762-2726

Affiliations:
  • Shanghai Jiao Tong University, China
  • Alibaba DAMO Academy, Shanghai, China (former)
  • Microsoft Research Asia (former)
  • Peking University, Center for Energy-Efficient Computing and Applications (CECA), Beijing, China (former)


According to our database1, Chen Zhang authored at least 27 papers between 2013 and 2024.

Collaborative distances:

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2024
Amanda: Unified Instrumentation Framework for Deep Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Cambricon-R: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.
Proceedings of the ACM Turing Award Celebration Conference - China 2023, 2023

2022
ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

2021
Boosting Mobile CNN Inference through Semantic Memory.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Dual-side Sparse Tensor Core.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020
SCYLLA: QoE-aware Continuous Mobile Vision with FPGA-based Dynamic Deep Neural Network Reconfiguration.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019
Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Live Video Analytics with FPGA-based Smart Cameras.
Proceedings of the 2019 Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Best-Effort FPGA Programming: A Few Steps Can Go a Long Way.
CoRR, 2018

2017
Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.
Proceedings of the Advanced Parallel Processing Technologies, 2017

2016
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

2015
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

2014
An efficient design and implementation of LSM-tree based key-value store on open-channel SSD.
Proceedings of the Ninth Eurosys Conference 2014, 2014

2013
Automatic multidimensional memory partitioning for FPGA-based accelerators (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Memory partitioning for multidimensional arrays in high-level synthesis.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013


  Loading...