Ritchie Zhao
ORCID: 0000-0003-1656-9165
According to our database, Ritchie Zhao authored at least 27 papers between 2015 and 2025.
Bibliography
2025
EMPIRIC: Exploring Missing Pieces in KV Cache Compression for Reducing Computation, Storage, and Latency in Long-Context LLM Inference.
ACM SIGOPS Oper. Syst. Rev., July, 2025
Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding.
CoRR, July, 2025
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression.
CoRR, February, 2025
Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines.
CoRR, January, 2025
ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025
2023
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
2020
Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations.
Proceedings of the 8th International Conference on Learning Representations, 2020
2019
Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators.
CoRR, 2019
A 1.4 GHz 695 Giga Risc-V Inst/s 496-Core Manycore Processor With Mesh On-Chip Network and an All-Digital Synthesized PLL in 16nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting.
Proceedings of the 36th International Conference on Machine Learning, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips.
IEEE Micro, 2018
IEEE Micro, 2018
Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018
2017
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017
2016
Proceedings of the 53rd Annual Design Automation Conference, 2016
2015
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015
Proceedings of the 52nd Annual Design Automation Conference, 2015