Ritchie Zhao

Orcid: 0000-0003-1656-9165

According to our database, Ritchie Zhao authored at least 30 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation.

CoRR, April, 2026

LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts.

CoRR, January, 2026

2025

Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens.

CoRR, December, 2025

EMPIRIC: Exploring Missing Pieces in KV Cache Compression for Reducing Computation, Storage, and Latency in Long-Context LLM Inference.

ACM SIGOPS Oper. Syst. Rev., July, 2025

Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding.

CoRR, July, 2025

Beyond the Buzz: A Pragmatic Take on Inference Disaggregation.

CoRR, June, 2025

Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines.

CoRR, January, 2025

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

2023

Microscaling Data Formats for Deep Learning.

CoRR, 2023

Shared Microexponents: A Little Shifting Goes a Long Way.

CoRR, 2023

With Shared Microexponents, A Little Shifting Goes a Long Way.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2020

Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations.

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators.

Ritchie Zhao

Christopher De Sa

Zhiru Zhang

CoRR, 2019

A 1.4 GHz 695 Giga Risc-V Inst/s 496-Core Manycore Processor With Mesh On-Chip Network and an All-Digital Synthesized PLL in 16nm CMOS.

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting.

Proceedings of the 36th International Conference on Machine Learning, 2019

Building Efficient Deep Neural Networks With Unitary Group Convolutions.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips.

IEEE Micro, 2018

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave.

IEEE Micro, 2018

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs.

Nitish Kumar Srivastava

Gustavo Angarita Velasquez

Wenping Wang

Zhiru Zhang

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

2017

Architecture and Synthesis for Area-Efficient Pipelining of Irregular Loop Nests.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

Enabling adaptive loop pipelining in high-level synthesis.

Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016

Improving high-level synthesis with decoupled data structure optimization.

Proceedings of the 53rd Annual Design Automation Conference, 2016

2015

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Area-efficient pipelining for FPGA-targeted high-level synthesis.

Proceedings of the 52nd Annual Design Automation Conference, 2015
