Zhihang Yuan
ORCID: 0000-0001-7846-0240
According to our database, Zhihang Yuan authored at least 69 papers between 2017 and 2025.
Bibliography
2025
VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models.
CoRR, August, 2025
Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance.
CoRR, July, 2025
CoRR, July, 2025
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification.
CoRR, June, 2025
PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling.
CoRR, June, 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing.
CoRR, May, 2025
CoRR, May, 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance.
CoRR, May, 2025
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization.
CoRR, May, 2025
CoRR, April, 2025
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers.
CoRR, March, 2025
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization.
CoRR, February, 2025
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting.
CoRR, January, 2025
AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024
Post-training quantization for re-parameterization via coarse & fine weight splitting.
J. Syst. Archit., February, 2024
Neurocomputing, February, 2024
CoRR, 2024
CoRR, 2024
LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization.
CoRR, 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models.
CoRR, 2024
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training.
CoRR, 2024
CoRR, 2024
PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds.
CoRR, 2024
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More.
CoRR, 2024
CoRR, 2024
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024
Algorithm-Hardware Co-Design for Energy-Efficient A/D Conversion in ReRAM-Based Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
2023
FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-Based Image-Processing Applications.
ACM Trans. Embed. Comput. Syst., November, 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models.
CoRR, 2023
Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric.
CoRR, 2023
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization.
Proceedings of the Computer Vision - ECCV 2022, 2022
Tailor: removing redundant operations in memristive analog neural network accelerators.
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC '22), 2022
2021
METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators.
CoRR, 2021
NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators.
Sci. China Inf. Sci., 2021
Rapid Configuration of Asynchronous Recurrent Neural Networks for ASIC Implementations.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the 27th IEEE International Symposium on Asynchronous Circuits and Systems, 2021
2020
Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs.
IEEE Trans. Computers, 2020
CoRR, 2020
S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020
2017
Reducing Overfitting in Deep Convolutional Neural Networks Using Redundancy Regularizer.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2017, 2017
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017
Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.
Proceedings of the Advanced Parallel Processing Technologies, 2017