Zhihang Yuan
ORCID: 0000-0001-7846-0240
According to our database, Zhihang Yuan authored at least 69 papers between 2017 and 2025.
Bibliography
2025
VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models.
CoRR, August, 2025
Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance.
CoRR, July, 2025
CoRR, July, 2025
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification.
CoRR, June, 2025
PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling.
CoRR, June, 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing.
CoRR, May, 2025
CoRR, May, 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance.
CoRR, May, 2025
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization.
CoRR, May, 2025
CoRR, April, 2025
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers.
CoRR, March, 2025
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization.
CoRR, February, 2025
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting.
CoRR, January, 2025
AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024
Post-training quantization for re-parameterization via coarse & fine weight splitting.
J. Syst. Archit., February, 2024
Neurocomputing, February, 2024
CoRR, 2024
CoRR, 2024
LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization.
CoRR, 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models.
CoRR, 2024
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training.
CoRR, 2024
CoRR, 2024
PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds.
CoRR, 2024
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More.
CoRR, 2024
CoRR, 2024
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024
Algorithm-Hardware Co-Design for Energy-Efficient A/D Conversion in ReRAM-Based Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
2023
FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-Based Image-Processing Applications.
ACM Trans. Embed. Comput. Syst., November, 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models.
CoRR, 2023
Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric.
CoRR, 2023
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization.
Proceedings of the Computer Vision - ECCV 2022, 2022
Tailor: removing redundant operations in memristive analog neural network accelerators.
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC '22), 2022
2021
METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators.
CoRR, 2021
NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators.
Sci. China Inf. Sci., 2021
Rapid Configuration of Asynchronous Recurrent Neural Networks for ASIC Implementations.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the 27th IEEE International Symposium on Asynchronous Circuits and Systems, 2021
2020
Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs.
IEEE Trans. Computers, 2020
CoRR, 2020
S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020
2017
Reducing Overfitting in Deep Convolutional Neural Networks Using Redundancy Regularizer.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2017, 2017
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017
Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.
Proceedings of the Advanced Parallel Processing Technologies, 2017