Shaoyi Huang

Orcid: 0000-0001-6093-9798

According to our database, Shaoyi Huang authored at least 47 papers between 2021 and 2026.

Bibliography

2026
Different Prompts, Different Ranks: Prompt-aware Dynamic Rank Selection for SVD-based LLM Compression.
CoRR, May, 2026

Distributed Interpretability and Control for Large Language Models.
CoRR, April, 2026

GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph.
CoRR, March, 2026

Roots Beneath the Cut: Uncovering the Risk of Concept Revival in Pruning-Based Unlearning for Diffusion Models.
CoRR, March, 2026

Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density.
CoRR, February, 2026

Rethinking the Potential of Layer Freezing for DNN Training Efficiency.
Proceedings of the Great Lakes Symposium on VLSI 2026, 2026

Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing.
Proceedings of the Design, Automation & Test in Europe Conference, 2026

2025
Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration.
CoRR, November, 2025

Towards Fast LLM Fine-tuning through Zeroth-Order Optimization with Projected Gradient-Aligned Perturbations.
CoRR, October, 2025

PEL-NAS: Search Space Partitioned Architecture Prompt Co-Evolutionary LLM-driven Hardware-Aware Neural Architecture Search.
CoRR, October, 2025

Layer-wise dynamic rank for compressing large language models.
CoRR, September, 2025

End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost.
CoRR, September, 2025

Rethinking the Potential of Layer Freezing for Efficient DNN Training.
CoRR, August, 2025

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning.
CoRR, May, 2025

KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning.
CoRR, May, 2025

DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs.
Proceedings of the 39th ACM International Conference on Supercomputing, 2025

GROOT: Graph Edge Re-growth and Partitioning for the Verification of Large Designs in Logic Synthesis.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025

Efficient Context Propagating Perceiver Architectures for Auto-Regressive Language Modeling.
Proceedings of the ECAI 2025 - 28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy, 2025

2024
ASPLOS 2024 Artifact for "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training".
Dataset, February, 2024

ASPLOS 2024 Artifact for "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training".
Dataset, February, 2024

PromptV: Leveraging LLM-powered Multi-Agent Prompting for High-quality Verilog Generation.
CoRR, 2024

Enhanced Computationally Efficient Long LoRA Inspired Perceiver Architectures for Auto-Regressive Language Modeling.
CoRR, 2024

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM.
CoRR, 2024

PruneGNN: Algorithm-Architecture Pruning Framework for Graph Neural Network Acceleration.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training.
CoRR, 2023

RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference.
CoRR, 2023

LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

AutoReP: Automatic ReLU Replacement for Fast Private Network Inference.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Neurogenesis Dynamics-inspired Spiking Neural Network Training Acceleration.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
Efficient Traffic State Forecasting using Spatio-Temporal Network Dependencies: A Sparse Graph Neural Network Approach.
CoRR, 2022

An Automatic and Efficient BERT Pruning for Edge AI Systems.
Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Towards Sparsification of Graph Neural Networks.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Analyzing and Defending against Membership Inference Attacks in Natural Language Processing Classification.
Proceedings of the IEEE International Conference on Big Data, 2022

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm.
CoRR, 2021

E.T.: re-thinking self-attention for transformer models on GPUs.
Proceedings of the International Conference for High Performance Computing, 2021

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Accommodating Transformer onto FPGA: Coupling the Balanced Model Compression and FPGA-Implementation Optimization.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

Co-Exploration of Graph Neural Network and Network-on-Chip Design Using AutoML.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

HMC-TRAN: A Tensor-core Inspired Hierarchical Model Compression for Transformer-based DNNs on GPU.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021
