Guyue Huang

Orcid: 0000-0002-1280-4781

According to our database1, Guyue Huang authored at least 20 papers between 2020 and 2025.

Collaborative distances:

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2025
Enabling Efficient Sparse Multiplications on GPUs With Heuristic Adaptability.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2025

Llama-Nemotron: Efficient Reasoning Models.
, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
CoRR, May, 2025

GMI-DRL: Empowering Multi-GPU DRL with Adaptive-Grained Parallelism.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

2024
High-Performance Deep Learning Systems via DL Sparsity and DL Compiler
PhD thesis, 2024

OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022
Enabling Data Movement and Computation Pipelining in Deep Learning Compiler.
CoRR, 2022

Heuristic Adaptability to Input Dynamics for SpMM on GPUs.
CoRR, 2022

LightSeq2: Accelerated Training for Transformer-Based Models on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Shfl-BW: accelerating deep neural network inference with tensor-core aware weight pruning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Heuristic adaptability to input dynamics for SpMM on CPUs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
Machine Learning for Electronic Design Automation: A Survey.
ACM Trans. Design Autom. Electr. Syst., 2021

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021

Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks.
Proceedings of the International Conference for High Performance Computing, 2020


  Loading...