Cong Guo

Orcid: 0000-0002-4479-5525

Affiliations:
  • Shanghai Jiao Tong University, Department of Computer Science and Engineering, China


According to our database, Cong Guo authored at least 40 papers between 2019 and 2026.

Bibliography

2026
A Full-Stack Framework for GNN Acceleration via Partition-Compiler-Architecture Co-Design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2026

FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching.
CoRR, April, 2026

M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.
CoRR, January, 2026

Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

M²XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication.
Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

2025
Circuits to Systems: Codesigning Efficient AI Hardware.
IEEE Des. Test, December, 2025

CAMformer: Associative Memory is All You Need.
CoRR, November, 2025

DPad: Efficient Diffusion Language Models with Suffix Dropout.
CoRR, August, 2025

eLLM: Elastic Memory Management Framework for Efficient LLM Serving.
CoRR, June, 2025

DSTC: Dual-Side Sparse Tensor Core for DNNs Acceleration on Modern GPU Architectures.
IEEE Trans. Computers, February, 2025

Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers.
CoRR, February, 2025

A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation.
Proceedings of the International Conference for High Performance Computing, 2025

Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Transitive Array: An Efficient GEMM Accelerator with Result Reuse.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Prosperity: Accelerating Spiking Neural Networks via Product Sparsity.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024
Accelerating Sparse DNNs Based on Tiled GEMM.
IEEE Trans. Computers, May, 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models.
CoRR, 2024

Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization.
CoRR, 2024

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving.
CoRR, 2024

JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design.
CoRR, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022
Efficient Activation Quantization via Adaptive Rounding Border for Post-Training Quantization.
CoRR, 2022

Towards Reliable AI Applications via Algorithm-Based Fault Tolerance on NVDLA.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

2021
Dual-side Sparse Tensor Core.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

2020
Accelerating sparse DNN models without hardware-support via tile-wise sparsity.
Proceedings of the International Conference for High Performance Computing, 2020

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
Adversarial Defense Through Network Profiling Based Path Extraction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

