Cong Guo

ORCID: 0000-0002-4479-5525

Affiliations:

Shanghai Jiao Tong University, Department of Computer Science and Engineering, China

According to our database, Cong Guo authored at least 44 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of four.

Bibliography

2026

A Full-Stack Framework for GNN Acceleration via Partition-Compiler-Architecture Co-Design.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2026

Optimus: Elastic Decoding for Efficient Diffusion LLM Serving.

CoRR, May, 2026

EVA: Accelerating LLM Decoding via an Efficient Vector Quantization Architecture.

CoRR, May, 2026

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems.

CoRR, April, 2026

FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching.

CoRR, April, 2026

M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.

CoRR, January, 2026

Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

Frame Skipping Architecture for Video-Language Model Acceleration.

Proceedings of the Great Lakes Symposium on VLSI 2026, 2026

M²XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication.

Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

2025

Circuits to Systems: Codesigning Efficient AI Hardware.

IEEE Des. Test, December, 2025

CAMformer: Associative Memory is All You Need.

Tergel Molom-Ochir

Benjamin F. Morris III

CoRR, November, 2025

DPad: Efficient Diffusion Language Models with Suffix Dropout.

CoRR, August, 2025

eLLM: Elastic Memory Management Framework for Efficient LLM Serving.

CoRR, June, 2025

DSTC: Dual-Side Sparse Tensor Core for DNNs Acceleration on Modern GPU Architectures.

IEEE Trans. Computers, February, 2025

Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers.

CoRR, February, 2025

A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation.

Proceedings of the International Conference for High Performance Computing, 2025

Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Transitive Array: An Efficient GEMM Accelerator with Result Reuse.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Prosperity: Accelerating Spiking Neural Networks via Product Sparsity.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024

Accelerating Sparse DNNs Based on Tiled GEMM.

IEEE Trans. Computers, May, 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models.

CoRR, 2024

Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization.

CoRR, 2024

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving.

CoRR, 2024

JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design.

CoRR, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs.

Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022

Efficient Activation Quantization via Adaptive Rounding Border for Post-Training Quantization.

CoRR, 2022

Towards Reliable AI Applications via Algorithm-Based Fault Tolerance on NVDLA.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

2021

Dual-side Sparse Tensor Core.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators.

Proceedings of the IEEE International Symposium on Workload Characterization, 2021

2020

Accelerating sparse DNN models without hardware-support via tile-wise sparsity.

Proceedings of the International Conference for High Performance Computing, 2020

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019

Adversarial Defense Through Network Profiling Based Path Extraction.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
