Guohao Dai

Orcid: 0000-0003-0849-3252

According to our database1, Guohao Dai authored at least 74 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility.
IEEE Trans. Emerg. Top. Comput., 2024

FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models.
CoRR, 2024

Evaluating Quantized Large Language Models.
CoRR, 2024

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K.
CoRR, 2024

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

2023
CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Gibbon: An Efficient Co-Exploration Framework of NN Model and Processing-In-Memory Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Adaptive Multidimensional Parallel Fault Simulation Framework on Heterogeneous System.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., June, 2023

Sgap: towards efficient sparse tensor algebra compilation for GPU.
CCF Trans. High Perform. Comput., June, 2023

Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.
IEEE Trans. Computers, May, 2023

Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization.
CoRR, 2023

FlashDecoding++: Faster Large Language Model Inference on GPUs.
CoRR, 2023

CogDL: A Comprehensive Library for Graph Deep Learning.
Proceedings of the ACM Web Conference 2023, 2023

History-Detr: Optimize Query Initialization Strategy by Using Historical Information and Kinematics.
Proceedings of the ACM Multimedia Asia 2023, 2023

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TSTC: Two-Level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

OPT: Optimal Proposal Transfer for Efficient Yield Optimization for Analog and SRAM Circuits.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

Minimizing Communication Conflicts in Network-On-Chip Based Processing-In-Memory Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

An Efficient Accelerator for Point-based and Voxel-based Point Cloud Neural Networks.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Seeking the Yield Barrier: High-Dimensional SRAM Evaluation Through Optimal Manifold.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Memory-Efficient and Real-Time SPAD-based dToF Depth Sensor with Spatial and Statistical Correlation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

TorchSparse++: Efficient Point Cloud Engine.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

High-Dimensional Yield Estimation Using Shrinkage Deep Features and Maximization of Integral Entropy Reduction.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

NTGAT: A Graph Attention Network Accelerator with Runtime Node Tailoring.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

2022
A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud.
ACM Trans. Reconfigurable Technol. Syst., 2022

Exploring the Potential of Low-Bit Training of Convolutional Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

INCAME: Interruptible CNN Accelerator for Multirobot Exploration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

GRAPHIC: GatheR-And-Process in Highly parallel with In-SSD Compression Architecture in Very Large-Scale Graph.
CoRR, 2022

Heuristic Adaptability to Input Dynamics for SpMM on GPUs.
CoRR, 2022

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of Machine Learning and Systems 2022, 2022

Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter.
Proceedings of the 23rd IEEE International Conference on Mobile Data Management, 2022

DIMMining: pruning-efficient and parallel graph mining on near-memory-computing.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Heuristic adaptability to input dynamics for SpMM on CPUs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

A one-for-all and <i>o</i>(<i>v</i> log(<i>v</i> ))-cost solution for parallel merge style operations on sorted key-value arrays.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021

CogDL: An Extensive Toolkit for Deep Learning on Graphs.
CoRR, 2021

Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

3M-AI: A Multi-task and Multi-core Virtualization Framework for Multi-FPGA AI Systems in the Cloud.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

2020
GE-SpMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks.
Proceedings of the International Conference for High Performance Computing, 2020

GraphSDH: A General Graph Sampling Framework with Distribution and Hierarchy.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

LessMine: Reducing Sample Space and Data Access for Dense Pattern Mining.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

MNSIM 2.0: A Behavior-Level Modeling Tool for Memristor-based Neuromorphic Computing Systems.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Order Sampling Processing-in-Memory Architecture for Approximate Graph Pattern Mining.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

Enable Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

INCAME: INterruptible CNN Accelerator for Multi-robot Exploration.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

INCA: INterruptible CNN Accelerator for Multi-tasking in Embedded Robots.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

HyVE: Hybrid Vertex-Edge Memory Hierarchy for Energy-Efficient Graph Processing.
IEEE Trans. Computers, 2019

Centrifuge: Evaluating full-system HLS-generated heterogenous-accelerator SoCs using FPGA-Acceleration.
Proceedings of the International Conference on Computer-Aided Design, 2019

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Memory-Bound Proof-of-Work Acceleration for Blockchain Applications.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

2018
GraphIA: an in-situ accelerator for large-scale graph processing.
Proceedings of the International Symposium on Memory Systems, 2018

NewGraph: Balanced Large-Scale Graph Processing on FPGAs with Low Preprocessing Overheads.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

HyVE: Hybrid vertex-edge memory hierarchy for energy-efficient graph processing.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

2017
ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016
NXgraph: An efficient graph processing system on a single machine.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Approximate Frequent Itemset Mining for streaming data on FPGA.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

FPGP: Graph Processing Framework on FPGA A Case Study of Breadth-First Search.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015
A self-aware data compression system on FPGA in Hadoop.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

2014
Online scheduling for FPGA computation in the Cloud.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

2013
DTW-Based Subsequence Similarity Search on AMD Heterogeneous Computing Platform.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013


  Loading...