Ang Li

Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

A Framework for Neural Network Inference on FPGA-Centric SmartNICs.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

FCsN: A FPGA-Centric SmartNIC Framework for Neural Networks.

Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning.

Sriram Krishnamoorthy

Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021

BCNN: Binary complex neural network.

Microprocess. Microsystems, November, 2021

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.

IEEE Trans. Parallel Distributed Syst., 2021

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs.

Simon Su

IEEE Trans. Parallel Distributed Syst., 2021

O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.

IEEE Trans. Parallel Distributed Syst., 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search.

CoRR, 2021

Binary Complex Neural Network Acceleration on FPGA.

CoRR, 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.

CoRR, 2021

Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning.

CoRR, 2021

SV-sim: scalable PGAS-based state vector simulation of quantum circuits.

Bo Fang

Christopher E. Granade

Guen Prawiroatmodjo

Bettina Heim

Martin Roetteler

Sriram Krishnamoorthy

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

QuGAN: A Quantum State Fidelity based Generative Adversarial Network.

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2021

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning.

Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

A Hybrid System for Learning Classical Data in Quantum States.

Proceedings of the IEEE International Performance, Computing, and Communications Conference, 2021

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.

Cheng Tan

Tong Geng

Chenhao Xie

Nicolas Bohm Agostini

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper).

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs.

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

TQEA: Temporal Quantum Error Analysis.

Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Guarding Numerics Amidst Rising Heterogeneity.

Ganesh Gopalakrishnan

Proceedings of the 5th IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2021

Binary Complex Neural Network Acceleration on FPGA (Invited Paper).

Proceedings of the 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors, 2021

OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.

Cheng Tan

Nicolas Bohm Agostini

Jeff Zhang

Marco Minutoli

Vito Giovanni Castellana

Proceedings of the 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors, 2021

2020

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.

IEEE Trans. Parallel Distributed Syst., 2020

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters.

IEEE Trans. Computers, 2020

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.

CoRR, 2020

Accelerated Deep Reinforcement Learning Based Load Shedding for Emergency Voltage Control.

CoRR, 2020

Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters.

Omer Subasi

Xiu Yang

Sriram Krishnamoorthy

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020

A parallel sparse tensor benchmark suite on CPUs and GPUs.

Jiajia Li

Mahesh Lakshminarasimhan

Xiaolong Wu

Catherine Olschanowsky

Kevin J. Barker

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Sparse Tensor Benchmark Suite for CPUs and GPUs.

Jiajia Li

Mahesh Lakshminarasimhan

Xiaolong Wu

Catherine Olschanowsky

Kevin J. Barker

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks.

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs.

Proceedings of the 38th IEEE International Conference on Computer Design, 2020

CQNN: a CGRA-based QNN Framework.

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics.

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Indicator-Directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems.

Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 2020

2019

UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.

CoRR, 2019

A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing.

CoRR, 2019

PASTA: a parallel sparse tensor algorithm benchmark suite.

CCF Trans. High Perform. Comput., 2019

BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019

Fingerprinting Anomalous Computation with RNN for GPU-accelerated HPC Machines.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.

Proceedings of the ACM International Conference on Supercomputing, 2019

PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism.

Proceedings of the 30th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2019

2018

Superneurons: dynamic GPU memory management for training deep neural networks.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Introduction to HPPAC 2018.

Shuaiwen Leon Song

Natalie J. Bates

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Warp-Consolidation: A Novel Execution Model for GPUs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs.

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017

Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides.

Concurr. Comput. Pract. Exp., 2017

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.

Mads Ruben Burgdorff Kristensen

Weifeng Liu

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017

BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors.

Daniel G. Chavarría-Miranda

Wenfeng Zhao

Shuaiwen Leon Song

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Locality-Aware CTA Clustering for Modern GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Analysis and design of energy-efficient data-dependent SRAM.

Proceedings of the 12th IEEE International Conference on ASIC, 2017

2016

X: A Comprehensive Analytic Model for Parallel Machines.

Daniel G. Chavarría-Miranda

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

SFU-Driven Transparent Approximation Acceleration on GPUs.

Proceedings of the 2016 International Conference on Supercomputing, 2016

A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves.

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Critical points based register-concurrency autotuning for GPUs.

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2015

Correlation ratio based volume image registration on GPUs.

Microprocess. Microsystems, 2015

Adaptive and transparent cache bypassing for GPUs.

Gert-Jan van den Braak

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015

Fine-Grained Synchronizations and Dataflow Programming on GPUs.

Gert-Jan van den Braak

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Transit: A Visual Analytical Model for Multithreaded Machines.

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

A Locality Aware Convolutional Neural Networks Accelerator.

Proceedings of the 2015 Euromicro Conference on Digital System Design, 2015

Accelerating non-volatile/hybrid processor cache design space exploration for application specific embedded systems.

Mohammad Shihabul Haque

Qingsong Wei

Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014

A heterogeneous platform with GPU and FPGA for power efficient high performance computing.

Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Accelerating Volume Image Registration through Correlation Ratio Based Methods on GPUs.
