Tushar Krishna

Sivasankaran Rajamanickam

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats.

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Pitstop: Enabling a Virtual Network Free Network-on-Chip.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU.

Geonhwa Jeong

Eric Qin

Christopher J. Hughes

Sreenivas Subramoney

Hyesoon Kim

Sivasankaran Rajamanickam

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Bridging the Frequency Gap in Heterogeneous 3D SoCs through Technology-Specific NoC Router Architectures.

Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package.

Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.

Roberto Gioiosa

Venkata Chaitanya Krishna Chekuri

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Data Orchestration in Deep Learning Accelerators

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01767-4, 2020

Architecture, Chip, and Package Codesign Flow for Interposer-Based 2.5-D Chiplet Integration Enabling Heterogeneous IP Reuse.

IEEE Trans. Very Large Scale Integr. Syst., 2020

ECOTLB: Eventually Consistent TLBs.

Daniel Rodrigues Carvalho

ACM Trans. Archit. Code Optim., 2020

MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings.

IEEE Micro, 2020

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference.

CoRR, 2020

The gem5 Simulator: Version 20.0+.

Amin Farmahini Farahani

Hamidreza Khaleghzadeh

CoRR, 2020

Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms.

CoRR, 2020

STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators.

Francisco Muñoz-Martínez

José L. Abellán

Manuel E. Acacio

CoRR, 2020

Conditional Neural Architecture Search.

CoRR, 2020

Generative Design of Hardware-aware DNNs.

Arun Ramamurthy

CoRR, 2020

MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators.

CoRR, 2020

Statistical Array Allocation and Partitioning for Compute In-Memory Fabrics.

Proceedings of the VLSI-SoC: Design Trends, 2020

Breaking Barriers: Maximizing Array Utilization for Compute in-Memory Fabrics.

Proceedings of the 28th IFIP/IEEE International Conference on Very Large Scale Integration, 2020

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning.

Geonhwa Jeong

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices.

Parth Mannan

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

DRAIN: Deadlock Removal for Arbitrary Irregular Networks.

Hossein Farrokhbakht

Paul V. Gratz

Joshua San Miguel

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator.

Sudhakar Yalamanchili

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Kite: A Family of Heterogeneous Interposer Topologies Enabled via Accurate Interconnect Modeling.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019

Synchronized Progress in Interconnection Networks (SPIN): A New Theory for Deadlock Freedom.

Aniruddh Ramrakhyani

Paul V. Gratz

IEEE Micro, 2019

HERALD: Optimizing Heterogeneous DNN Accelerators for Edge Devices.

CoRR, 2019

BINDU: deadlock-freedom with one bubble in the network.

Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019

Reinforcement learning based interconnection routing for adaptive traffic optimization.

Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019

SWAP: Synchronized Weaving of Adjacent Packets for Network Deadlock Resolution.

Paul V. Gratz

Joshua San Miguel

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

A communication-centric approach for designing flexible DNN accelerators.

Venkata Chaitanya Krishna Chekuri

Proceedings of the 12th International Workshop on Network on Chip Architectures, 2019

mRNA: Enabling Efficient Mapping Space Exploration for a Reconfiguration Neural Accelerator.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Understanding the Impact of On-chip Communication on DNN Accelerator Performance.

Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems, 2019

Scaling the Cascades: Interconnect-Aware FPGA Implementation of Machine Learning Problems.

Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Architecture, Chip, and Package Co-design Flow for 2.5D IC Design Enabling Heterogeneous IP Reuse.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

A Communication-Centric Approach for Designing Flexible DNN Accelerators.

IEEE Micro, 2018

SCALE-Sim: Systolic CNN Accelerator.

CoRR, 2018

MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators.

Michael Pellauer

CoRR, 2018

Brownian Bubble Router: Enabling Deadlock Freedom via Guaranteed Forward Progress.

Ankit Sinha

Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, 2018

Architecting a Secure Wireless Network-on-Chip.

Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, 2018

GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Scalable Distributed Last-Level TLBs Using Low-Latency Interconnects.

Srikant Bharadwaj

Guilherme Cox

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube.

Ramyad Hadidi

Bahar Asgari

Jeffrey S. Young

Burhan Ahmad Mudassar

Kartikay Garg

Hyesoon Kim

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

SEESAW: Using Superpages to Improve VIPT Caches.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-Cost High-Performance Soft NoCs.

Nachiket Kapre

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Merge Network for a Non-Von Neumann Accumulate Accelerator in a 3D Chip.

Proceedings of the 2018 IEEE International Conference on Rebooting Computing, 2018

Spoofing Prevention via RF Power Profiling in Wireless Network-on-Chip.

Proceedings of the 3rd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, 2018

FastTrack: Exploiting Fast FPGA Wiring for Implementing NoC Shortcuts (Abstract Only).

Nachiket Kapre

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Optimizing the data placement and transformation for multi-bank CGRA computing system.

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

LATR: Lazy Translation Coherence.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017

On-Chip Networks, Second Edition

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01755-1, 2017

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks.

IEEE J. Solid State Circuits, 2017

FASHION: Fault-Aware Self-Healing Intelligent On-chip Network.

CoRR, 2017

VESPA: VIPT Enhancements for Superpage Accesses.

CoRR, 2017

Rethinking NoCs for Spatial Neural Network Accelerators.

Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip, 2017

Adaptive Manycore Architectures for Big Data Computing.

Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip, 2017

Lightweight Emulation of Virtual Channels using Swaps.

Proceedings of the 10th International Workshop on Network on Chip Architectures, 2017

OpenSMART: Single-cycle multi-hop NoC generator in BSV and Chisel.

Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

A case for low frequency single cycle multi hop NoCs for energy efficiency and high performance.

Monodeep Kar

Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Static Bubble: A Framework for Deadlock-Free Irregular On-chip Topologies.

Aniruddh Ramrakhyani

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Automatic place-and-route of emerging LED-driven wires within a monolithically-integrated CMOS-III-V process.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

2016

14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.

Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016

2015

Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.

ACM Trans. Comput. Syst., 2015

2014

Enabling dedicated single-cycle connections over a shared network-on-chip.

PhD thesis, 2014

SMART: Single-Cycle Multihop Traversals over a Shared Network on Chip.

IEEE Micro, 2014

Single-cycle collective communication over a shared network fabric.

Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

SCORPIO: 36-core shared memory processor demonstrating snoopy coherence on a mesh interconnect.

Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs.

Woo-Cheol Kwon

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013

SWIFT: A Low-Power Network-On-Chip Implementing the Token Flow Control Router Architecture With Swing-Reduced Interconnects.

IEEE Trans. Very Large Scale Integr. Syst., 2013

Single-Cycle Multihop Asynchronous Repeated Traversal: A SMART Future for Reconfigurable On-Chip Networks.

Computer, 2013

Breaking the on-chip latency barrier using SMART.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

SMART: a single-cycle reconfigurable NoC for SoC applications.

Proceedings of the Design, Automation and Test in Europe, 2013

2012

Approaching the theoretical limits of a mesh NoC with a 16-node chip prototype in 45nm SOI.
