Venkatesh Akella

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems.

Mark Hildebrand

Jason Lowe-Power

Proceedings of the High Performance Computing - 38th International Conference, 2023

Scalable Hardware Acceleration of Graph Processing with Photonic Interconnects.

Proceedings of the International Conference on Photonics in Switching and Computing, 2023

2022

A Model for Scalable and Balanced Accelerators for Graph Processing.

IEEE Comput. Archit. Lett., 2022

LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads.

Proceedings of the High Performance Computing - 37th International Conference, 2022

SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems.

Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

2021

HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads.

Proceedings of the High Performance Computing - 36th International Conference, 2021

A Case Against Hardware Managed DRAM Caches for NVRAM Based Systems.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Performance Analysis of Scientific Computing Workloads on General Purpose TEEs.

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020

FPGA and GPU-based acceleration of ML workloads on Amazon cloud - A case study using gradient boosted decision tree library.

Maxim Shepovalov

Integr., 2020

Performance Analysis of Scientific Computing Workloads on Trusted Execution Environments.

CoRR, 2020

Predicting soil permanganate oxidizable carbon (POXC) by coupling DRIFT spectroscopy and artificial neural networks (ANN).

Comput. Electron. Agric., 2020

HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Multiplier-Free Implementation of Galois Field Fourier Transform on a FPGA.

Sree Balaji Girisankar

IEEE Trans. Circuits Syst. II Express Briefs, 2019

2018

A case for exposing extra-architectural state in the ISA: position paper.

Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, 2018

Improving Provisioned Power Efficiency in HPC Systems with GPU-CAPP.

Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Scalable Hardware Accelerator for Mini-Batch Gradient Descent.

Sandeep Rasoori

Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

2017

Improving Execution Time of Parallel Programs on Large Scale Chip Multiprocessors with Constant Average Power Processing.

Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Design and Evaluation of AWGR-Based Photonic NoC Architectures for 2.5D Integrated High Performance Computing Systems.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016

Photonic Interconnects for Interposer-based 2.5D/3D Integrated Systems on a Chip.

Proceedings of the Second International Symposium on Memory Systems, 2016

HogWild++: A New Mechanism for Decentralized Asynchronous Stochastic Gradient Descent.

Huan Zhang

Cho-Jui Hsieh

Proceedings of the IEEE 16th International Conference on Data Mining, 2016

Design space exploration of FPGA-based Deep Convolutional Neural Networks.

Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2014

Simultaneously Reducing Latency and Power Consumption in OpenFlow Switches.

IEEE/ACM Trans. Netw., 2014

PDG_GEN: A Methodology for Fast and Accurate Simulation of On-Chip Networks.

IEEE Trans. Computers, 2014

Runtime Adaptation of Applications Using Design Of Experiments: A Smartphone-Based Case Study.

Frank Maker III

IEEE Embed. Syst. Lett., 2014

2013

MELOADES: Methodology for long-term online adaptation of embedded software for heterogeneous devices.

Frank Maker III

J. Syst. Archit., 2013

Scalability and performance of a distributed AWGR-based all-optical token interconnect architecture.

Proceedings of the 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), 2013

Update rate tradeoffs for improving online power modeling in smartphones.

Frank Maker III

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

2012

Efficient Configurable Decoder Architecture for Nonbinary Quasi-Cyclic LDPC Codes.

Xiaoheng Chen

Shu Lin

IEEE Trans. Circuits Syst. I Regul. Pap., 2012

AWGR-Based Optical Topologies for Scalable and Efficient Global Communications in Large-Scale Multi-Processor Systems.

Xiaohui Ye

S. J. Ben Yoo

JOCN, 2012

DCOF - An Arbitration Free Directly Connected Optical Fabric.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

DCAF - A Directly Connected Arbitration-Free Photonic Crossbar for Energy-Efficient High Performance Computing.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

Exploiting data-level parallelism for energy-efficient implementation of LDPC decoders and DCT on an FPGA.

Xiaoheng Chen

ACM Trans. Reconfigurable Technol. Syst., 2011

Hardware Implementation of a Backtracking-Based Reconfigurable Decoder for Lowering the Error Floor of Quasi-Cyclic LDPC Codes.

IEEE Trans. Circuits Syst. I Regul. Pap., 2011

Memory System Optimization for FPGA-Based Implementation of Quasi-Cyclic LDPC Codes Decoders.

IEEE Trans. Circuits Syst. I Regul. Pap., 2011

Buffering and Flow Control in Optical Switches for High Performance Computing.

JOCN, 2011

Inferring packet dependencies to improve trace based simulation of on-chip networks.

Proceedings of the NOCS 2011, 2011

Resilient microring resonator based photonic networks.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Addressing system-level trimming issues in on-chip nanophotonic networks.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010

QSN - A Simple Circular-Shift Network for Reconfigurable Quasi-Cyclic LDPC Decoders.

Xiaoheng Chen

Shu Lin

IEEE Trans. Circuits Syst. II Express Briefs, 2010

Optical Router Control Architecture and Contention Resolution Algorithms Capable of Asynchronous, Variable-Length Packet Switching.

JOCN, 2010

Markov decision process (MDP) framework for software power optimization using call profiles on mobile phones.

Des. Autom. Embed. Syst., 2010

Performance Evaluation of a Multicore System with Optically Connected Memory Modules.

Paul Vincent Mejia

Proceedings of the NOCS 2010, 2010

DOS: a scalable optical switch for datacenters.

Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

2009

Markov decision process (MDP) framework for optimizing software on mobile phones.

Proceedings of the 9th ACM & IEEE International conference on Embedded software, 2009

Accelerating FPGA-based emulation of quasi-cyclic LDPC codes with vector processing.

Proceedings of the Design, Automation and Test in Europe, 2009

FPGA-based low-complexity high-throughput tri-mode decoder for quasi-cyclic LDPC codes.

Proceedings of the 47th Annual Allerton Conference on Communication, 2009

2008

Design and evaluation of an optical CPU-DRAM interconnect.

Amit Hadke

Tony Benavides

Proceedings of the 26th International Conference on Computer Design, 2008

OCDIMM: Scaling the DRAM Memory Wall Using WDM Based Optical Interconnects.

Amit Hadke

Tony Benavides

S. J. Ben Yoo

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Credit-based dynamic reliability management using online wearout detection.

Frederic T. Chong

Proceedings of the 5th Conference on Computing Frontiers, 2008

2007

Using Application Bisection Bandwidth to Guide Tile Size Selection for the Synchroscalar Tile-Based Architecture.

Trans. High Perform. Embed. Archit. Compil., 2007

Life Cycle Aware Computing: Reusing Silicon Technology.

Roland Geyer

Frederic T. Chong

Computer, 2007

2006

Synchroscalar: Evaluation of an embedded, multi-core architecture for media applications.

J. Embed. Comput., 2006

Segmented Bitline Cache: Exploiting Non-uniform Memory Access Patterns.

Ravishankar Rao

Justin Wenck

Diana Franklin

Proceedings of the High Performance Computing, 2006

Tile size selection for low-power tile-based architectures.

Proceedings of the Third Conference on Computing Frontiers, 2006

2005

Proactive Energy Optimization Algorithms for Wavelet-Based Video Codecs on Power-Aware Processors.

Wen-Fu Kao

Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Scheduling optical packets in wavelength, time, and space domains for all-optical packet switching routers.

Proceedings of IEEE International Conference on Communications, 2005

Complexity metric driven energy optimization framework for implementing MPEG-21 scalable video decoders.

Gouri Landge

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Generic modeling of complexity for motion-compensated wavelet video decoders.

Gouri Landge

Proceedings of the Electronic Imaging: Image and Video Communications and Processing 2005, 2005

2004

Efficient orchestration of sub-word parallelism in media processors.

Frederic T. Chong

Proceedings of SPAA 2004: the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2004

Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor.

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Rate-distortion-complexity adaptive video compression and streaming.

Deepak S. Turaga

Proceedings of the 2004 International Conference on Image Processing, 2004

2003

High-performance optical-label switching packet routers and smart edge routers for the next-generation Internet.

IEEE J. Sel. Areas Commun., 2003

Synchroscalar: Initial Lessons in Power-Aware Design of a Tile-Based Embedded Architecture.

Proceedings of the Power-Aware Computer Systems, Third International Workshop, 2003

Improving DSP Performance with a Small Amount of Field Programmable Logic.

Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

2001

An Asynchronous Superscalar Architecture for Exploiting Instruction-Level Parallelism.

Tony Werner

Proceedings of the 7th International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC 2001), 2001

1999

Automatic Insertion of Gated Clocks at Register Transfer Level.

Nithya Raghavan

Smita Bakshi

Proceedings of the 12th International Conference on VLSI Design (VLSI Design 1999), 1999

1998

Micropipelined asynchronous discrete cosine transform (DCT/IDCT) processor.

Dave Johnson

Bret Stott

IEEE Trans. Very Large Scale Integr. Syst., 1998

Asynchronous Comparison-Based Decoders for Delay-Insensitive Codes.

Nitin H. Vaidya

G. Robert Redinbo

IEEE Trans. Computers, 1998

1997

Asynchronous Processor Survey.

Tony Werner

Computer, 1997

1996

Limitations of VLSI Implementation of Delay-Insensitive Codes.

Nitin H. Vaidya

G. Robert Redinbo

Proceedings of the Digest of Papers: FTCS-26, 1996

Counterflow pipeline based dynamic instruction scheduling.

Tony Werner

Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC '96), 1996

1995

Asynchronous 2-D discrete cosine transform core processor.

Bret Stott

Dave Johnson

Proceedings of the 1995 International Conference on Computer Design (ICCD '95), 1995

1994

High-level optimizations in compiling process descriptions to asynchronous circuits.

J. VLSI Signal Process., 1994

Specification and Validation of Control-Intensive IC's in hopCP.

IEEE Trans. Software Eng., 1994

CFSIM: A Concurrent Compiled Code Functional Simulator for hopCP.

Int. J. Comput. Simul., 1994

Testing two-phase transition signaling based self-timed circuits in a synthesis environment.

Prabhakar Kudva

Proceedings of the 7th International Symposium on High Level Synthesis, 1994

Performance Analysis and Optimization of Asynchronous Circuits.

Prabhakar Kudva

Erik Brunvand

Proceedings of the 1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1994

A technique for estimating power in asynchronous circuits.

Prabhakar Kudva

Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1994

1993

A transformational approach to asynchronous high-level synthesis.

Proceedings of the VLSI 93, 1993

1992

VLSI asynchronous systems: specification and synthesis.

Microprocess. Microsystems, 1992

From Process-Oriented Functional Specifications to Efficient Asynchronous Circuits.

Proceedings of the Fifth International Conference on VLSI Design, 1992

SHILPA: a high-level synthesis system for self-timed circuits.

Proceedings of the 1992 IEEE/ACM International Conference on Computer-Aided Design, 1992

1989

HOP: A process model for synchronous hardware; semantics and experiments in process composition.

Richard M. Fujimoto

Narayana Mani

Integr., 1989

Parallel Composition of Lockstep Synchronous Processes for Hardware Validation: Divide-and-Conquer Composition.

Narayana Mani