Mikko H. Lipasti

Frontiers Comput. Neurosci., 2024

2023

TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency.

Tushar Krishna

ACM Trans. Archit. Code Optim., September, 2023

Turn-based Spatiotemporal Coherence for GPUs.

Sooraj Puthoor

ACM Trans. Archit. Code Optim., September, 2023

Energy-Efficient Bayesian Inference Using Bitstream Computing.

Soroosh Khoram

Kyle Daruwalla

IEEE Comput. Archit. Lett., 2023

TailWAG: Tail Latency Workload Analysis and Generation.

Heng Zhuo

Proceedings of the 5th International Workshop on Benchmarking in the Data Center, 2023

2022

PrGEMM: A Parallel Reduction SpGEMM Accelerator.

Chien-Fu Chen

Proceedings of the GLSVLSI '22: Great Lakes Symposium on VLSI 2022, Irvine CA USA, June 6, 2022

Work-in-Progress: NoRF: A Case Against Register File Operands in Tightly-Coupled Accelerators.

Heng Zhuo

Proceedings of the International Conference on Compilers, 2022

2021

Systems-on-Chip with Strong Ordering.

Sooraj Puthoor

ACM Trans. Archit. Code Optim., 2021

Accelerating Deep Learning with Dynamic Data Pruning.

Ravi S. Raju

Kyle Daruwalla

CoRR, 2021

MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing.

Ramon Bertran

Pradip Bose

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

2020

SHASTA: Synergic HW-SW Architecture for Spatio-temporal Approximation.

Joshua San Miguel

ACM Trans. Archit. Code Optim., 2020

BitSAD v2: Compiler Optimization and Analysis for Bitstream Computing.

ACM Trans. Archit. Code Optim., 2020

Value Locality Based Approximation With ODIN.

Rahul Singh

Joshua San Miguel

IEEE Comput. Archit. Lett., 2020

Modeling Architectural Support for Tightly-Coupled Accelerators.

Heng Zhuo

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

BlurNet: Defense by Filtering the Feature Maps.

Ravi S. Raju

Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2020

2019

BitBench: a benchmark for bitstream computing.

Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

SECO: A Scalable Accuracy Approximate Exponential Function Via Cross-Layer Optimization.

Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

Recycling Data Slack in Out-of-Order Cores.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018

Aggressive Slack Recycling via Transparent Pipelines.

Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Compiler assisted coalescing.

Sooraj Puthoor

Vignyan Reddy Kothinti Naresh

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

The CURE: Cluster Communication Using Registers.

ACM Trans. Embed. Comput. Syst., 2017

Timing Speculation in Multi-Cycle Data Paths.

IEEE Comput. Archit. Lett., 2017

Temporal codes in on-chip interconnects.

Michael Mishkin

Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

CHARSTAR: Clock Hierarchy Aware Resource Scaling in Tiled ARchitectures.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Architectural Support for Server-Side PHP Processing.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Evaluating Hopfield-network-based linear solvers for hardware constrained neural substrates.

Rohit Shukla

Erik Jorgensen

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

2016

BADGR: A practical GHR implementation for TAGE branch predictors.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Hash Map Inlining.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Adaptive Cache and Concurrency Allocation on GPGPUs.

Zhong Zheng

Zhiying Wang

IEEE Comput. Archit. Lett., 2015

COP: to compress and protect main memory.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A self-learning map-seeking circuit for visual object recognition.

Rohit Shukla

Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

iPatch: Intelligent fault patching to improve energy efficiency.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Tag tables.

Sean Franey

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors.

Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLSVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

2014

Bias-Free Branch Predictor.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Tag check elision.

Zhong Zheng

Zhiying Wang

Proceedings of the International Symposium on Low Power Electronics and Design, 2014

Precision-aware soft error protection for GPUs.

Vignyan Reddy Kothinti Naresh

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Revolver: Processor architecture for power efficient loop execution.

Mitchell Hayenga

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Atomic SC for simple in-order processors.

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013

Resilient High-Performance Processors with Spare RIBs.

IEEE Micro, 2013

Simulating cortical networks on heterogeneous multi-GPU systems.

J. Parallel Distributed Comput., 2013

Accelerating atomic operations on GPGPUs.

Sean Franey

Mushfique Junayed Khurshid

Proceedings of the 2013 Seventh IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2013

Wavelength stealing: an opportunistic approach to channel sharing in multi-chip photonic interconnects.

Ashok V. Krishnamoorthy

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

REEL: Reducing effective execution latency of floating point operations.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Data compression for thermal mitigation in the Hybrid Memory Cube.

Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Bridging the semantic gap: Emulating biological neuronal behaviors with simple digital neurons.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012

Edge chasing delayed consistency: pushing the limits of weak memory models.

Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, 2012

BenchNN: On the broad potential application scope of hardware neural network accelerators.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Mitigating random variation with spare RIBs: Redundant intermediate bitslices.

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

2011

Systems for Very Large-Scale Computing.

Vignyan Reddy Kothinti Naresh

IEEE Micro, 2011

CRAM: coded registers for amplified multiporting.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

The NoX router.

Mitchell Hayenga

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Accelerating search and recognition workloads with SSE 4.2 string and text processing instructions.

Guangyu Shi

Min Li

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Automatic abstraction and fault tolerance in cortical microarchitectures.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

CRIB: consolidated rename, issue, and bypass.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms.

Andrew Nere

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols.

Dana Vantrease

Nathan L. Binkert

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Time redundant parity for low-cost transient error detection.

Proceedings of the Design, Automation and Test in Europe, 2011

A case for neuromorphic ISAs.

Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

2010

Combating Aging with the Colt Duty Cycle Equalizer.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

A Cortically Inspired Learning Model.

Proceedings of the Computational Intelligence, 2010

Discovering Cortical Algorithms.

Proceedings of the ICFC-ICNC 2010, 2010

Cortical architectures on a GPGPU.

Andrew Nere

Proceedings of the 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009

Light speed arbitration and flow control for nanophotonic interconnects.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

SCARAB: a single cycle adaptive routing and bufferless network.

Mitchell Hayenga

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Achieving predictable performance through better memory controller placement in many-core CMPs.

Dennis Abts

John Kim

Dan Gibson

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Cortical columns: Building blocks for intelligent systems.

Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing, 2009

2008

Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence.

Li-Shiuan Peh

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support.

Li-Shiuan Peh

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Accelerating search and recognition with a TCAM functional unit.

Proceedings of the 26th International Conference on Computer Design, 2008

Power-Efficient DRAM Speculation.

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

An accurate flip-flop selection technique for reducing logic SER.

Kewal K. Saluja

Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2008

Skewed redundancy.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Narrow Width Dynamic Scheduling.

J. Instr. Level Parallelism, 2007

Circuit-Switched Coherence.

Li-Shiuan Peh

IEEE Comput. Archit. Lett., 2007

Speculative optimization using hardware-monitored guarded regions for Java virtual machines.

Lixin Su

Proceedings of the 3rd International Conference on Virtual Execution Environments, 2007

Power-aware operand delivery.

Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

An Evaluation of Server Consolidation Workloads for Multi-Core Designs.

Dana Vantrease

Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Transparent mode flip-flops for collapsible pipelines.

Proceedings of the 25th International Conference on Computer Design, 2007

A position-insensitive finished store buffer.

Proceedings of the 25th International Conference on Computer Design, 2007

2006

Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays.

IEEE Micro, 2006

Energy Estimation of the Memory Subsystem in Multiprocessor Systems.

Eric F. Weglarz

Kewal K. Saluja

J. Low Power Electron., 2006

Friendly fire: understanding the effects of multiprocessor prefetches.

Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Stall cycle redistribution in a transparent fetch pipeline.

Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

An approach for implementing efficient superscalar CISC processors.

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Dynamic Class Hierarchy Mutation.

Lixin Su

Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Stealth prefetching.

Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005

The Complexity of Verifying Memory Coherence and Consistency.

IEEE Trans. Parallel Distributed Syst., 2005

Reaping the Benefit of Temporal Silence to Improve Communication Performance.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking.

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004

Memory Ordering: A Value-Based Approach.

IEEE Micro, 2004

Constraint Graph Analysis of Multithreaded Programs.

Ravi Nair

J. Instr. Level Parallelism, 2004

Deconstructing commit.

Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Physical Register Inlining.

Brian R. Mestan

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Understanding Scheduling Replay Schemes.

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003

The complexity of verifying memory coherence.

Proceedings of the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2003), 2003

Macro-op Scheduling: Relaxing Scheduling Loop Constraints.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Half-Price Architecture.

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Exploiting Partial Operand Knowledge.

Brian R. Mestan

Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Redeeming IPC as a Performance Metric for Multithreaded Programs.

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002

Minimizing Energy Consumption for High-Performance Processing.

Eric F. Weglarz

Kewal K. Saluja

Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Verifying sequential consistency using vector clocks.

Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, 2002

Avoiding Initialization Misses to the Heap.

Jarrod A. Lewis

Bryan Black

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Implementing Optimizations at Decode Time.

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Temporally silent stores.

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001

Silent Stores and Store Value Locality.

IEEE Trans. Computers, 2001

A dynamic binary translation approach to architectural simulation.

SIGARCH Comput. Archit. News, 2001

Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing.

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

An Architectural Evaluation of Java TPC-W.

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

2000

A performance methodology for commercial servers.

Steven R. Kunkel

Richard J. Eickemeyer

IBM J. Res. Dev., 2000

Silent stores for free.

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

On the value locality of store instructions.

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Characterization of Silent Stores.

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

The Effect of Program Optimization on Trace Cache Efficiency.

Derek L. Howard

Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998

Exploiting Value Locality to Exceed the Dataflow Limit.

Int. J. Parallel Program., 1998

1997

Superspeculative Microarchitecture for Beyond AD 2000.

Computer, 1997

The Performance Potential of Value and Dependence Prediction.

Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996

Exceeding the Dataflow Limit via Value Prediction.

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Can Trace-Driven Simulators Accurately Predict Superscalar Performance?

Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Value Locality and Load Value Prediction.

Christopher B. Wilkerson

Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), 1996

1995

SPAID: software prefetching in pointer- and call-intensive environments.

Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

1993

Architecture-Compatible Code Boosting for Performance Enhancement of the IBM RS/6000.

Trung A. Diep