Mikko H. Lipasti

Orcid: 0000-0002-8535-9244

Affiliations:
  • University of Wisconsin-Madison, USA


According to our database, Mikko H. Lipasti authored at least 122 papers between 1993 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Awards

IEEE Fellow 2013, "For contributions to the microarchitecture and design of high-performance microprocessors and computer systems".

Bibliography

2023
TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency.
ACM Trans. Archit. Code Optim., September, 2023

Turn-based Spatiotemporal Coherence for GPUs.
ACM Trans. Archit. Code Optim., September, 2023

Energy-Efficient Bayesian Inference Using Bitstream Computing.
IEEE Comput. Archit. Lett., 2023

TailWAG: Tail Latency Workload Analysis and Generation.
Proceedings of the 5th International Workshop on Benchmarking in the Data Center, 2023

2022
PrGEMM: A Parallel Reduction SpGEMM Accelerator.
Proceedings of the GLSVLSI '22: Great Lakes Symposium on VLSI 2022, Irvine CA USA, June 6, 2022

Work-in-Progress: NoRF: A Case Against Register File Operands in Tightly-Coupled Accelerators.
Proceedings of the International Conference on Compilers, 2022

2021
Systems-on-Chip with Strong Ordering.
ACM Trans. Archit. Code Optim., 2021

Information Bottleneck-Based Hebbian Learning Rule Naturally Ties Working Memory and Synaptic Updates.
CoRR, 2021

Accelerating Deep Learning with Dynamic Data Pruning.
CoRR, 2021

MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

2020
SHASTA: Synergic HW-SW Architecture for Spatio-temporal Approximation.
ACM Trans. Archit. Code Optim., 2020

BitSAD v2: Compiler Optimization and Analysis for Bitstream Computing.
ACM Trans. Archit. Code Optim., 2020

Value Locality Based Approximation With ODIN.
IEEE Comput. Archit. Lett., 2020

Modeling Architectural Support for Tightly-Coupled Accelerators.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

BlurNet: Defense by Filtering the Feature Maps.
Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2020

2019
BitBench: a benchmark for bitstream computing.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

SECO: A Scalable Accuracy Approximate Exponential Function Via Cross-Layer Optimization.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

Recycling Data Slack in Out-of-Order Cores.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018
Aggressive Slack Recycling via Transparent Pipelines.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Compiler assisted coalescing.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
The CURE: Cluster Communication Using Registers.
ACM Trans. Embed. Comput. Syst., 2017

Timing Speculation in Multi-Cycle Data Paths.
IEEE Comput. Archit. Lett., 2017

Temporal codes in on-chip interconnects.
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

CHARSTAR: Clock Hierarchy Aware Resource Scaling in Tiled ARchitectures.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Architectural Support for Server-Side PHP Processing.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Evaluating hopfield-network-based linear solvers for hardware constrained neural substrates.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

2016
BADGR: A practical GHR implementation for TAGE branch predictors.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Hash Map Inlining.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Adaptive Cache and Concurrency Allocation on GPGPUs.
IEEE Comput. Archit. Lett., 2015

COP: to compress and protect main memory.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A self-learning map-seeking circuit for visual object recognition.
Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

iPatch: Intelligent fault patching to improve energy efficiency.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Tag tables.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

2014
Bias-Free Branch Predictor.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Tag check elision.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

Precision-aware soft error protection for GPUs.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Revolver: Processor architecture for power efficient loop execution.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Atomic SC for simple in-order processors.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013
Resilient High-Performance Processors with Spare RIBs.
IEEE Micro, 2013

Simulating cortical networks on heterogeneous multi-GPU systems.
J. Parallel Distributed Comput., 2013

Accelerating atomic operations on GPGPUs.
Proceedings of the 2013 Seventh IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2013

Wavelength stealing: an opportunistic approach to channel sharing in multi-chip photonic interconnects.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

REEL: Reducing effective execution latency of floating point operations.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Data compression for thermal mitigation in the Hybrid Memory Cube.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Bridging the semantic gap: Emulating biological neuronal behaviors with simple digital neurons.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
Edge chasing delayed consistency: pushing the limits of weak memory models.
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, 2012

BenchNN: On the broad potential application scope of hardware neural network accelerators.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Mitigating random variation with spare RIBs: Redundant intermediate bitslices.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

2011
Systems for Very Large-Scale Computing.
IEEE Micro, 2011

CRAM: coded registers for amplified multiporting.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

The NoX router.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Accelerating search and recognition workloads with SSE 4.2 string and text processing instructions.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Automatic abstraction and fault tolerance in cortical microarchitectures.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

CRIB: consolidated rename, issue, and bypass.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Time redundant parity for low-cost transient error detection.
Proceedings of the Design, Automation and Test in Europe, 2011

A case for neuromorphic ISAs.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

2010
Combating Aging with the Colt Duty Cycle Equalizer.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

A Cortically Inspired Learning Model.
Proceedings of the Computational Intelligence, 2010

Discovering Cortical Algorithms.
Proceedings of the ICFC-ICNC 2010, 2010

Cortical architectures on a GPGPU.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009
Light speed arbitration and flow control for nanophotonic interconnects.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

SCARAB: a single cycle adaptive routing and bufferless network.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Achieving predictable performance through better memory controller placement in many-core CMPs.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Cortical columns: Building blocks for intelligent systems.
Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing, 2009

2008
Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Accelerating search and recognition with a TCAM functional unit.
Proceedings of the 26th International Conference on Computer Design, 2008

Power-Efficient DRAM Speculation.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

An accurate flip-flop selection technique for reducing logic SER.
Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2008

Skewed redundancy.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Narrow Width Dynamic Scheduling.
J. Instr. Level Parallelism, 2007

Circuit-Switched Coherence.
IEEE Comput. Archit. Lett., 2007

Speculative optimization using hardware-monitored guarded regions for java virtual machines.
Proceedings of the 3rd International Conference on Virtual Execution Environments, 2007

Power-aware operand delivery.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

An Evaluation of Server Consolidation Workloads for Multi-Core Designs.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Transparent mode flip-flops for collapsible pipelines.
Proceedings of the 25th International Conference on Computer Design, 2007

A position-insensitive finished store buffer.
Proceedings of the 25th International Conference on Computer Design, 2007

2006
Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays.
IEEE Micro, 2006

Energy Estimation of the Memory Subsystem in Multiprocessor Systems.
J. Low Power Electron., 2006

Friendly fire: understanding the effects of multiprocessor prefetches.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Stall cycle redistribution in a transparent fetch pipeline.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

An approach for implementing efficient superscalar CISC processors.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Dynamic Class Hierarchy Mutation.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Stealth prefetching.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
The Complexity of Verifying Memory Coherence and Consistency.
IEEE Trans. Parallel Distributed Syst., 2005

Reaping the Benefit of Temporal Silence to Improve Communication Performance.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004
Memory Ordering: A Value-Based Approach.
IEEE Micro, 2004

Constraint Graph Analysis of Multithreaded Programs.
J. Instr. Level Parallelism, 2004

Deconstructing commit.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Physical Register Inlining.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Understanding Scheduling Replay Schemes.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003
The complexity of verifying memory coherence.
SPAA 2003: Proceedings of the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2003

Macro-op Scheduling: Relaxing Scheduling Loop Constraints.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Half-Price Architecture.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Exploiting Partial Operand Knowledge.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Redeeming IPC as a Performance Metric for Multithreaded Programs.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002
Minimizing Energy Consumption for High-Performance Processing.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Verifying sequential consistency using vector clocks.
Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, 2002

Avoiding Initialization Misses to the Heap.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Implementing Optimizations at Decode Time.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Temporally silent stores.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001
Silent Stores and Store Value Locality.
IEEE Trans. Computers, 2001

A dynamic binary translation approach to architectural simulation.
SIGARCH Comput. Archit. News, 2001

Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

An Architectural Evaluation of Java TPC-W.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

2000
A performance methodology for commercial servers.
IBM J. Res. Dev., 2000

Silent stores for free.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

On the value locality of store instructions.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Characterization of Silent Stores.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999
The Effect of Program Optimization on Trace Cache Efficiency.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
Exploiting Value Locality to Exceed the Dataflow Limit.
Int. J. Parallel Program., 1998

1997
Superspeculative Microarchitecture for Beyond AD 2000.
Computer, 1997

The Performance Potential of Value and Dependence Prediction.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996
Exceeding the Dataflow Limit via Value Prediction.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Can Trace-Driven Simulators Accurately Predict Superscalar Performance?
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Value Locality and Load Value Prediction.
Proceedings of ASPLOS-VII, 1996

1995
SPAID: software prefetching in pointer- and call-intensive environments.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

1993
Architecture-Compatible Code Boosting for Performance Enhancement of the IBM RS/6000.
Proceedings of the 1993 International Conference on Computer Design: VLSI in Computers & Processors, 1993

