Martin Burtscher

Ganesh Gopalakrishnan

CoRR, 2024

2023

Choosing the Best Parallelization and Implementation Styles for Graph Analytics Codes: Lessons Learned from 1106 Programs.

Proceedings of the International Conference for High Performance Computing, 2023

A High-Performance MST Implementation for GPUs.

Proceedings of the International Conference for High Performance Computing, 2023

A GPU Algorithm for Detecting Strongly Connected Components.

Proceedings of the International Conference for High Performance Computing, 2023

2022

Improving the Speed and Quality of Parallel Graph Coloring.

Ghadeer Alabandi

ACM Trans. Parallel Comput., 2022

Parla: A Python Orchestration System for Heterogeneous Architectures.

Christopher J. Rossbach

Mattan Erez

George Biros

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Compressed In-memory Graphs for Accelerating GPU-based Analytics.

Noushin Azami

Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

Reducing Memory-Bus Energy Consumption of GPUs via Software-Based Bit-Flip Minimization.

Alex Fallin

Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2022

The Indigo Program-Verification Microbenchmark Suite of Irregular Parallel Code Patterns.

Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

A Simple, Fast, and GPU-friendly Steiner-Tree Heuristic.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021

The Use of Pulse Oximetry in the Assessment of Acclimatization to High Altitude.

Sensors, 2021

Discovering and balancing fundamental cycles in large signed graphs.

Proceedings of the International Conference for High Performance Computing, 2021

BiPart: a parallel and deterministic hypergraph partitioner.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020

BiPart: A Parallel and Deterministic Multilevel Hypergraph Partitioner.

CoRR, 2020

Increasing the parallelism of graph coloring via shortcutting.

Ghadeer Alabandi

Evan Powers

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

2019

A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels.

Saeed Taheri

Apan Qasem

CoRR, 2019

SPRoute: A Scalable Parallel Negotiation-based Global Router.

Proceedings of the International Conference on Computer-Aided Design, 2019

DiffTrace: Efficient Whole-Program Trace Analysis and Diffing for Debugging.

Saeed Taheri

Ian Briggs

Ganesh Gopalakrishnan

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018

A High-Quality and Fast Maximal Independent Set Implementation for GPUs.

Sindhu Devale

Sahar Azimi

Jayadharini Jaiganesh

Evan Powers

ACM Trans. Parallel Comput., 2018

ParLoT: Efficient Whole-Program Call Tracing for HPC Applications.

Saeed Taheri

Sindhu Devale

Ganesh Gopalakrishnan

Arturo González-Escribano

Proceedings of the Programming and Performance Visualization Tools, 2018

Peachy Parallel Assignments (EduHPC 2018).

Eduardo Rodriguez-Gutiez

David P. Bunde

Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

A high-performance connected components implementation for GPUs.

Jayadharini Jaiganesh

Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

SPDP: An Automatically Synthesized Lossless Compression Algorithm for Floating-Point Data.

Steven Claggett

Sahar Azimi

Proceedings of the 2018 Data Compression Conference, 2018

Automatic Hierarchical Parallelization of Linear Recurrences.

Sepideh Maleki

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2016

Geometric Representations of the n-anacci Constants and Generalizations Thereof.

J. Integer Seq., 2016

Real-time synthesis of compression algorithms for scientific data.

Proceedings of the International Conference for High Performance Computing, 2016

Higher-order and tuple-based massively-parallel prefix sums.

Sepideh Maleki

Annie Yang

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Parallel Graph Partitioning on a CPU-GPU Architecture.

Bahareh Goodarzi

Dhrubajyoti Goswami

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Energy, Power, and Performance Characterization of GPGPU Benchmark Programs.

Jared Coplin

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Analytic Representations of the n-anacci Constants and Generalizations Thereof.

J. Integer Seq., 2015

A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries.

Comput. Vis. Image Underst., 2015

A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing.

Proceedings of the 46th ACM Technical Symposium on Computer Science Education, 2015

Rethinking the parallelization of random-restart hill climbing: a case study in optimizing a 2-opt TSP solver for GPU execution.

Molly A. O'Neil

Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Effects of source-code optimizations on GPU performance and energy consumption.

Jared Coplin

Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Quantifying Benefits of Lossless Compression Utilities on Modern Smartphones.

Armen Dzhagaryan

Proceedings of the 24th International Conference on Computer Communication and Networks, 2015

Maximizing Hardware Prefetch Effectiveness with Machine Learning.

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

MPC: A Massively Parallel Compression Algorithm for Scientific Data.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Using Branch Predictors and Variable Encoding for On-the-Fly Program Tracing.

IEEE Trans. Computers, 2014

The future of accelerator programming: abstraction, performance or can we have both?

Kamil Rocki

Reiji Suda

Proceedings of the Symposium on Applied Computing, 2014

Performance and Energy Modeling for Cooperative Hybrid Computing.

Proceedings of the 9th IEEE International Conference on Networking, 2014

Microarchitectural performance characterization of irregular GPU kernels.

Molly A. O'Neil

Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

PEACH: a model for performance and energy aware cooperative hybrid computing.

Proceedings of the Computing Frontiers Conference, CF'14, 2014

Measuring GPU Power with the K20 Built-in Sensor.

Ivan Zecena

Ziliang Zong

Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

Extended Large Scale Sketch-Based 3D Shape Retrieval.

Proceedings of the 7th Eurographics Workshop on 3D Object Retrieval, 2014

2013

Morph algorithms on GPUs.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Performance and Energy Consumption of Lossless Compression/Decompression Utilities on Mobile Computing Platforms.

Armen Dzhagaryan

Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Energy efficiency of lossless data compression on a mobile device: An experimental evaluation.

Armen Dzhagaryan

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Data-Driven Versus Topology-driven Irregular Computations on GPUs.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches.

Hassan Rabeti

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Evaluating the performance and energy efficiency of n-body codes on multi-core CPUs and GPUs.

Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Atomic-free irregular computations on GPUs.

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012

Efficient Runtime Detection and Toleration of Asymmetric Races.

IEEE Trans. Computers, 2012

A GPU implementation of inclusion-based points-to analysis.

Mario Méndez-Lojo

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A quantitative study of irregular programs on GPUs.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Hardware support for enforcing isolation in lock-based parallel programs.

Proceedings of the International Conference on Supercomputing, 2012

2011

Caches and Predictors for Real-Time, Unobtrusive, and Cost-Effective Program Tracing in Embedded Systems.

IEEE Trans. Computers, 2011

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms.

Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

The tao of parallelism in algorithms.

Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

Evaluation and optimization of multicore performance bottlenecks in supercomputing applications.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Floating-point data compression at 75 Gb/s on a GPU.

Molly A. O'Neil

Proceedings of 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010

JSZap: Compressing JavaScript Code.

Gaurav Sinha

Proceedings of the USENIX Conference on Web Application Development, 2010

PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications.

Proceedings of the Conference on High Performance Computing Networking, 2010

Structure-driven optimizations for amorphous data-parallel programs.

Milind Kulkarni

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Parallel Graph Partitioning on Multicore Architectures.

Proceedings of the Languages and Compilers for Parallel Computing, 2010

gFPC: A Self-Tuning Compression Algorithm.

Proceedings of the 2010 Data Compression Conference (DCC 2010), 2010

Real-time unobtrusive program execution trace compression using branch predictor events.

Proceedings of the 2010 International Conference on Compilers, 2010

Ordered and unordered algorithms for parallel breadth first search.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

FPC: A High-Speed Compressor for Double-Precision Floating-Point Data.

IEEE Trans. Computers, 2009

Real-Time Message Compression in Software.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Detecting and tolerating asymmetric races.

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

How much parallelism is there in irregular applications?

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Lonestar: A suite of parallel irregular programs.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Real-time, unobtrusive, and efficient program execution tracing with stream caches and last stream predictors.

Proceedings of the 27th International Conference on Computer Design, 2009

pFPC: A Parallel Compressor for Floating-Point Data.

Proceedings of the 2009 Data Compression Conference (DCC 2009), 2009

2008

On the Scalability of an Automatically Parallelized Irregular Application.

Proceedings of the Languages and Compilers for Parallel Computing, 2008

Program Phase Detection based on Critical Basic Block Transitions.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

On the Role of a Nonlinear Stress-Strain Relation in Brain Trauma.

Proceedings of the International Conference on Bioinformatics & Computational Biology, 2008

2007

Computational Modeling of Brain Dynamics during Repetitive Head Motions.

Proceedings of the 2007 International Conference on Modeling, 2007

Algorithms and Hardware Structures for Unobtrusive Real-Time Compression of Instruction and Data Address Traces.

Proceedings of the 2007 Data Compression Conference (DCC 2007), 2007

High Throughput Compression of Double-Precision Floating-Point Data.

Proceedings of the 2007 Data Compression Conference (DCC 2007), 2007

2006

Future execution: A prefetching mechanism that uses multiple cores to speed up single threads.

ACM Trans. Archit. Code Optim., 2006

TCgen 2.0: a tool to automatically generate lossless trace compressors.

SIGARCH Comput. Archit. News, 2006

Computational Simulation and Visualization of Traumatic Brain Injuries.

Proceedings of the 2006 International Conference on Modeling, 2006

Load Instruction Characterization and Acceleration of the BioPerf Programs.

Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Fast Lossless Compression of Scientific Floating-Point Data.

Proceedings of the 2006 Data Compression Conference (DCC 2006), 2006

Efficient emulation of hardware prefetchers via event-driven helper threading.

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005

The VPC Trace-Compression Algorithms.

IEEE Trans. Computers, 2005

Improving memory system performance with energy-efficient value speculation.

Nana B. Sam

SIGARCH Comput. Archit. News, 2005

Bridging the Processor-Memory Performance Gap with 3D IC Technology.

IEEE Des. Test Comput., 2005

Reducing Communication Time through Message Prefetching.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2005

Numerical Modeling of Brain Dynamics in Traumatic Situations - Impulsive Translations.

Proceedings of The 2005 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, 2005

Tolerating Message Latency Through the Early Release of Blocked Receives.

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Automatic Generation of High-Performance Trace Compressors.

Nana B. Sam

Proceedings of the 3rd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

On the energy-efficiency of speculative hardware.

Nana B. Sam

Proceedings of the Second Conference on Computing Frontiers, 2005

Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors.

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

On the importance of optimizing the configuration of stream prefetchers.

Proceedings of the 2005 workshop on Memory System Performance, 2005

2004

VPC3: a fast and effective trace-compression algorithm.

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2004

Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications.

Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Automatic Synthesis of High-Speed Processor Simulators.

Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

2003

Compressing Extended Program Traces Using Value Predictors.

Metha Jeeradit

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002

Hybrid Load-Value Predictors.

IEEE Trans. Computers, 2002

An improved index function for (D)FCM predictors.

SIGARCH Comput. Archit. News, 2002

Static Load Classification for Improving the Value Predictability of Data-Cache Misses.

Amer Diwan

Matthias Hauswirth

Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

Delphi: Prediction-based Page Prefetching to Improve the Performance of Shared Virtual Memory Systems.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

2000

Hybridizing and Coalescing Load Value Predictors.

Proceedings of the IEEE International Conference On Computer Design: VLSI In Computers & Processors, 2000

1999

Prediction Outcome History-Based Confidence Estimation for Load Value Prediction.

J. Instr. Level Parallelism, 1999

Exploring Last n Value Prediction.
