Michael F. P. O'Boyle

ORCID: 0000-0003-1619-5052

Affiliations:
  • University of Edinburgh, Scotland, UK


According to our database, Michael F. P. O'Boyle authored at least 169 papers between 1992 and 2024.

Bibliography

2024
SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack.
CoRR, 2023

Rewriting History: Repurposing Domain-Specific CGRAs.
CoRR, 2023

SLaDe: A Portable Small Language Model Decompiler for Optimized Assembler.
CoRR, 2023

When Does Saving Power Save the Planet?
Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023

C2TACO: Lifting Tensor Code to TACO.
Proceedings of the 22nd ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2023

HyBF: A Hybrid Branch Fusion Strategy for Code Size Reduction.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

Matching Linear Algebra and Tensor Code to Specialized Hardware Accelerators.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR Using Program Synthesis.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
Bind the gap: compiling real software to hardware FFT accelerators.
Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

ExeBench: an ML-scale dataset of executable C functions.
Proceedings of the MAPS@PLDI 2022: 6th ACM SIGPLAN International Symposium on Machine Programming, 2022

Investigating magic numbers: improving the inlining heuristic in the Glasgow Haskell Compiler.
Proceedings of the Haskell '22: 15th ACM SIGPLAN International Haskell Symposium, Ljubljana, Slovenia, September 15, 2022

F3M: Fast Focused Function Merging.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Loop Rolling for Code Size Reduction.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
Learning C to x86 Translation: An Experiment in Neural Compilation.
CoRR, 2021

SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.
Proceedings of the 38th International Conference on Machine Learning, 2021

Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

New Regular Expressions on Old Accelerators.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

CoSPARSE: A Software and Hardware Reconfigurable SpMV Framework for Graph Analytics.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Neural architecture search as program transformation exploration.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Program Lifting using Gray-Box Behavior.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Deep Data Flow Analysis.
CoRR, 2020

Retrofitting Symbolic Holes to LLVM IR.
CoRR, 2020

TASO: Time and Space Optimization for Memory-Constrained DNN Inference.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Automatic generation of specialized direct convolutions for mobile GPUs.
Proceedings of the GPGPU@PPoPP '20: 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit colocated with 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

M3: Semantic API Migrations.
Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020

HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget.
Proceedings of the 8th International Conference on Learning Representations, 2020

Modeling black-box components with probabilistic synthesis.
Proceedings of the GPCE '20: 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2020

DelayRepay: delayed execution for kernel fusion in Python.
Proceedings of the DLS 2020: 16th ACM SIGPLAN International Symposium on Dynamic Languages, 2020

Automatically harnessing sparse acceleration.
Proceedings of the CC '20: 29th International Conference on Compiler Construction, 2020

Optimizing Grouped Convolutions on Edge Devices.
Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Augmenting Type Signatures for Program Synthesis.
CoRR, 2019

BlockSwap: Fisher-guided Block Substitution for Network Compression.
CoRR, 2019

Full-System Simulation of Mobile CPU/GPU Platforms.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

SLAMBench 3.0: Systematic Automated Reproducible Evaluation of SLAM Systems for Robot Vision Challenges and Scene Understanding.
Proceedings of the International Conference on Robotics and Automation, 2019

POSTER: Space and Time Optimal DNN Primitive Selection with Integer Linear Programming.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

Specialization Opportunities in Graphical Workloads.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

Type-Directed Program Synthesis and Constraint Generation for Library Portability.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Machine Learning in Compiler Optimization.
Proc. IEEE, 2018

Navigating the Landscape for Real-Time Localization and Mapping for Robotics and Virtual and Augmented Reality.
Proc. IEEE, 2018

HAKD: Hardware Aware Knowledge Distillation.
CoRR, 2018

Pruning neural networks: is it time to nip it in the bud?
CoRR, 2018

Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality.
CoRR, 2018

Machine Learning in Compiler Optimisation.
CoRR, 2018

MaxPair: Enhance OpenCL Concurrent Kernel Execution by Weighted Maximum Matching.
Proceedings of the 11th Workshop on General Purpose Processing using GPUs, 2018

A Cross-platform Evaluation of Graphics Shader Compiler Optimization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Automatic Parameter Tuning of Motion Planning Algorithms.
Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

SLAMBench2: Multi-Objective Head-to-Head Benchmarking for Visual SLAM.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

CAnDL: a domain specific language for compiler analysis.
Proceedings of the 27th International Conference on Compiler Construction, 2018

Automatic Matching of Legacy Code to Heterogeneous APIs: An Idiomatic Approach.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Merge or Separate?: Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms.
Proceedings of the General Purpose GPUs, 2017

Discovery and exploitation of general reductions: a constraint based approach.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016
Selecting Heterogeneous Cores for Diversity.
ACM Trans. Archit. Code Optim., 2016

Four Metrics to Evaluate Heterogeneous Multicores.
ACM Trans. Archit. Code Optim., 2016

Diplomat: Mapping of Multi-kernel Applications Using a Static Dataflow Abstraction.
Proceedings of the 24th IEEE International Symposium on Modeling, 2016

Portable and transparent software managed scheduling on accelerators for fair resource sharing.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

PALMOS: A Transparent, Multi-tasking Acceleration Layer for Parallel Heterogeneous Systems.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM.
Proceedings of the IEEE International Conference on Robotics and Automation, 2015

2014
Integrating profile-driven parallelism detection and machine-learning-based mapping.
ACM Trans. Archit. Code Optim., 2014

Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems.
ACM Trans. Archit. Code Optim., 2014

Automatic feature generation for machine learning-based optimising compilation.
ACM Trans. Archit. Code Optim., 2014

Partitioning data-parallel programs for heterogeneous MPSoCs: time and energy design space exploration.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2014

Change Detection Based Parallelism Mapping: Exploiting Offline Models and Online Adaptation.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Portable and Transparent Host-Device Communication Optimization for GPGPU Environments.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code.
Proceedings of the Compiler Construction - 23rd International Conference, 2014

A compiler framework for automatically mapping data parallel programs to heterogeneous MPSoCs.
Proceedings of the 2014 International Conference on Compilers, 2014

Exploiting GPU Hardware Saturation for Fast Compiler Optimization.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

Measuring flexibility in single-ISA heterogeneous processors.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Automatic optimization of thread-coarsening for graphics processors.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Using machine learning to partition streaming programs.
ACM Trans. Archit. Code Optim., 2013

A large-scale cross-architecture evaluation of thread-coarsening.
Proceedings of the International Conference for High Performance Computing, 2013

OpenCL Task Partitioning in the Presence of GPU Contention.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

Portable mapping of data parallel programs to OpenCL for heterogeneous systems.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Smart, adaptive mapping of parallelism in the presence of external workload.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

General chairs' welcome message.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Exploring and Predicting the Effects of Microarchitectural Parameters and Compiler Optimizations on Performance and Energy.
ACM Trans. Embed. Comput. Syst., 2012

2011
Compiler Directed Issue Queue Energy Reduction.
Trans. High Perform. Embed. Archit. Compil., 2011

An Empirical Architecture-Centric Approach to Microarchitectural Design Space Exploration.
IEEE Trans. Computers, 2011

Milepost GCC: Machine Learning Enabled Self-tuning Compiler.
Int. J. Parallel Program., 2011

A workload-aware mapping approach for data-parallel programs.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL.
Proceedings of the Compiler Construction - 20th International Conference, 2011

2010
A Predictive Model for Dynamic Microarchitectural Adaptivity Control.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Partitioning streaming parallelism for multi-cores: a machine learning based approach.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Energy-efficient register caching with compiler assistance.
ACM Trans. Archit. Code Optim., 2009

Exploring the limits of early register release: Exploiting compiler analysis.
ACM Trans. Archit. Code Optim., 2009

Obituary: Peter Knijnenburg (1961-2007).
Concurr. Comput. Pract. Exp., 2009

Mapping parallelism to multi-cores: a machine learning based approach.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping.
Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009

Portable compiler optimisation across embedded programs and microarchitectures using machine learning.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Raced profiles: efficient selection of competing compiler optimizations.
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, 2009

Reducing Training Time in a One-Shot Machine Learning-Based Compiler.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Rapid early-stage microarchitecture design using predictive models.
Proceedings of the 27th International Conference on Computer Design, 2009

Automatic Feature Generation for Machine Learning Based Optimizing Compilation.
Proceedings of the CGO 2009, 2009

2008
Instruction Cache Energy Saving Through Compiler Way-Placement.
Proceedings of the Design, Automation and Test in Europe, 2008

Exploring and predicting the architecture/optimising compiler co-design space.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
Introduction to Part 2.
Trans. High Perform. Embed. Archit. Compil., 2007

Quick and Practical Run-Time Evaluation of Multiple Program Optimizations.
Trans. High Perform. Embed. Archit. Compil., 2007

High-Performance Embedded Architecture and Compilation Roadmap.
Trans. High Perform. Embed. Archit. Compil., 2007

Microarchitectural Design Space Exploration Using an Architecture-Centric Approach.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

MiDataSets: Creating the Conditions for a More Realistic Evaluation of Iterative Optimization.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

Topic 4 High-Performance Architectures and Compilers.
Proceedings of the Euro-Par 2007, 2007

Rapidly Selecting Good Compiler Optimizations using Performance Counters.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Fast compiler optimisation evaluation using code-feature based performance prediction.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Method-specific dynamic compilation using logistic regression.
Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2006

Predictive search distributions.
Proceedings of the Machine Learning, 2006

Using Machine Learning to Focus Iterative Optimization.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Hybrid Optimizations: Which Optimization Algorithm to Use?
Proceedings of the Compiler Construction, 15th International Conference, 2006

Iterative Collective Loop Fusion.
Proceedings of the Compiler Construction, 15th International Conference, 2006

Automatic performance model construction for the fast software exploration of new hardware designs.
Proceedings of the 2006 International Conference on Compilers, 2006

2005
A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems.
IEEE Trans. Parallel Distributed Syst., 2005

IATAC: a smart predictor to turn-off L2 cache lines.
ACM Trans. Archit. Code Optim., 2005

Automatic Tuning of Inlining Heuristics.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Probabilistic source-level optimisation of embedded programs.
Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

Software Directed Issue Queue Power Reduction.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

A Practical Method for Quickly Evaluating Program Optimizations.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Topic 4 - Compilers for High Performance.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Compiler Directed Early Register Release.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004
The effect of cache models on iterative compilation for combined tiling and unrolling.
Concurr. Comput. Pract. Exp., 2004

A fast and accurate method for determining a lower bound on execution time.
Concurr. Comput. Pract. Exp., 2004

Adaptive Java optimisation using instance-based learning.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Topic 4: Compilers for High Performance.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Cross Component Optimisation in a High Level Category-Based Language.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation.
J. Supercomput., 2003

Array recovery and high-level transformations for DSP applications.
ACM Trans. Embed. Comput. Syst., 2003

Towards general and exact distributed invalidation.
J. Parallel Distributed Comput., 2003

Topic Introduction.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Compiler parallelization of C programs for multi-core DSPs with multiple address spaces.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Combining Program Recovery, Auto-Parallelisation and Locality Analysis for C Programs on Multi-Processor Embedded Systems.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002
Compile Time Barrier Synchronization Minimization.
IEEE Trans. Parallel Distributed Syst., 2002

Integrating Loop and Data Transformations for Global Optimization.
J. Parallel Distributed Comput., 2002

Iterative Compilation.
Proceedings of the Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation, 2002

Evaluating Iterative Compilation.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

2001
Topic 04: Compilers for High Performance.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Compiler Transformation of Pointers to Explicit Array Accesses in DSP Applications.
Proceedings of the Compiler Construction, 10th International Conference, 2001

An empirical evaluation of high level transformations for embedded processors.
Proceedings of the 2001 International Conference on Compilers, 2001

2000
Exact Distributed Invalidation.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
Nonsingular Data Transformations: Definition, Validity, and Applications.
Int. J. Parallel Program., 1999

A Feasibility Study in Iterative Compilation.
Proceedings of the High Performance Computing, Second International Symposium, 1999

OCEANS - Optimising Compilers for Embedded Applications.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

Efficient Parallelization Using Combined Loop and Data Transformations.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
First Fast Sink: A compiler algorithm for barrier placement optimisation.
Future Gener. Comput. Syst., 1998

MARS: A Distributed Memory Approach to Shared Memory Compilation.
Proceedings of the Languages, 1998


1997
A Graph Based Approach to Barrier Synchronisation Minimisation.
Proceedings of the 11th International Conference on Supercomputing, 1997

Non-Singular Data Transformations: Definition, Validity and Applications.
Proceedings of the 11th International Conference on Supercomputing, 1997

Barrier Synchronisation Optimisation.
Proceedings of the High-Performance Computing and Networking, 1997


1996
Expert Programmer versus Parallelizing Compiler: A Comparative Study of Two Approaches for Distributed Shared Memory.
Sci. Program., 1996

Practical Loop Generation.
Proceedings of the 29th Annual Hawaii International Conference on System Sciences (HICSS-29), 1996

Compiler Reduction of Invalidation Traffic in Virtual Shared Memory Systems.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

A compiler algorithm to reduce invalidation latency in virtual shared memory systems.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
Synchronization Minimization in a SPMD Execution Model.
J. Parallel Distributed Comput., 1995

A hierarchical locality algorithm for NUMA compilation.
Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995

A Compiler Strategy for Shared Virtual Memories.
Proceedings of the Languages, 1995

Compiler Reduction of Synchronisation in Shared Virtual Memory Systems.
Proceedings of the 9th International Conference on Supercomputing, 1995

1994
A Data Partitioning Algorithm for Distributed Memory Compilation.
Proceedings of the PARLE '94: Parallel Architectures and Languages Europe, 1994

1993
Program and data transformations for efficient execution on distributed memory architectures.
PhD thesis, 1993

1992
A New Program Transformation to Minimise Communication in Distributed Memory Architecture.
Proceedings of the PARLE '92: Parallel Architectures and Languages Europe, 1992

A transformational approach to compiling Sisal for distributed memory architectures.
Proceedings of the 6th International Conference on Supercomputing, 1992

