Kunle Olukotun

According to our database, Kunle Olukotun authored at least 153 papers between 1987 and 2018.

Awards

ACM Fellow

ACM Fellow 2006, "For contributions to multiprocessors on a chip and multi threaded processor design."

Bibliography

2018
Plasticine: A Reconfigurable Accelerator for Parallel Patterns.
IEEE Micro, 2018

Practical Design Space Exploration.
CoRR, 2018

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark.
CoRR, 2018

High-Accuracy Low-Precision Training.
CoRR, 2018

Exploring the Utility of Developer Exhaust.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Spatial: a language and compiler for application accelerators.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

LevelHeaded: A Unified Engine for Business Intelligence and Linear Algebra Querying.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
EmptyHeaded: A Relational Engine for Graph Processing.
ACM Trans. Database Syst., 2017

Mind the Gap: Bridging Multi-Domain Query Workloads with EmptyHeaded.
PVLDB, 2017

LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case.
CoRR, 2017

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark.
CoRR, 2017

Infrastructure for Usable Machine Learning: The Stanford DAWN Project.
CoRR, 2017

Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Plasticine: A Reconfigurable Architecture For Parallel Patterns.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

2016
Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling.
CoRR, 2016

Old Techniques for New Join Algorithms: A Case Study in RDF Processing.
CoRR, 2016

EmptyHeaded: A Relational Engine for Graph Processing.
Proceedings of the 2016 International Conference on Management of Data, 2016

Automatic Generation of Efficient Accelerators for Reconfigurable Hardware.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Old techniques for new join algorithms: A case study in RDF processing.
Proceedings of the 32nd IEEE International Conference on Data Engineering Workshops, 2016

GraphOps: A Dataflow Library for Graph Analytics Acceleration.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Generating Configurable Hardware from Parallel Patterns.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

Scaling Data Analytics with Moore's Law.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.
CoRR, 2015

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms.
CoRR, 2015

Generating Configurable Hardware from Parallel Patterns.
CoRR, 2015

EmptyHeaded: Boolean Algebra Based Graph Processing.
CoRR, 2015

Energy-Efficient Abundant-Data Computing: The N3XT 1,000x.
IEEE Computer, 2015

Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems.
Proceedings of the 1st Summit on Advances in Programming Languages, 2015

Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Automatic support for multi-module parallelism from computational patterns.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

EMEURO: a framework for generating multi-purpose accelerators via deep learning.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014
Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages.
ACM Trans. Embedded Comput. Syst., 2014

Guest Editorial.
International Journal of Parallel Programming, 2014

Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems.
CoRR, 2014

Beyond parallel programming with domain specific languages.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Surgical precision JIT compilers.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Locality-Aware Mapping of Nested Parallel Patterns on GPUs.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Author's retrospective for: improving the performance of speculatively parallel applications on the hydra CMP.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Hardware system synthesis from Domain-Specific Languages.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Hardware acceleration of database operations.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Simplifying Scalable Graph Processing with a Domain-Specific Language.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
On fast parallel detection of strongly connected components (SCC) in small-world graphs.
Proceedings of the International Conference for High Performance Computing, 2013

Optimizing data structures in high-level programs: new directions for extensible compilers based on staging.
Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2013

Forge: generating a high performance DSL implementation from a declarative specification.
Proceedings of the Generative Programming: Concepts and Experiences, 2013

Composition and Reuse with Compiled Domain-Specific Languages.
Proceedings of the ECOOP 2013 - Object-Oriented Programming, 2013

2012
Utilizing Static Analysis and Code Generation to Accelerate Neural Networks.
Proceedings of the 29th International Conference on Machine Learning, 2012

High performance embedded domain specific languages.
Proceedings of the ACM SIGPLAN International Conference on Functional Programming, 2012

A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

Green-Marl: a DSL for easy and efficient graph analysis.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Implementing Domain-Specific Languages for Heterogeneous Parallel Computing.
IEEE Micro, 2011

Building-Blocks for Performance Oriented DSLs.
Proceedings of the IFIP Working Conference on Domain-Specific Languages, 2011

Accelerating CUDA graph algorithms at maximum warp.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

A domain-specific approach to heterogeneous parallelism.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Panel Statement.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning.
Proceedings of the 28th International Conference on Machine Learning, 2011

Runtime automatic speculative parallelization.
Proceedings of the CGO 2011, 2011

Hardware acceleration of transactional memory on commodity systems.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

A Heterogeneous Parallel Framework for Domain-Specific Languages.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford.
IEEE Micro, 2010

Implementing and evaluating nested parallel transactions in software transactional memory.
Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), 2010

Extreme scale computing: challenges and opportunities.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

A practical concurrent binary search tree.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Transactional predication: high-performance concurrent sets and maps for STM.
Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing, 2010

Language virtualization for heterogeneous parallel computing.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

Chip multiprocessor architecture: A programmability-driven approach.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Eigenbench: A simple exploration tool for orthogonal TM characteristics.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Making nested parallel transactions practical using lightweight hardware support.
Proceedings of the 24th International Conference on Supercomputing, 2010

Implementing and Evaluating a Model Checker for Transactional Memory Systems.
Proceedings of the 15th IEEE International Conference on Engineering of Complex Computer Systems, 2010

Extreme scale computing: Challenges and opportunities.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

A Large-Scale Architecture for Restricted Boltzmann Machines.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

Hardware/software co-design for high performance computing: challenges and opportunities.
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010

2009
Feedback-directed barrier optimization in a strongly isolated STM.
Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009

A highly scalable Restricted Boltzmann Machine FPGA implementation.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

2008
Improving software concurrency with hardware-assisted memory snapshot.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Ased: availability, security, and debugging support using transactional memory.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

STAMP: Stanford Transactional Applications for Multi-Processing.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

2007
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2007

Transactional Memory: The Hardware-Software Interface.
IEEE Micro, 2007

Towards soft optimization techniques for parallel cognitive applications.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

Transactional collection classes.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

An effective hybrid transactional memory system with strong isolation guarantees.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

A Scalable, Non-blocking Approach to Transactional Memory.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

A practical FPGA-based framework for novel CMP research.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

ATLAS: a chip-multiprocessor with transactional memory support.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

The OpenTM Transactional Application Programming Interface.
Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

2006
Executing Java programs with transactional memory.
Sci. Comput. Program., 2006

The Identity Management Kalman Filter (IMKF).
Proceedings of the Robotics: Science and Systems II, 2006

The Atomos transactional programming language.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

Map-Reduce for Machine Learning on Multicore.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Architectural Semantics for Practical Transactional Memory.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

The common case transactional behavior of multithreaded programs.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Tradeoffs in transactional memory virtualization.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Testing implementations of transactional memory.
Proceedings of the 15th International Conference on Parallel Architecture and Compilation Techniques (PACT 2006), 2006

2005
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST).
SIGARCH Computer Architecture News, 2005

The future of microprocessors.
ACM Queue, 2005

Niagara: A 32-Way Multithreaded Sparc Processor.
IEEE Micro, 2005

Exposing speculative thread parallelism in SPEC2000.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

The Information-Form Data Association Filter.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

An Application Analysis Framework For Polymorphic Chip Multiprocessors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

TAPE: a transactional application profiling environment.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

A New Approach to Programming and Prototyping Parallel Systems.
Proceedings of the High Performance Computing, 2005

Characterization of TCC on Chip-Multiprocessors.
Proceedings of the 14th International Conference on Parallel Architecture and Compilation Techniques (PACT 2005), 2005

Maximizing CMP Throughput with Mediocre Cores.
Proceedings of the 14th International Conference on Parallel Architecture and Compilation Techniques (PACT 2005), 2005

2004
Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software.
IEEE Micro, 2004

Transactional Memory Coherence and Consistency.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Programming with transactional coherence and consistency (TCC).
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
The Jrpm System for Dynamically Parallelizing Sequential Java Programs.
IEEE Micro, 2003

Using thread-level speculation to simplify manual parallelization.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

The Jrpm System for Dynamically Parallelizing Java Programs.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

TEST: A Tracer for Extracting Speculative Thread.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
Targeting Dynamic Compilation for Embedded Environments.
Proceedings of the 2nd Java Virtual Machine Research and Technology Symposium, 2002

Efficient state representation for symbolic simulation.
Proceedings of the 39th Design Automation Conference, 2002

2001
High Bandwidth On-Chip Cache Design.
IEEE Trans. Computers, 2001

2000
The Stanford Hydra CMP.
IEEE Micro, 2000

1999
Improving the performance of speculatively parallel applications on the Hydra CMP.
Proceedings of the 13th international conference on Supercomputing, 1999

JMTP: an architecture for exploiting concurrency in embedded Java applications with real-time considerations.
Proceedings of the 1999 IEEE/ACM International Conference on Computer-Aided Design, 1999

1998
DCP: an algorithm for datapath/control partitioning of synthesizable RTL models.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

REMARC: Reconfigurable Multimedia Array Coprocessor (Abstract).
Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998

A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications.
Proceedings of the 6th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), 1998

Digital System Simulation: Methodologies and Examples.
Proceedings of the 35th Conference on Design Automation, 1998

Data Speculation Support for a Chip Multiprocessor.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

Exploiting Method-Level Parallelism in Single-Threaded Java Programs.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997
Multilevel Optimization of Pipelined Caches.
IEEE Trans. Computers, 1997

A Single-Chip Multiprocessor.
IEEE Computer, 1997

Designing High Bandwidth On-Chip Caches.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Verifying correct pipeline implementation for microprocessors.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Java as a specification language for hardware-software systems.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997

1996
Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Evaluation of Design Alternatives for a Multiprocessor Microprocessor.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

The Impact of Shared-Cache Clustering in Small-Scale Shared-Memory Multiprocessors.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

A Scalable Formal Verification Methodology for Pipelined Microprocessors.
Proceedings of the 33rd Conference on Design Automation, 1996

The Case for a Single-Chip Multiprocessor.
Proceedings of ASPLOS-VII, 1996

1995
The Benefits of Clustering in Shared Address Space Multiprocessors: An Applications-Driven Investigation.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

A General Method for Compiling Event-Driven Simulations.
Proceedings of the 32nd Conference on Design Automation, 1995

1994
A software-hardware cosynthesis approach to digital system simulation.
IEEE Micro, 1994

Exploring the Design Space for a Shared-Cache Multiprocessor.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

1992
Analysis and design of latch-controlled synchronous digital circuits.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1992

Performance Optimization of Pipelined Primary Caches.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

1991
The Design of a Microsupercomputer.
IEEE Computer, 1991

Implementing a Cache for a High-Performance GaAs Microprocessor.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1990
Hierarchical Gate-Array Routing on a Hypercube Multiprocessor.
J. Parallel Distrib. Comput., 1990

check Tc and min Tc: Timing Verification and Optimal Clocking of Synchronous Digital Circuits.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1990

Analysis and Design of Latch-Controlled Synchronous Digital Circuits.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1987
A Preliminary Investigation into Parallel Routing on a Hypercube Computer.
Proceedings of the 24th ACM/IEEE Design Automation Conference. Miami Beach, FL, USA, June 28, 1987
