Leonid Oliker

According to our database, Leonid Oliker authored at least 136 papers between 1996 and 2023.

Bibliography

2023
Evaluating the Potential of Disaggregated Memory Systems for HPC applications.
CoRR, 2023

Designing Efficient SIMD Kernels for High Performance Sequence Alignment.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
Extreme-Scale Many-against-Many Protein Similarity Search.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Preprocessing Pipeline Optimization for Scientific Deep Learning Workloads.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Accelerating large scale de novo metagenome assembly using GPUs.
Proceedings of the International Conference for High Performance Computing, 2021

Architectural Requirements for Deep Learning Workloads in HPC Environments.
Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Distributed-Memory k-mer Counting on GPUs.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
The Parallelism Motifs of Genomic Data Analysis.
CoRR, 2020

ADEPT: a domain independent sequence alignment strategy for gpu architectures.
BMC Bioinform., 2020

Timemory: Modular Performance Analysis for HPC.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

GPU accelerated partial order multiple sequence alignment for long reads self-correction.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers.
Int. J. High Perform. Comput. Appl., 2019

diBELLA: Distributed Long Read to Long Read Alignment.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Performance Analysis of GPU Programming Models Using the Roofline Scaling Trajectories.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

2018
A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

Extreme scale de novo metagenome assembly.
Proceedings of the International Conference for High Performance Computing, 2018

Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
High-End Computing for Next-Generation Scientific Discovery.
Parallel Comput., 2017

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.
Parallel Comput., 2017

Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning.
CoRR, 2017

Extreme-Scale De Novo Genome Assembly.
CoRR, 2017

Analyzing Performance of Selected NESAP Applications on the Cori HPC System.
Proceedings of the High Performance Computing, 2017

MerBench: PGAS Benchmarks for High Performance Genome Assembly.
Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017

Performance analysis and optimization of the RAMPAGE metal alloy potential generation software.
Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems, 2017

Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
LiRa: A New Likelihood-Based Similarity Score for Collaborative Filtering.
CoRR, 2016

Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor.
Proceedings of the High Performance Computing, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.
Proceedings of the International Conference for High Performance Computing, 2016

Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Special issue "Graph analysis for scientific discovery".
Parallel Comput., 2015

Parallel processing of filtered queries in attributed semantic graphs.
J. Parallel Distributed Comput., 2015

HipMer: an extreme-scale de novo genome assembler.
Proceedings of the International Conference for High Performance Computing, 2015

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

merAligner: A Fully Parallel Sequence Aligner.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Compiler-Directed Transformation for Higher-Order Stencils.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Parallel Performance Optimizations on Unstructured Mesh-based Simulations.
Proceedings of the International Conference on Computational Science, 2015

Efficient data reduction for large-scale genetic mapping.
Proceedings of the 6th ACM Conference on Bioinformatics, 2015

2014
Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly.
Proceedings of the International Conference for High Performance Computing, 2014

Efficient and accurate clustering for large-scale genetic mapping.
Proceedings of the 2014 IEEE International Conference on Bioinformatics and Biomedicine, 2014

2013
Best paper awards: 26th international parallel and distributed processing symposium (IPDPS 2012).
J. Parallel Distributed Comput., 2013

Introduction for Special Issue on Autotuning.
Int. J. High Perform. Comput. Appl., 2013

Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms.
Int. J. High Perform. Comput. Appl., 2013

Kinetic turbulence simulations at extreme scale on leadership-class systems.
Proceedings of the International Conference for High Performance Computing, 2013

Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Compiler generation and autotuning of communication-avoiding operators for geometric multigrid.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms.
IEEE Trans. Parallel Distributed Syst., 2012

Optimization of geometric multigrid for emerging multi- and manycore processors.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms?
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

High-performance analysis of filtered semantic graphs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Green Flash: Climate Machine (LBNL).
Proceedings of the Encyclopedia of Parallel Computing, 2011

Emerging programming paradigms for large-scale scientific computing.
Parallel Comput., 2011

Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms.
Parallel Comput., 2011

Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning.
Proceedings of the Conference on High Performance Computing Networking, 2011

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Hardware/software co-design for energy-efficient seismic modeling.
Proceedings of the Conference on High Performance Computing Networking, 2011

Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Cosmic microwave background map-making at the petascale and beyond.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

2010
Communication Requirements and Interconnect Optimization for High-End Scientific Applications.
IEEE Trans. Parallel Distributed Syst., 2010

Parallel I/O performance: From events to ensembles.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

An auto-tuning framework for parallel multicore stencil computations.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Silicon Nanophotonic Network-on-Chip Using TDM Arbitration.
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

Sparse Matrix-Vector Multiplication on Multicore and Accelerators.
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

Auto-Tuning Stencil Computations on Multicore and Accelerators.
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Rev., 2009

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Parallel Comput., 2009

HPC global file system performance analysis using a scientific-application derived benchmark.
Parallel Comput., 2009

Revolutionary technologies for acceleration of emerging petascale applications.
Parallel Comput., 2009

Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.
J. Parallel Distributed Comput., 2009

Energy-Efficient Computing for Extreme-Scale Science.
Computer, 2009

A design methodology for domain-optimized power-efficient supercomputing.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Analysis of photonic networks for a chip multiprocessor using scientific applications.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Green flash: Designing an energy efficient climate supercomputer.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture.
Proceedings of the Architecture of Computing Systems, 2009

2008
Towards Ultra-High Resolution Models of Climate and Weather.
Int. J. High Perform. Comput. Appl., 2008

Scientific Application Performance on Leading Scalar and Vector Supercomputing Platforms.
Int. J. High Perform. Comput. Appl., 2008

Preface.
Int. J. High Perform. Comput. Appl., 2008

Large-scale gyrokinetic particle simulation of microturbulence in magnetically confined fusion plasmas.
IBM J. Res. Dev., 2008

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Lattice Boltzmann simulation optimization on leading multicore platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
Scientific Computing Kernels on the Cell Processor.
Int. J. Parallel Program., 2007

Investigation of leading HPC I/O performance using a scientific-application derived benchmark.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Scientific Application Performance on Candidate PetaScale Platforms.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Reconfigurable hybrid interconnection for static and dynamic scientific applications.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems.
Proceedings of the High Performance Computing for Computational Science, 2006

The potential of the cell processor for scientific computing.
Proceedings of the Third Conference on Computing Frontiers, 2006

Performance characteristics of an adaptive mesh refinement calculation on scalar and vector platforms.
Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

Performance Evaluation and Modeling of Ultra-Scale Systems.
Proceedings of the Parallel Processing for Scientific Computing, 2006

2005
Performance evaluation of the SX-6 vector architecture for scientific computations.
Concurr. Pract. Exp., 2005

Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Leading Computational Methods on Scalar and Vector HEC Platforms.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Integrated Performance Monitoring of a Cosmology Application on Leading HEC Platforms.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Impact of modern memory subsystems on cache optimizations for stencil computations.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2004
A Performance Evaluation of the Cray X1 for Scientific Applications.
Proceedings of the High Performance Computing for Computational Science, 2004

Scientific Computations on Modern Parallel Vector Systems.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Performance Characteristics of a Cosmology Package on Leading HPC Architectures.
Proceedings of the High Performance Computing, 2004

2003
Message passing and shared address space parallelism on an SMP cluster.
Parallel Comput., 2003

Job Superscheduler Architecture and Performance in Computational Grid Environments.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Performance Evaluation of Two Emerging Media Processors: VIRAM and Imagine.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

2002
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations.
SIAM Rev., 2002

A Comparison of Three Programming Models for Adaptive Applications on the Origin2000.
J. Parallel Distributed Comput., 2002

Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001
Design Strategies for Irregularly Adapting Parallel Applications.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

Ordering Sparse Matrices for Cache-Based Systems.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

Message Passing Vs. Shared Address Space on a Cluster of SMPs.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

2000
Parallelization of a Dynamic Unstructured Algorithm Using Three Leading Programming Paradigms.
IEEE Trans. Parallel Distributed Syst., 2000

Parallel tetrahedral mesh adaptation with dynamic load balancing.
Parallel Comput., 2000

ESP: A System Utilization Benchmark.
Proceedings of Supercomputing 2000, 2000

System Utilization Benchmark on the Cray T3E and IBM SP.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000

Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems.
Proceedings of the Parallel and Distributed Processing, 2000

1999
Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Dynamic Load Balancing for Parallel Adaptive Unstructured Grid Computations.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1998
PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes.
J. Parallel Distributed Comput., 1998

Performance Analysis and Portability of the PLUM Load Balancing System.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

1997
Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations.
Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, 1997

Load Balancing Unstructured Adaptive Grids for CFD Problems.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Load balancing sequences of unstructured adaptive grids.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

1996
Algorithms for Automatic Alignment of Arrays.
J. Parallel Distributed Comput., 1996

Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2.
Proceedings of the Parallel Algorithms for Irregularly Structured Problems, 1996

