Katherine A. Yelick

According to our database, Katherine A. Yelick authored at least 139 papers between 1985 and 2018.

Awards

ACM Fellow

ACM Fellow 2012, "For contributions to parallel languages that improve programmer productivity."

Bibliography

2018
CHIUW 2018 Keynote.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning.
CoRR, 2017

Advanced Cyberinfrastructure for Science, Engineering, and Public Policy.
CoRR, 2017

Extreme-Scale De Novo Genome Assembly.
CoRR, 2017

MerBench: PGAS Benchmarks for High Performance Genome Assembly.
Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017

Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
An Asynchronous Task-based Fan-Both Sparse Cholesky Solver.
CoRR, 2016

Accelerating Science: A Computing Research Agenda.
CoRR, 2016

21st Century Computer Architecture.
CoRR, 2016

A Hartree-Fock Application Using UPC++ and the New DArray Library.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
HipMer: an extreme-scale de novo genome assembler.
Proceedings of the International Conference for High Performance Computing, 2015

merAligner: A Fully Parallel Sequence Aligner.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Parallel Hessian Assembly for Seismic Waveform Inversion Using Global Updates.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

The Endgame for Moore's Law: Architecture, Algorithm, and Application Challenges.
Proceedings of the Federated Computing Research Conference, 2015

2014
A Computation- and Communication-Optimal Parallel Direct 3-Body Algorithm.
Proceedings of the International Conference for High Performance Computing, 2014

Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly.
Proceedings of the International Conference for High Performance Computing, 2014

A Local-View Array Library for Partitioned Global Address Space C++ Programs.
Proceedings of the ARRAY'14: Proceedings of the 2014 ACM SIGPLAN International Workshop on Libraries, 2014

Evaluation of PGAS Communication Paradigms with Geometric Multigrid.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

UPC++: A PGAS Extension for C++.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

On the conditions for efficient interoperability with threads: an experience with PGAS languages using Cray communication domains.
Proceedings of the 2014 International Conference on Supercomputing, 2014

2013
Best paper awards: 26th international parallel and distributed processing symposium (IPDPS 2012).
J. Parallel Distrib. Comput., 2013

Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1.
CoRR, 2013

Hierarchical Computation in the SPMD Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

A Communication-Optimal N-Body Algorithm for Direct Interactions.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms.
IEEE Trans. Parallel Distrib. Syst., 2012

A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI.
SIGMETRICS Performance Evaluation Review, 2012

Communication avoiding and overlapping for numerical linear algebra.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Keynote address: Moving a science workload to exascale computing.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Compiling to avoid communication.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Titanium.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Tuning collective communication for Partitioned Global Address Space programming models.
Parallel Computing, 2011

Yada: Straightforward parallel programming.
Parallel Computing, 2011

The International Exascale Software Project roadmap.
IJHPCA, 2011

Exascale opportunities and challenges.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

2010
Hybrid PGAS runtime support for multicore nodes.
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Review, 2009

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Parallel Computing, 2009

Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.
J. Parallel Distrib. Comput., 2009

Technical perspective - Abstraction for parallelism.
Commun. ACM, 2009

A view of the parallel computing landscape.
Commun. ACM, 2009

Minimizing communication in sparse matrix solvers.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Enforcing Textual Alignment of Collectives Using Dynamic Checks.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Ten ways to waste a parallel computer.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Scheduling dynamic parallelism on accelerators.
Proceedings of the 6th Conference on Computing Frontiers, 2009

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture.
Proceedings of the Architecture of Computing Systems, 2009

2008
DARPA's HPCS Program: History, Models, Tools, Languages.
Advances in Computers, 2008

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Programming models for petascale to exascale.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Lattice Boltzmann simulation optimization on leading multicore platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Avoiding communication in sparse matrix computations.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Performance portable optimizations for loops containing communication operations.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

2007
Languages for High-Productivity Computing: the DARPA HPCS Language Project.
Parallel Processing Letters, 2007

Scientific Computing Kernels on the Cell Processor.
International Journal of Parallel Programming, 2007

Parallel Languages and Compilers: Perspective From the Titanium Experience.
IJHPCA, 2007

When cache blocking of sparse matrix vector multiply works and why.
Appl. Algebra Eng. Commun. Comput., 2007

Deadlock-free scheduling of X10 computations with bounded resources.
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

An adaptive mesh refinement benchmark for modern parallel programming languages.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Multi-threading and one-sided communication in parallel LU factorization.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Hierarchical Pointer Analysis for Distributed Programs.
Proceedings of the Static Analysis, 14th International Symposium, 2007

Automatic Communication Performance Debugging in PGAS Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Productivity and performance using partitioned global address space languages.
Proceedings of the Parallel Symbolic Computation, 2007

Automatic nonblocking communication for partitioned global address space programs.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Performance Portable Optimizations for Loops Containing Communication Operations.
Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

2006
Distributed Immersed Boundary Simulation in Titanium.
SIAM J. Scientific Computing, 2006

Particles and continuum - Performance modeling and optimization of a high energy colliding beam simulation code.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Optimized collectives for PGAS languages with one-sided communication.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Performance Advantages of Partitioned Global Address Space Languages.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Compilation Techniques for Partitioned Global Address Space Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Optimizing bandwidth limited problems using one-sided communication and overlap.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Performance Analysis of a High Energy Colliding Beam Simulation Code on Four HPC Architectures.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

The potential of the cell processor for scientific computing.
Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

2005
Self-Adapting Linear Algebra Algorithms and Software.
Proceedings of the IEEE, 2005

Making Sequential Consistency Practical in Titanium.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Language innovations for HPCS.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Concurrency Analysis for Parallel Programs with Textually Aligned Barriers.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Titanium Performance and Potential: An NPB Experimental Study.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Automatic Support for Irregular Computations in a High-Level Language.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Communication Optimizations for Fine-Grained UPC Applications.
Proceedings of the 14th International Conference on Parallel Architecture and Compilation Techniques (PACT 2005), 2005

Impact of modern memory subsystems on cache optimizations for stencil computations.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2004
Special Issue on Automatic Performance Tuning.
IJHPCA, 2004

Sparsity: Optimization Framework for Sparse Matrix Kernels.
IJHPCA, 2004

Performance Tuning of Matrix Triple Products Based on Matrix Structure.
Proceedings of the Applied Parallel Computing, 2004

Array Prefetching for Irregular Array Accesses in Titanium.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Evaluating support for global address space languages on the Cray X1.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003
Type Systems for Distributed Data Sharing.
Proceedings of the Static Analysis, 10th International Symposium, 2003

Polynomial-Time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

An Evaluation of Current High-Performance Networks.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A performance analysis of the Berkeley UPC compiler.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T Ax.
Proceedings of the Computational Science - ICCS 2003, 2003

2002
ROC-1: Hardware Support for Recovery-Oriented Computing.
IEEE Trans. Computers, 2002

Performance optimizations and bounds for sparse matrix-vector multiply.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001
Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY.
Proceedings of the Computational Science - ICCS 2001, 2001

2000
Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Performance Analysis of an H.263 Video Encoder for VIRAM.
Proceedings of the 2000 International Conference on Image Processing, 2000

1999
Titanium: A High Performance Java Dialect.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Optimizing Sparse Matrix Vector Multiplication on SMP.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999


1998
Titanium: A High-performance Java Dialect.
Concurrency - Practice and Experience, 1998

1997
A case for intelligent RAM.
IEEE Micro, 1997

Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs.
J. Parallel Distrib. Comput., 1997

Scalable Processors in the Billion-Transistor Era: IRAM.
IEEE Computer, 1997

The Energy Efficiency of IRAM Architectures.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Intelligent RAM (IRAM): The Industrial Setting, Applications and Architectures.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

1996
Analyses and Optimizations for Shared Address Space Programs.
J. Parallel Distrib. Comput., 1996

Systems Support for Irregular Parallel Applications (Abstract).
Proceedings of the Parallel Algorithms for Irregularly Structured Problems, 1996

Performance Modeling and Composition: A Case Study in Cell Simulation.
Proceedings of IPPS '96, 1996

Evaluation of Architectural Support for Global Address-Based Communication in Large-Scale Parallel Machines.
Proceedings of the ASPLOS-VII Proceedings, 1996

1995
Modeling the Benefits of Mixed Data and Task Parallelism.
SPAA, 1995

Parallelizing the Phylogeny Problem.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Portable Parallel Irregular Applications.
Proceedings of the Parallel Symbolic Languages and Systems, 1995

Optimizing Parallel Programs with Explicit Synchronization.
Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), 1995

Empirical Evaluation of the CRAY-T3D: A Compiler Perspective.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Portable Runtime Support for Asynchronous Simulation.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
Distributed Data Structures and Algorithms for Gröbner Basis Computation.
Lisp and Symbolic Computation, 1994

Optimizing Parallel SPMD Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

Connected components on distributed memory machines.
Proceedings of the Parallel Algorithms, 1994

1993

Parallel programming in Split-C.
Proceedings of Supercomputing '93, 1993

On the Correctness of a Distributed Memory Gröbner basis Algorithm.
Proceedings of the Rewriting Techniques and Applications, 5th International Conference, 1993

Implementing an Irregular Application on a Distributed Memory Multiprocessor.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

Parallel timing simulation on a distributed memory multiprocessor.
Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, 1993

1992
Programming Models for Irregular Applications.
SIGPLAN Workshop, 1992

A Parallel Completion Procedure for Term Rewriting Systems.
Proceedings of the Automated Deduction, 1992

Using Moded Type Systems to Support Abstraction in Logic Programs.
Types in Logic Programming, 1992

1990
Parallel Completion.
Proceedings of the Parallelization in Inference Systems, 1990

1989
Moded Type Systems for Logic Programming.
Proceedings of the Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, 1989

1987
Unification in Combinations of Collapse-Free Regular Theories.
J. Symb. Comput., 1987

1985
Combining Unification Algorithms for Confined Regular Equational Theories.
Proceedings of the Rewriting Techniques and Applications, First International Conference, 1985

