Katherine A. Yelick

CoRR, 2016

21st Century Computer Architecture.

CoRR, 2016

A Hartree-Fock Application Using UPC++ and the New DArray Library.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

HipMer: an extreme-scale de novo genome assembler.

Proceedings of the International Conference for High Performance Computing, 2015

merAligner: A Fully Parallel Sequence Aligner.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Parallel Hessian Assembly for Seismic Waveform Inversion Using Global Updates.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

The Endgame for Moore's Law: Architecture, Algorithm, and Application Challenges.

Kathy Yelick

Proceedings of the Federated Computing Research Conference, 2015

2014

A Computation- and Communication-Optimal Parallel Direct 3-Body Algorithm.

Penporn Koanantakool

Proceedings of the International Conference for High Performance Computing, 2014

Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly.

Proceedings of the International Conference for High Performance Computing, 2014

A Local-View Array Library for Partitioned Global Address Space C++ Programs.

Yili Zheng

ARRAY'14: Proceedings of the 2014 ACM SIGPLAN International Workshop on Libraries, 2014

Evaluation of PGAS Communication Paradigms with Geometric Multigrid.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

UPC++: A PGAS Extension for C++.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

On the conditions for efficient interoperability with threads: an experience with PGAS languages using Cray communication domains.

Khaled Z. Ibrahim

Proceedings of the 2014 International Conference on Supercomputing, 2014

2013

Best paper awards: 26th international parallel and distributed processing symposium (IPDPS 2012).

Leonid Oliker

J. Parallel Distributed Comput., 2013

Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1.

CoRR, 2013

Hierarchical Computation in the SPMD Programming Model.

Proceedings of the Languages and Compilers for Parallel Computing, 2013

A Communication-Optimal N-Body Algorithm for Direct Interactions.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms.

IEEE Trans. Parallel Distributed Syst., 2012

A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI.

SIGMETRICS Perform. Evaluation Rev., 2012

Communication avoiding and overlapping for numerical linear algebra.

Evangelos Georganas

Jorge González-Domínguez

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Keynote address: Moving a science workload to exascale computing.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Compiling to avoid communication.

Kathy Yelick

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Titanium.

Proceedings of the Encyclopedia of Parallel Computing, 2011

Tuning collective communication for Partitioned Global Address Space programming models.

Parallel Comput., 2011

Yada: Straightforward parallel programming.

Parallel Comput., 2011

The International Exascale Software Project roadmap.

Bertrand Braunschweig

Int. J. High Perform. Comput. Appl., 2011

Exascale opportunities and challenges.

Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

2010

Hybrid PGAS runtime support for multicore nodes.

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

Auto-Tuning Stencil Computations on Multicore and Accelerators.

Proceedings of the Scientific Computing with Multicore and Accelerators, 2010

2009

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.

SIAM Rev., 2009

Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.

J. Parallel Distributed Comput., 2009

Technical perspective - Abstraction for parallelism.

Commun. ACM, 2009

A view of the parallel computing landscape.

Commun. ACM, 2009

Minimizing communication in sparse matrix solvers.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Enforcing Textual Alignment of Collectives Using Dynamic Checks.

Proceedings of the Languages and Compilers for Parallel Computing, 2009

Ten ways to waste a parallel computer.

Dimitrios S. Nikolopoulos

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Scheduling dynamic parallelism on accelerators.

Benjamin Rose

Proceedings of the 6th Conference on Computing Frontiers, 2009

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture.

Proceedings of the Architecture of Computing Systems, 2009

2008

DARPA's HPCS Program: History, Models, Tools, Languages.

Adv. Comput., 2008

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Programming models for petascale to exascale.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Lattice Boltzmann simulation optimization on leading multicore platforms.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Avoiding communication in sparse matrix computations.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Performance portable optimizations for loops containing communication operations.

Costin Iancu

Wei Chen

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

2007

Languages for High-Productivity Computing: the DARPA HPCS Language Project.

Ewing L. Lusk

Parallel Process. Lett., 2007

Scientific Computing Kernels on the Cell Processor.

Int. J. Parallel Program., 2007

Parallel Languages and Compilers: Perspective From the Titanium Experience.

Int. J. High Perform. Comput. Appl., 2007

When cache blocking of sparse matrix vector multiply works and why.

Appl. Algebra Eng. Commun. Comput., 2007

Deadlock-free scheduling of X10 computations with bounded resources.

SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

An adaptive mesh refinement benchmark for modern parallel programming languages.

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Multi-threading and one-sided communication in parallel LU factorization.

Parry Husbands

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Hierarchical Pointer Analysis for Distributed Programs.

Proceedings of the Static Analysis, 14th International Symposium, 2007

Automatic Communication Performance Debugging in PGAS Languages.

Proceedings of the Languages and Compilers for Parallel Computing, 2007

Productivity and performance using partitioned global address space languages.

Proceedings of the Parallel Symbolic Computation, 2007

Automatic nonblocking communication for partitioned global address space programs.

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006

Distributed Immersed Boundary Simulation in Titanium.

Edward Givelberg

SIAM J. Sci. Comput., 2006

Particles and continuum - Performance modeling and optimization of a high energy colliding beam simulation code.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Optimized collectives for PGAS languages with one-sided communication.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Performance Advantages of Partitioned Global Address Space Languages.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Compilation Techniques for Partitioned Global Address Space Languages.

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Optimizing bandwidth limited problems using one-sided communication and overlap.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Performance Analysis of a High Energy Colliding Beam Simulation Code on Four HPC Architectures.

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

The potential of the cell processor for scientific computing.

Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.

Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

2005

Self-Adapting Linear Algebra Algorithms and Software.

Proc. IEEE, 2005

Making Sequential Consistency Practical in Titanium.

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Language innovations for HPCS.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Concurrency Analysis for Parallel Programs with Textually Aligned Barriers.

Proceedings of the Languages and Compilers for Parallel Computing, 2005

Titanium Performance and Potential: An NPB Experimental Study.

Kaushik Datta

Dan Bonachea

Proceedings of the Languages and Compilers for Parallel Computing, 2005

Automatic Support for Irregular Computations in a High-Level Language.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Communication Optimizations for Fine-Grained UPC Applications.

Wei-Yu Chen

Costin Iancu

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

Impact of modern memory subsystems on cache optimizations for stencil computations.

Proceedings of the 2005 workshop on Memory System Performance, 2005

2004

Special Issue on Automatic Performance Tuning.

Int. J. High Perform. Comput. Appl., 2004

Sparsity: Optimization Framework for Sparse Matrix Kernels.

Eun-Jin Im

Richard W. Vuduc

Int. J. High Perform. Comput. Appl., 2004

Performance Tuning of Matrix Triple Products Based on Matrix Structure.

Proceedings of the Applied Parallel Computing, 2004

Array Prefetching for Irregular Array Accesses in Titanium.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Evaluating support for global address space languages on the Cray X1.

Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply.

Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003

Type Systems for Distributed Data Sharing.

Ben Liblit

Alex Aiken

Proceedings of the Static Analysis, 10th International Symposium, 2003

Polynomial-Time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays.

Wei-Yu Chen

Proceedings of the Languages and Compilers for Parallel Computing, 2003

An Evaluation of Current High-Performance Networks.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A performance analysis of the Berkeley UPC compiler.

Parry Husbands

Costin Iancu

Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T Ax.

Proceedings of the Computational Science - ICCS 2003, 2003

2002

ROC-1: Hardware Support for Recovery-Oriented Computing.

IEEE Trans. Computers, 2002

Performance optimizations and bounds for sparse matrix-vector multiply.

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines.

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001

Hardware/compiler codevelopment for an embedded media processor.

Proc. IEEE, 2001

Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY.

Eun-Jin Im

Proceedings of the Computational Science - ICCS 2001, 2001

2000

Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler.

David Judd

David R. Martin

David A. Patterson

Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Performance Analysis of an H.263 Video Encoder for VIRAM.

Thinh P. Q. Nguyen

Avideh Zakhor

Proceedings of the 2000 International Conference on Image Processing, 2000

1999

Titanium: A High Performance Java Dialect.

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Optimizing Sparse Matrix Vector Multiplication on SMP.

Eun-Jin Im

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Cluster I/O with River: Making the Fast Case Common.

Remzi H. Arpaci-Dusseau

Eric Anderson

Noah Treuhaft

David E. Culler

Joseph M. Hellerstein

David A. Patterson

Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, 1999

1997

A case for intelligent RAM.

Randi Thomas

IEEE Micro, 1997

Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs.

James Demmel

J. Parallel Distributed Comput., 1997

Scalable Processors in the Billion-Transistor Era: IRAM.

Computer, 1997

The Energy Efficiency of IRAM Architectures.

Richard Fromm

Stylianos Perissakis

Neal Cardwell

Proceedings of the 24th International Symposium on Computer Architecture, 1997

Intelligent RAM (IRAM): The Industrial Setting, Applications and Architectures.

Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

1996

Analyses and Optimizations for Shared Address Space Programs.

J. Parallel Distributed Comput., 1996

Systems Support for Irregular Parallel Applications (Abstract).

Proceedings of the Parallel Algorithms for Irregularly Structured Problems, 1996

Performance Modeling and Composition: A Case Study in Cell Simulation.

Steve G. Steinberg

Jun Yang

Proceedings of IPPS '96, 1996

Evaluation of Architectural Support for Global Address-Based Communication in Large-Scale Parallel Machines.

Proceedings of ASPLOS-VII, 1996

1995

Modeling the Benefits of Mixed Data and Task Parallelism.

James Demmel

Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, 1995

Parallelizing the Phylogeny Problem.

Jeff A. Jones

Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Portable Parallel Irregular Applications.

Proceedings of the Parallel Symbolic Languages and Systems, 1995

Optimizing Parallel Programs with Explicit Synchronization.

Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), 1995

Runtime Support for Portable Distributed Data Structures.

Proceedings of the Languages, 1995

Empirical Evaluation of the CRAY-T3D: A Compiler Perspective.

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Portable Runtime Support for Asynchronous Simulation.

Chih-Po Wen

Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994

Distributed Data Structures and Algorithms for Gröbner Basis Computation.

LISP Symb. Comput., 1994

Optimizing Parallel SPMD Programs.

Proceedings of the Languages and Compilers for Parallel Computing, 1994

Connected components on distributed memory machines.

Proceedings of the Parallel Algorithms, 1994

1993

Common runtime support for high-performance parallel languages.

Proceedings of Supercomputing '93, 1993

Parallel programming in Split-C.

Proceedings of Supercomputing '93, 1993

On the Correctness of a Distributed Memory Gröbner basis Algorithm.

Proceedings of the Rewriting Techniques and Applications, 5th International Conference, 1993

Implementing an Irregular Application on a Distributed Memory Multiprocessor.

Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

Parallel timing simulation on a distributed memory multiprocessor.

Chih-Po Wen

Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, 1993

1992

Programming Models for Irregular Applications.

Proceedings of the 2nd SIGPLAN Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors, Boulder, Colorado, September 30, 1992

A Parallel Completion Procedure for Term Rewriting Systems.

Stephen J. Garland

Proceedings of the Automated Deduction, 1992

Using Moded Type Systems to Support Abstraction in Logic Programs.

Joseph L. Zachary