Keshav Pingali

According to our database, Keshav Pingali authored at least 171 papers between 1985 and 2018.

Awards

IEEE Fellow

IEEE Fellow 2010, "For contributions to compilers and parallel computing".

Bibliography

2018
Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

A Lightweight Communication Runtime for Distributed Graph Analytics.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Unlocking fine-grain parallelism for AIG rewriting.
Proceedings of the International Conference on Computer-Aided Design, 2018

Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017
IGA-ADS: Isogeometric analysis FEM using ADS solver.
Computer Physics Communications, 2017

Dynamic Load Balancing Strategies for Graph Applications on GPUs.
CoRR, 2017

An Elementary Introduction to Kalman Filtering.
CoRR, 2017

Capri: A Control System for Approximate Programs.
CoRR, 2017

Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Parallel triangle counting and k-truss identification using graph-centric methods.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

What Scalable Programs Need from Transactional Memory.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Adaptive Work-Efficient Connected Components on the GPU.
CoRR, 2016

Lowering IrGL to CUDA.
CoRR, 2016

Parallel graph analytics.
Commun. ACM, 2016

DSMR: a shared and distributed memory algorithm for single-source shortest path problem.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

A compiler for throughput optimization of graph algorithms on GPUs.
Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, 2016

Synchronization Trade-Offs in GPU Implementations of Graph Algorithms.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Hypergraph Grammars in Non-stationary hp-adaptive Finite Element Method.
Proceedings of the International Conference on Computational Science 2016, 2016

Hybrid Direct and Iterative Solver with Library of Multi-criteria Optimal Orderings for h Adaptive Finite Element Method Computations.
Proceedings of the International Conference on Computational Science 2016, 2016

Proactive Control of Approximate Programs.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
Introduction to the Special Issue on PPoPP'12.
TOPC, 2015

Quasi-Optimal Elimination Trees for 2D Grids with Singularities.
Scientific Programming, 2015

Scaling Runtimes for Irregular Algorithms to Large-Scale NUMA Systems.
IEEE Computer, 2015

Parallel program = operator + schedule + parallel data structure.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Stochastic gradient descent on GPUs.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Synthesizing parallel graph programs via automated planning.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Automatic Tuning of Task Scheduling Policies on Multicore Architectures.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Telescopic Hybrid Fast Solver for 3D Elliptic Problems with Point Singularities.
Proceedings of the International Conference on Computational Science, 2015

Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons Learned.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Priority Queues Are Not Good Concurrent Priority Schedulers.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

A Graphical Model for Context-Free Grammar Parsing.
Proceedings of the Compiler Construction - 24th International Conference, 2015

Kinetic Dependence Graphs.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Brief announcement: parallelization of asynchronous variational integrators for shared memory architectures.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction.
Proceedings of the International Conference for High Performance Computing, 2014

High-speed graph analytics with the Galois system.
Proceedings of the first workshop on Parallel programming for analytics applications, 2014

Author retrospective for synthesizing transformations for locality enhancement of imperfectly-nested loop nests.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Graph Grammar based Multi-thread Multi-frontal Direct Solver with Galois Scheduler.
Proceedings of the International Conference on Computational Science, 2014

Deterministic Galois: on-demand, portable and parameterless.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Adaptive heterogeneous scheduling for integrated GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A lightweight infrastructure for graph analytics.
Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles, 2013

Betweenness centrality: algorithms and implementations.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Morph algorithms on GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Data-Driven Versus Topology-driven Irregular Computations on GPUs.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Atomic-free irregular computations on GPUs.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012
Processor Allocation for Optimistic Parallelization of Irregular Programs.
CoRR, 2012

Parallelizing SuperFine.
Proceedings of the ACM Symposium on Applied Computing, 2012

A GPU implementation of inclusion-based points-to analysis.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Elixir: a system for synthesizing concurrent graph programs.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

A quantitative study of irregular programs on GPUs.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Processor Allocation for Optimistic Parallelization of Irregular Programs.
Proceedings of the Computational Science and Its Applications - ICCSA 2012, 2012

2011
Locality of Reference and Parallel Processing.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Brief announcement: processor allocation for optimistic parallelization of irregular programs.
Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

A shape analysis for optimizing parallel graph programs.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

Parallelizing irregular algorithms: a pattern language.
Proceedings of the 18th Conference on Pattern Languages of Programs, 2011

The tao of parallelism in algorithms.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

Exploiting the commutativity lattice.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

Synthesizing concurrent schedulers for irregular algorithms.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

2010
La prossima vita at TOPLAS.
ACM Trans. Program. Lang. Syst., 2010

La dolce vita at TOPLAS.
ACM Trans. Program. Lang. Syst., 2010

Programming Multicores: Do Applications Programmers Need to Write Explicitly Parallel Programs?
IEEE Micro, 2010

Structure-driven optimizations for amorphous data-parallel programs.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Parallel inclusion-based points-to analysis.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

Parallel Graph Partitioning on Multicore Architectures.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Towards a science of parallel programming.
Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques, 2010

Ordered and unordered algorithms for parallel breadth first search.
Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques, 2010

2009
Remembrances of things past.
ACM Trans. Program. Lang. Syst., 2009

Optimistic parallelism requires abstractions.
Commun. ACM, 2009

Compiler research: the next 50 years.
Commun. ACM, 2009

How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Lonestar: A suite of parallel irregular programs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Compiler-enhanced incremental checkpointing for OpenMP applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

09191 Abstracts Collection - Fault Tolerance in High-Performance Computing and Grids.
Proceedings of the Fault Tolerance in High-Performance Computing and Grids, 03.05., 2009

2008
Parallel and Vector Programming Languages.
Proceedings of the Wiley Encyclopedia of Computer Science and Engineering, 2008

An Experimental Study of Self-Optimizing Dense Linear Algebra Software.
Proceedings of the IEEE, 2008

Scheduling strategies for optimistic parallel execution of irregular programs.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Compiler-enhanced incremental checkpointing for OpenMP applications.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

On the Scalability of an Automatically Parallelized Irregular Application.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Data-parallel abstractions for irregular programs.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Optimistic parallelism benefits from data partitioning.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007
Editorial: A changing of the guard.
ACM Trans. Program. Lang. Syst., 2007

An experimental comparison of cache-oblivious and cache-conscious programs.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

Optimistic parallelism requires abstractions.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Compiler-Enhanced Incremental Checkpointing.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Scheduling Issues in Optimistic Parallelization.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006
Mobile MPI programs in computational grids.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Is Cache-Oblivious DGEMM Viable?
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Recent advances in checkpoint/recovery systems.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A distributed system based on web services for computational science simulations.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Experimental evaluation of application-level checkpointing for OpenMP programs.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

2005
Is Search Really Necessary to Generate High-Performance BLAS?
Proceedings of the IEEE, 2005

Automatic measurement of memory hierarchy parameters.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2005

Automatic Measurement of Instruction Cache Capacity.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

A Language for the Compact Representation of Multiple Program Versions.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Optimizing Checkpoint Sizes in the C3 System.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Think globally, search locally.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

2004
A Load Balancing Framework for Adaptive and Asynchronous Applications.
IEEE Trans. Parallel Distrib. Syst., 2004

Look Left, Look Right, Look Left Again: An Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring.
International Journal of Parallel Programming, 2004

Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

O'SOAP - A Web Services Framework for DDDAS Applications.
Proceedings of the Computational Science, 2004

Application-level checkpointing for shared memory programs.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
Fractal symbolic analysis.
ACM Trans. Program. Lang. Syst., 2003

Algorithms for computing the static single assignment form.
J. ACM, 2003

Automated application-level checkpointing of MPI programs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

A comparison of empirical and model-driven optimization.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

C3: A System for Automating Application-Level Checkpointing of MPI Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Collective operations in application-level fault-tolerant MPI.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003


2002
Data movement and control substrate for parallel adaptive applications.
Concurrency and Computation: Practice and Experience, 2002

Next Generation System Software for Future High-End Computing Systems.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001
Data-Centric Transformations for Locality Enhancement.
International Journal of Parallel Programming, 2001

Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests.
International Journal of Parallel Programming, 2001

Fractal symbolic analysis.
Proceedings of the 15th international conference on Supercomputing, 2001

Topic 04: Compilers for High Performance.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Solving Alignment Using Elementary Linear Algebra.
Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

2000
Fractal Symbolic Analysis.
CoRR, 2000

Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path.
Proceedings of Supercomputing 2000, 2000

A Framework for Sparse Matrix Code Synthesis from High-level Specifications.
Proceedings of Supercomputing 2000, 2000

Tiling Imperfectly-Nested Loop Nests.
Proceedings of Supercomputing 2000, 2000

Parallel FEM Simulation of Crack Propagation - Challenges, Status, and Perspectives.
Proceedings of the Parallel and Distributed Processing, 2000

Next-generation generic programming and its application to sparse matrix computations.
Proceedings of the 14th international conference on Supercomputing, 2000

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests.
Proceedings of the 14th international conference on Supercomputing, 2000

Left-Looking to Right-Looking and Vice Versa: An Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Automatic Generation of Block-Recursive Codes.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
High-level semantic optimization of numerical codes.
Proceedings of the 13th international conference on Supercomputing, 1999

An experimental evaluation of tiling and shackling for memory hierarchy management.
Proceedings of the 13th international conference on Supercomputing, 1999

A case for source-level transformations in MATLAB.
Proceedings of the Second Conference on Domain-Specific Languages (DSL '99), 1999

1997
Optimal Control Dependence Computation and the Roman Chariots Problem.
ACM Trans. Program. Lang. Syst., 1997

Compiling Parallel Code for Sparse Matrix Applications.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

Compiling Parallel Sparse Code for User-Defined Data Structures.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Data-centric Multi-level Blocking.
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), 1997

Sparse Code Generation for Imperfectly Nested Loops with Dependences.
Proceedings of the 11th international conference on Supercomputing, 1997

Compiler and Run-Time Support for Semi-Structured Applications.
Proceedings of the 11th international conference on Supercomputing, 1997

A Relational Approach to the Compilation of Sparse Matrix Programs.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

Data Movement and Control Substrate for Parallel Scientific Computing.
Proceedings of the Communication and Architectural Support for Network-Based Parallel Computing, 1997

1996
Transformations for Imperfectly Nested Loops.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

Generalized Dominance and Control Dependence.
Proceedings of the ACM SIGPLAN'96 Conference on Programming Language Design and Implementation (PLDI), 1996

1995
APT: A Data Structure for Optimal Control Dependence Computation.
Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), 1995

Automatic Parallelization of the Conjugate Gradient Algorithm.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

1994
Compiling for Distributed Memory Architectures.
IEEE Trans. Parallel Distrib. Syst., 1994

A singular loop transformation framework based on non-singular matrices.
International Journal of Parallel Programming, 1994

The Program Structure Tree: Computing Control Regions in Linear Time.
Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (PLDI), 1994

Solving Alignment Using Elementary Linear Algebra.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

1993
Access Normalization: Loop Restructuring for NUMA Compilers.
ACM Trans. Comput. Syst., 1993

Dependence-Based Program Analysis.
Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993

Register renaming and dynamic speculation: an alternative approach.
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

1992
Loop Transformations for NUMA Machines.
SIGPLAN Workshop, 1992

Abstract Semantics for a Higher-Order Functional Language with Logic Variables.
Proceedings of the Conference Record of the Nineteenth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1992

A Singular Loop Transformation Framework Based on Non-Singular Matrices.
Proceedings of the Languages and Compilers for Parallel Computing, 1992

Access Normalization: Loop Restructuring for NUMA Compilers.
Proceedings of the ASPLOS-V Proceedings, 1992

1991
A Fully Abstract Semantics for a First-Order Functional Language with Logic Variables.
ACM Trans. Program. Lang. Syst., 1991

Accumulators: New Logic Variable Abstractions for Functional Languages.
Theor. Comput. Sci., 1991

From Control Flow to Dataflow.
J. Parallel Distrib. Comput., 1991

Dependence Flow Graphs: An Algebraic Approach to Program Dependencies.
Proceedings of the Conference Record of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, 1991

An Executable Representation of Distance and Direction.
Proceedings of the Languages and Compilers for Parallel Computing, 1991

1990
Static Scheduling for Dynamic Dataflow Machines.
J. Parallel Distrib. Comput., 1990

Compiling for Locality.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

From Control Flow to Dataflow.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

1989
I-Structures: Data Structures for Parallel Computing.
ACM Trans. Program. Lang. Syst., 1989

Process Decomposition Through Locality of Reference.
Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), 1989

A Fully Abstract Semantics for a Functional Language with Logic Variables.
Proceedings of the Fourth Annual Symposium on Logic in Computer Science (LICS '89), 1989

1988
Fine-grain compilation for pipelined machines.
The Journal of Supercomputing, 1988

Lazy evaluation and the logic variable.
Proceedings of the 2nd international conference on Supercomputing, 1988

Accumulators: New Logic Variable Abstractions for Functional Languages.
Proceedings of the Foundations of Software Technology and Theoretical Computer Science, 1988

1986
Clarification of "Feeding Inputs on Demand" in Efficient Demand-Driven Evaluation - Part 1.
ACM Trans. Program. Lang. Syst., 1986

Efficient Demand-Driven Evaluation - Part 2.
ACM Trans. Program. Lang. Syst., 1986

I-structures: Data structures for parallel computing.
Proceedings of the Graph Reduction, Proceedings of a Workshop, Santa Fé, New Mexico, USA, September 29, 1986

1985
Efficient Demand-Driven Evaluation - Part 1.
ACM Trans. Program. Lang. Syst., 1985

