John Shalf

ORCID: 0000-0002-0608-3690

Affiliations:
  • Lawrence Berkeley National Laboratory, Berkeley, CA, USA


According to our database, John Shalf authored at least 154 papers between 1996 and 2023.

Bibliography

2023
The 2023 Society for Industrial and Applied Mathematics Conference on Computational Science and Engineering.
Comput. Sci. Eng., 2023

Fast Community Detection in Graphs with Infomap Method using Accelerated Sparse Accumulation.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Fast Parallel Index Construction for Efficient K-truss-based Local Community Detection in Large Graphs.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Towards a Flexible Hardware Implementation for Mixed-Radix Fourier Transforms.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM.
ACM Trans. Archit. Code Optim., 2022

A Case For Intra-rack Resource Disaggregation in HPC.
ACM Trans. Archit. Code Optim., 2022

Preparing for the Future - Rethinking Proxy Applications.
Comput. Sci. Eng., 2022

The COVID-19 High-Performance Computing Consortium.
Comput. Sci. Eng., 2022

Preparing for the Future - Rethinking Proxy Apps.
CoRR, 2022

2021
Temporal Computing With Superconductors.
IEEE Micro, 2021

Large-Scale Scientific Computing in the Fight Against COVID-19.
Comput. Sci. Eng., 2021

Interactive Supercomputing With Jupyter.
Comput. Sci. Eng., 2021

It's Time to Talk About HPC Storage: Perspectives on the Past and Future.
Comput. Sci. Eng., 2021

Facilitating CoDesign with Automatic Code Similarity Learning.
Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

A systematic approach to improving data locality across Fourier transforms and linear algebra operations.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

2020
PINE: Photonic Integrated Networked Energy efficient datacenters (ENLITENED Program) [Invited].
JOCN, 2020

TIGER: Topology-aware Assignment using Ising machines Application to Classical Algorithm Tasks and Quantum Circuit Gates.
CoRR, 2020

TAGO: rethinking routing design in high performance reconfigurable networks.
Proceedings of the International Conference for High Performance Computing, 2020

RnR: A Software-Assisted Record-and-Replay Hardware Prefetcher.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Evaluating the Numerical Stability of Posit Arithmetic.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Understanding Quantum Control Processor Capabilities and Limitations through Circuit Characterization.
Proceedings of the International Conference on Rebooting Computing, 2020

DRAM-Less: Hardware Acceleration of Data Processing with New Memory.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

A Computational Temporal Logic for Superconducting Accelerators.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Asynchronous AMR on Multi-GPUs.
Proceedings of the High Performance Computing, 2019

Bandwidth steering in HPC using silicon nanophotonics.
Proceedings of the International Conference for High Performance Computing, 2019

HPC Interconnects at the End of Moore's Law.
Proceedings of the Optical Fiber Communications Conference and Exhibition, 2019

PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Extending classical processors to support future large scale quantum accelerators.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

TIGER: topology-aware task assignment approach using ising machines.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
SimpleSSD: Modeling Solid State Drives for Holistic System Simulation.
IEEE Comput. Archit. Lett., 2018

Phase asynchronous AMR execution for productive and performant astrophysical flows.
Proceedings of the International Conference for High Performance Computing, 2018

Architectural Opportunities and Challenges from Emerging Photonics in Future Systems.
Proceedings of the Photonics in Switching and Computing, 2018

MRG8: Random Number Generation for the Exascale Era.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2018

Open2C: open-source generator for exploration of coherent cache memory subsystems.
Proceedings of the International Symposium on Memory Systems, 2018

2017
Trends in Data Locality Abstractions for HPC Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Towards an Integrated Strategy to Preserve Digital Computing Performance Scaling Using Emerging Technologies.
Proceedings of the High Performance Computing, 2017

Reconfigurable Silicon Photonic Interconnect for Many-Core Architecture.
Proceedings of the High Performance Computing, 2017

CASPER - Configurable design space exploration of programmable architectures for machine learning using beyond moore devices.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2017

TraceTracker: Hardware/software co-evaluation for large-scale I/O workload reconstruction.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Overlapping Data Transfers with Computation on GPU with Tiles.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Last Level Collective Hardware Prefetching For Data-Parallel Applications.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

OpenSoC system architect: An open toolkit for building soft-cores on FPGAs.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Nonintrusive AMR Asynchrony for Communication Optimization.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

APHiD: Hierarchical Task Placement to Enable a Tapered Fat Tree Topology for Lower Power and Cost in HPC Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
NANDFlashSim: High-Fidelity, Microarchitecture-Aware NAND Flash Memory Simulation.
ACM Trans. Storage, 2016

BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework.
SIAM J. Sci. Comput., 2016

BoxLib with Tiling: An AMR Software Framework.
CoRR, 2016

TiDA: High-Level Programming Abstractions for Data Locality Management.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement.
Proceedings of the International Conference for High Performance Computing, 2016

Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes.
Proceedings of the Second International Symposium on Memory Systems, 2016

OpenSoC Fabric: On-chip network generator.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Silicon photonic memory interconnect for many-core architectures.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

2015
Extending Summation Precision for Network Reduction Operations.
Int. J. Parallel Program., 2015

ExaSAT: An exascale co-design tool for performance modeling.
Int. J. High Perform. Comput. Appl., 2015

Computing beyond Moore's Law.
Computer, 2015

OpenNVM: An open-sourced FPGA-based NVM controller for low level memory characterization.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Memory Errors in Modern Systems: The Good, The Bad, and The Ugly.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Integrating 3D Resistive Memory Cache into GPGPU for Energy-Efficient Data Processing.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Exploring the future of out-of-core computing with compute-local non-volatile memory.
Sci. Program., 2014

Abstract machine models and proxy architectures for exascale computing.
Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

Variable-width datapath for on-chip network static power reduction.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

OpenSoC Fabric: On-Chip Network Generator: Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric.
Proceedings of the 2014 International Workshop on Network on Chip Architectures, 2014

Collective memory transfers for multi-core chips.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
Exascale Computing Trends: Adjusting to the "New Normal" for Computer Architecture.
Comput. Sci. Eng., 2013

Software Design Space Exploration for Exascale Combustion Co-design.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

A communications simulation methodology for AMR codes using task dependency analysis.
Proceedings of the 3rd Workshop on Irregular Applications - Architectures and Algorithms, 2013

Design of a large-scale storage-class RRAM system.
Proceedings of the International Conference on Supercomputing, 2013

Topic 14+16: High-Performance and Scientific Applications and Extreme-Scale Computing - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI.
SIGMETRICS Perform. Evaluation Rev., 2012

Optimization of geometric multigrid for emerging multi- and manycore processors.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

The Analysis of Impact of Energy Efficiency Requirements on Programming Environments.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

NANDFlashSim: Intrinsic latency variation aware NAND flash memory system modeling and simulation at microarchitecture level.
Proceedings of the IEEE 28th Symposium on Mass Storage Systems and Technologies, 2012

Toward codesign in high performance computing systems.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

Experiences with 100Gbps network applications.
Proceedings of the DIDC'12, 2012

On the Role of Co-design in High Performance Computing.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

2011
Green Flash: Climate Machine (LBNL).
Proceedings of the Encyclopedia of Parallel Computing, 2011

The International Exascale Software Project roadmap.
Int. J. High Perform. Comput. Appl., 2011

Rethinking Hardware-Software Codesign for Exascale Systems.
Computer, 2011

Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning.
Proceedings of the Conference on High Performance Computing Networking, 2011

Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms.
Proceedings of the Conference on High Performance Computing Networking, 2011

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Hardware/software co-design for energy-efficient seismic modeling.
Proceedings of the Conference on High Performance Computing Networking, 2011

Let there be light!: the future of memory systems is photonics and 3D stacking.
Proceedings of the 2011 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '11, 2011

2010
Communication Requirements and Interconnect Optimization for High-End Scientific Applications.
IEEE Trans. Parallel Distributed Syst., 2010

Exascale Computing Technology Challenges.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Parallel I/O performance: From events to ensembles.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

An auto-tuning framework for parallel multicore stencil computations.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Exascale Computing and the Role of Co-Design.
Proceedings of the High Performance Computing: From Grids and Clouds to Exascale, 2010

Silicon Nanophotonic Network-on-Chip Using TDM Arbitration.
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud.
Proceedings of the Cloud Computing, Second International Conference, 2010

Defining future platform requirements for e-Science clouds.
Proceedings of the 1st ACM Symposium on Cloud Computing, 2010

Auto-Tuning Stencil Computations on Multicore and Accelerators.
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010

2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Rev., 2009

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Parallel Comput., 2009

HPC global file system performance analysis using a scientific-application derived benchmark.
Parallel Comput., 2009

Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.
J. Parallel Distributed Comput., 2009

Energy-Efficient Computing for Extreme-Scale Science.
Computer, 2009

A design methodology for domain-optimized power-efficient supercomputing.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A Comparison of Different Communication Structures for Scalable Parallel Three Dimensional FFTs in First Principles Codes.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

Analysis of photonic networks for a chip multiprocessor using scientific applications.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Scalability challenges for massively parallel AMR applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture.
Proceedings of the Architecture of Computing Systems, 2009

Storage Technology.
Proceedings of the Scientific Data Management - Challenges, Technology, and Deployment, 2009

2008
Towards Ultra-High Resolution Models of Climate and Weather.
Int. J. High Perform. Comput. Appl., 2008

Scientific Application Performance On Leading Scalar and Vector Supercomputering Platforms.
Int. J. High Perform. Comput. Appl., 2008

Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Lattice Boltzmann simulation optimization on leading multicore platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Power efficiency in high performance computing.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
Scientific Computing Kernels on the Cell Processor.
Int. J. Parallel Program., 2007

Cactus Framework: Black Holes to Gamma Ray Bursts.
CoRR, 2007

Investigation of leading HPC I/O performance using a scientific-application derived benchmark.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Scientific Application Performance on Candidate PetaScale Platforms.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Reconfigurable hybrid interconnection for static and dynamic scientific applications.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems.
Proceedings of the High Performance Computing for Computational Science, 2006

HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets using Fast Bitmap Indices.
Proceedings of the 18th International Conference on Scientific and Statistical Database Management, 2006

The potential of the cell processor for scientific computing.
Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

2005
The Astrophysics Simulation Collaboratory Portal: a framework for effective distributed research.
Future Gener. Comput. Syst., 2005

Performance evaluation of the SX-6 vector architecture for scientific computations.
Concurr. Pract. Exp., 2005

Query-Driven Visualization of Large Data Sets.
Proceedings of the 16th IEEE Visualization Conference, 2005

DEX: Increasing the Capability of Scientific Data Analysis Pipelines by Using Efficient Bitmap Indices to Accelerate Scientific Visualization.
Proceedings of the 17th International Conference on Scientific and Statistical Database Management, 2005

Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Impact of modern memory subsystems on cache optimizations for stencil computations.
Proceedings of the 2005 workshop on Memory System Performance, 2005

Consuming Network Bandwidth with Visapult.
Proceedings of the Visualization Handbook, 2005

2004
Scientific Computations on Modern Parallel Vector Systems.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

2003
Enabling Applications on the Grid: A Gridlab Overview.
Int. J. High Perform. Comput. Appl., 2003

The Grid and Future Visualization System Architectures.
IEEE Computer Graphics and Applications, 2003

Deploying Web-Based Visual Exploration Tools on the Grid.
IEEE Computer Graphics and Applications, 2003

Grid-Distributed Visualizations Using Connectionless Protocols.
IEEE Computer Graphics and Applications, 2003

Interoperability of Visualization Software and Data Models is NOT an Achievable Goal.
Proceedings of the 14th IEEE Visualization Conference, 2003

Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Parallel Cell Projection Rendering of Adaptive Mesh Refinement Data.
Proceedings of the IEEE Symposium on Parallel and Large-Data Visualization and Graphics 2003, 2003

2002
Community software development with the Astrophysics Simulation Collaboratory.
Concurr. Comput. Pract. Exp., 2002

The Astrophysics Simulation Collaboratory: A Science Portal Enabling Community Software Development.
Clust. Comput., 2002

The Cactus Framework and Toolkit: Design and Applications.
Proceedings of the High Performance Computing for Computational Science, 2002


2001
The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment.
Int. J. High Perform. Comput. Appl., 2001

Cactus Tools for Grid Applications.
Clust. Comput., 2001

High-quality Volume Rendering of Adaptive Mesh Refinement Data.
Proceedings of the 6th International Fall Workshop on Vision, Modeling, and Visualization, 2001

Extraction of Crack-free Isosurfaces from Adaptive Mesh Refinement Data.
Proceedings of the 3rd Joint Eurographics - IEEE TCVG Symposium on Visualization, 2001

The Astrophysics Simulation Collaboratory Portal: A Science Portal Enabling Community Software Development.
Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 2001

2000
The Cactus Code: A Problem Solving Environment for the Grid.
Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing, 2000

1999
Diving deep: data-management and visualization strategies for adaptive mesh refinement simulations.
Comput. Sci. Eng., 1999

Solving Einstein's Equations on Supercomputers.
Computer, 1999

Numerical Relativity in a Distributed Environment.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

1996
Galaxies Collide On the I-Way: an Example of Heterogeneous Wide-Area Collaborative Supercomputing.
Int. J. High Perform. Comput. Appl., 1996

