Samuel Williams

Orcid: 0000-0002-8327-5717

Affiliations:

Lawrence Berkeley National Laboratory, Berkeley, CA, USA
University of California at Berkeley, CA, USA (PhD 2008)

According to our database, Samuel Williams authored at least 118 papers between 2001 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

StFT: Spatio-temporal Fourier Transformer for Long-term Dynamics Prediction.

Trans. Mach. Learn. Res., 2026

2025

Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems.

CoRR, November, 2025

Spatio-temporal Fourier Transformer (StFT) for Long-term Dynamics Prediction.

CoRR, March, 2025

Transfer learning nonlinear plasma dynamic transitions in low dimensional embeddings via deep neural networks.

Mach. Learn. Sci. Technol., 2025

Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions.

Proceedings of the High Performance Computing, 2025

Maximizing Power-Constrained Supercomputing Throughput.

Proceedings of the ISC High Performance 2025 Research Paper Proceedings (40th International Conference), 2025

Benchmark-driven Models for Energy Analysis and Attribution of GPU-Accelerated Supercomputing.

Proceedings of the International Conference for High Performance Computing, 2025

Roofline Analysis of Tightly-Coupled CPU-GPU Superchips: A Study on MI300A and GH200.

Oscar Antepara

Leonid Oliker

Samuel Williams

Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, 2025

2024

Evaluating the potential of disaggregated memory systems for HPC applications.

Concurr. Comput. Pract. Exp., August, 2024

Bricks: A high-performance portability layer for computations on block-structured grids.

Mahesh Lakshminarasimhan

Int. J. High Perform. Comput. Appl., 2024

LPSim: Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework.

CoRR, 2024

FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks.

CoRR, 2024

Comprehensive Performance Modeling and System Design Insights for Foundation Models.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Expediting Higher Fidelity Plasma State Reconstructions for the DIII-D National Fusion Facility Using Leadership Class Computing Resources.

Sterling Paul Smith

Zichuan Anthony Xing

Torrin Bechtel Amara

Severin Sebastian Denk

Earl William DeShazer

Christopher Mitchell Clark

Nicholas Scoville Tyler

Thomas D. Uram

Samuel Webb Williams

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

System-Wide Roofline Profiling - a Case Study on NERSC's Perlmutter Supercomputer.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

A Workflow Roofline Model for End-to-End Workflow Performance Analysis.

Proceedings of the International Conference for High Performance Computing, 2024

BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs.

Mahesh Lakshminarasimhan

Mary W. Hall

Samuel Williams

Oscar Antepara

Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023

Performance-Portable GPU Acceleration of the EFIT Tokamak Plasma Equilibrium Reconstruction Code.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Performance Portability Evaluation of Blocked Stencil Computations on GPUs.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters.

Proceedings of the International Conference for High Performance Computing, 2023

Evaluating the Performance of One-sided Communication on CPUs and GPUs.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022

A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model.

Marco Domenico Santambrogio

IEEE Trans. Computers, 2022

Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power.

CoRR, 2022

FPGA-based HPC accelerators: An evaluation on performance and energy efficiency.

Concurr. Comput. Pract. Exp., 2022

Instruction Roofline: An insightful visual performance model for GPUs.

Nan Ding

Muaaz G. Awan

Samuel Williams

Concurr. Comput. Pract. Exp., 2022

A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures.

Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations.

Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2022

2021

Hierarchical Roofline Performance Analysis for Deep Learning Applications.

Proceedings of the Intelligent Computing, 2021

Improving communication by optimizing on-node data movement with data layout.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Architectural Requirements for Deep Learning Workloads in HPC Environments.

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs.

Douglas Doerfler

Farzad Fatollahi-Fard

Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich Germany, April, 2021, 2021

A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver.

Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021

2020

Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system.

Charlene Yang

Thorsten Kurth

Samuel Williams

Concurr. Comput. Pract. Exp., 2020

Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight.

Clust. Comput., 2020

Timemory: Modular Performance Analysis for HPC.

Proceedings of the High Performance Computing - 35th International Conference, 2020

Time-Based Roofline for Deep Learning Performance Analysis.

Proceedings of the Fourth IEEE/ACM Workshop on Deep Learning on Supercomputers, 2020

Leveraging One-Sided Communication for Sparse Triangular Solvers.

Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020

The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing.

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches.

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload.

Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Understanding Quantum Control Processor Capabilities and Limitations through Circuit Characterization.

Anastasiia Butko

George Michelogiannakis

Proceedings of the International Conference on Rebooting Computing, 2020

A CAD-based methodology to optimize HLS code via the Roofline model.

Marco D. Santambrogio

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

2019

AMReX: a framework for block-structured adaptive mesh refinement.

J. Open Source Softw., 2019

Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers.

Int. J. High Perform. Comput. Appl., 2019

Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs.

Proceedings of the International Conference for High Performance Computing, 2019

An Instruction Roofline Model for GPUs.

Nan Ding

Samuel Williams

Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Performance Analysis of GPU Programming Models Using the Roofline Scaling Trajectories.

Khaled Z. Ibrahim

Samuel Williams

Leonid Oliker

Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

2018

A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization.

Proceedings of the High Performance Computing - 33rd International Conference, 2018

Improving MPI Reduction Performance for Manycore Architectures with OpenMP and Data Compression.

Hongzhang Shan

Samuel Williams

Calvin W. Johnson

Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

SIMD code generation for stencils on brick decompositions.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis.

Khaled Z. Ibrahim

Samuel Williams

Leonid Oliker

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

2017

A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations.

IEEE Trans. Parallel Distributed Syst., 2017

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.

Parallel Comput., 2017

Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends.

J. Parallel Distributed Comput., 2017

Reaching bandwidth saturation using transparent injection parallelization.

Int. J. High Perform. Comput. Appl., 2017

Analyzing Performance of Selected NESAP Applications on the Cori HPC System.

Proceedings of the High Performance Computing, 2017

Performance Variability on Xeon Phi.

Proceedings of the High Performance Computing, 2017

Performance analysis and optimization of the RAMPAGE metal alloy potential generation software.

Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems, 2017

Snowflake: A Lightweight Portable Stencil DSL.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

A Locality-Based Threading Algorithm for the Configuration-Interaction Method.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Simultaneously Solving Swarms of Small Sparse Systems on SIMD Silicon.

Bryce Adelstein-Lelbach

Hans Johansen

Samuel Williams

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016

An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling.

SIAM J. Sci. Comput., 2016

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication.

SIAM J. Sci. Comput., 2016

Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor.

Proceedings of the High Performance Computing, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.

Carlos Rosales-Fernandez

Timothy J. Williams

Proceedings of the International Conference for High Performance Computing, 2016

Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication.

Proceedings of the 2016 PGAS Applications Workshop, 2016

Evaluating and Optimizing the NERSC Workload on Knights Landing.

Proceedings of the 7th International Workshop on Performance Modeling, 2016

OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms.

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

2015

Parallel processing of filtered queries in attributed semantic graphs.

J. Parallel Distributed Comput., 2015

ExaSAT: An exascale co-design tool for performance modeling.

Int. J. High Perform. Comput. Appl., 2015

An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling.

CoRR, 2015

Parallel implementation and performance optimization of the configuration-interaction method.

Proceedings of the International Conference for High Performance Computing, 2015

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture.

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Exploiting communication concurrency on high performance computing systems.

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures.

Panayot S. Vassilevski

Proceedings of the Parallel Processing and Applied Mathematics, 2015

Compiler-Directed Transformation for Higher-Order Stencils.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Parallel Performance Optimizations on Unstructured Mesh-based Simulations.

Jeffrey K. Hollingsworth

Allen D. Malony

Samuel Williams

Leonid Oliker

Proceedings of the International Conference on Computational Science, 2015

2014

Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Evaluation of PGAS Communication Paradigms with Geometric Multigrid.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Collective memory transfers for multi-core chips.

George Michelogiannakis

Alexander Williams

Samuel Williams

John Shalf

Proceedings of the 2014 International Conference on Supercomputing, 2014

Analysis and tuning of libtensor framework on multicore architectures.

Proceedings of the 21st International Conference on High Performance Computing, 2014

2013

Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms.

Int. J. High Perform. Comput. Appl., 2013

Kinetic turbulence simulations at extreme scale on leadership-class systems.

Proceedings of the International Conference for High Performance Computing, 2013

Loop Chaining: A Programming Abstraction for Balancing Locality and Parallelism.

Christopher D. Krieger

Michelle Mills Strout

Catherine Olschanowsky

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Compiler generation and autotuning of communication-avoiding operators for geometric multigrid.

Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012

Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms.

IEEE Trans. Parallel Distributed Syst., 2012

Optimization of geometric multigrid for emerging multi- and manycore processors.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

High-performance analysis of filtered semantic graphs.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms.

Parallel Comput., 2011

Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning.

Proceedings of the Conference on High Performance Computing Networking, 2011

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems.

Proceedings of the Conference on High Performance Computing Networking, 2011

Hardware/software co-design for energy-efficient seismic modeling.

Proceedings of the Conference on High Performance Computing Networking, 2011

Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

An auto-tuning framework for parallel multicore stencil computations.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.

Aparna Chandramowlishwaran

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Sparse Matrix-Vector Multiplication on Multicore and Accelerators.

Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

Auto-Tuning Stencil Computations on Multicore and Accelerators.

Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

2009

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.

SIAM Rev., 2009

Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.

J. Parallel Distributed Comput., 2009

The impact of IBM Cell technology on the programming paradigm in the context of computer systems for climate and weather models.

Concurr. Comput. Pract. Exp., 2009

Roofline: an insightful visual performance model for multicore architectures.

Samuel Williams

Andrew Waterman

David A. Patterson

Commun. ACM, 2009

A design methodology for domain-optimized power-efficient supercomputing.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture.

Proceedings of the Architecture of Computing Systems, 2009

2008

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Lattice Boltzmann simulation optimization on leading multicore platforms.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007

Scientific Computing Kernels on the Cell Processor.

Int. J. Parallel Program., 2007

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

2006

The potential of the cell processor for scientific computing.

Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.

Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

2001

Hardware/compiler codevelopment for an embedded media processor.

Christoforos E. Kozyrakis

Proc. IEEE, 2001
