Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture.

Chengying Huan

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design.

IEEE Trans. Computers, 2021

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression.

Proc. VLDB Endow., 2021

TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs.

J. Parallel Distributed Comput., 2021

Toward efficient interactions between Python and native libraries.

Proceedings of the ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021

MAPA: multi-accelerator pattern allocation policy for multi-tenant GPU servers.

Kiran Ranganath

Joshua D. Suetterlein

Joseph B. Manzano

Shuaiwen Leon Song

Daniel Wong

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

Dr. Top-k: delegate-centric Top-k on GPUs.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

An efficient uncertain graph processing framework for heterogeneous architectures.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

A novel memory-efficient deep learning training framework via error-bounded lossy compression.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Q-VR: system-level design for future mobile collaborative virtual reality.

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.

IEEE Trans. Parallel Distributed Syst., 2020

Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity.

ACM Trans. Design Autom. Electr. Syst., 2020

An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.

CoRR, 2020

MalFox: Camouflaged Adversarial Malware Example Generation Based on C-GANs Against Black-Box Detectors.

CoRR, 2020

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs.

CoRR, 2020

Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019

Speeding up Collective Communications Through Inter-GPU Re-Routing.

IEEE Comput. Archit. Lett., 2019

BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019

OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

LoSCache: Leveraging Locality Similarity to Build Energy-Efficient GPU L2 Cache.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism.

Proceedings of the 30th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2019

2018

NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks.

Probir Roy

Shuaiwen Leon Song

Sriram Krishnamoorthy

Abhinav Vishnu

Dipanjan Sengupta

Xu Liu

ACM Trans. Archit. Code Optim., 2018

Superneurons: dynamic GPU memory management for training deep neural networks.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Introduction to HPPAC 2018.

Shuaiwen Leon Song

Natalie J. Bates

Ang Li

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Warp-Consolidation: A Novel Execution Model for GPUs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors.

Chenhao Xie

Xin Fu

Shuaiwen Song

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs.

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Lightweight detection of cache conflicts.

Probir Roy

Shuaiwen Leon Song

Sriram Krishnamoorthy

Xu Liu

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017

EvoGraph: On-the-Fly Efficient Mining of Evolving Graphs on GPU.

Dipanjan Sengupta

Shuaiwen Leon Song

Proceedings of the High Performance Computing - 32nd International Conference, 2017

Evaluating GPGPU Memory Performance Through the C-AMAT Model.

Proceedings of the Workshop on Memory Centric Programming for HPC, 2017

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.

Ang Li

Weifeng Liu

Mads Ruben Burgdorff Kristensen

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017

BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors.

Ang Li

Wenfeng Zhao

Shuaiwen Leon Song

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

HPPAC Workshop Introduction.

Shuaiwen Leon Song

Richard W. Vuduc

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

IPDRM Workshop Introduction.

Shuaiwen Leon Song

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Enabling scalability-sensitive speculative parallelization for FSM computations.

Proceedings of the International Conference on Supercomputing, 2017

Processing-in-Memory Enabled Graphics Processors for 3D Rendering.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Locality-Aware CTA Clustering for Modern GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology.

Li Tan

Zizhong Chen

Shuaiwen Leon Song

ACM Trans. Archit. Code Optim., 2016

A Graph-based Model for GPU Caching Problems.

CoRR, 2016

Orion: A Framework for GPU Occupancy Tuning.

Ari B. Hayes

Lingda Li

Daniel G. Chavarría-Miranda

Shuaiwen Leon Song

Eddy Z. Zhang

Proceedings of the 17th International Middleware Conference, Trento, Italy, December 12, 2016

IPDRM Introduction and Committees.

Shuaiwen Leon Song

Todd Gamblin

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HPPAC Introduction and Committees.

Barry Rountree

Shuaiwen Leon Song

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

X: A Comprehensive Analytic Model for Parallel Machines.

Daniel G. Chavarría-Miranda

Henk Corporaal

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

SFU-Driven Transparent Approximation Acceleration on GPUs.

Proceedings of the 2016 International Conference on Supercomputing, 2016

Tag-Split Cache for Efficient GPGPU Cache Utilization.

Proceedings of the 2016 International Conference on Supercomputing, 2016

New-Sum: A Novel Online ABFT Scheme For General Iterative Methods.

Dingwen Tao

Shuaiwen Leon Song

Sriram Krishnamoorthy

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

SMT-Aware Instantaneous Footprint Optimization.

Probir Roy

Xu Liu

Shuaiwen Leon Song

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Critical points based register-concurrency autotuning for GPUs.

Daniel G. Chavarría-Miranda

Henk Corporaal

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Combating the Reliability Challenge of GPU Register File at Low Supply Voltage.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Scaling Support Vector Machines on modern HPC platforms.

Yang You

Haohuan Fu

Shuaiwen Leon Song

Amanda Peters Randles

J. Parallel Distributed Comput., 2015

GraphReduce: processing large-scale graphs on accelerator-based systems.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015

Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Locality-Driven Dynamic GPU Cache Bypassing.

Siva Kumar Sastry Hari

Huiyang Zhou

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Gregarious Data Re-structuring in a Many Core Architecture.

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

2014

Extending PowerPack for Profiling and Analysis of High-Performance Accelerator-Based Systems.

Parallel Process. Lett., 2014

Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.

Int. J. High Perform. Comput. Appl., 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.

Amanda Peters Randles

Guangwen Yang

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

An adaptive cross-architecture combination method for graph traversal.

Yang You

Shuaiwen Leon Song

Darren J. Kerbyson

Proceedings of the 2014 International Conference on Supercomputing, 2014

ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution.

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013

Designing energy efficient communication runtime systems: a view from PGAS models.

J. Supercomput., 2013

Unified performance and power modeling of scientific workloads.

Shuaiwen Leon Song

Kevin J. Barker

Darren J. Kerbyson

Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, 2013

A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

EDR: An energy-aware runtime load distribution system for data-intensive applications in the cloud.

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012

Abstract: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems.

Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 2012

Poster: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems.

Shuaiwen Leon Song

Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 2012

Energy-Aware Replica Selection for Data-Intensive Services in Cloud.

Proceedings of the 20th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2012

System-level power-performance efficiency modeling for emergent GPU architectures.

Shuaiwen Song

Kirk W. Cameron

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

An ISO-Energy-Efficient Approach to Scalable System Power-Performance Optimization.

Shuaiwen Song

Matthew Grove

Kirk W. Cameron

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010

PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications.

IEEE Trans. Parallel Distributed Syst., 2010

Fault-tolerant communication runtime support for data-centric programming models.

Proceedings of the 2010 International Conference on High Performance Computing, 2010

Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models.

Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

2009

Energy Profiling and Analysis of the HPC Challenge Benchmarks.

Int. J. High Perform. Comput. Appl., 2009

Shuaiwen Song
