David W. Nellans

ORCID: 0000-0001-5203-8367

According to our database, David W. Nellans authored at least 40 papers between 2004 and 2023.


Bibliography

2023
Architectural Support for Optimizing Huge Page Selection Within the OS.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

2022
GPU Domain Specialization via Composable On-Package Architecture.
ACM Trans. Archit. Code Optim., 2022

The Implications of Page Size Management on Graph Analytics.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

2021
GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management.
Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems.
CoRR, 2020

Locality-Centric Data and Threadblock Management for Massive GPUs.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training.
IEEE Micro, 2019

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Translation ranger: operating system support for contiguity-aware TLBs.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Understanding the Future of Energy Efficiency in Multi-Module GPUs.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Nimble Page Management for Tiered Memory Systems.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2017
Beyond the socket: NUMA-aware GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016
Towards high performance paged memory for GPUs.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Selective GPU caches to eliminate CPU-GPU HW cache coherence.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Designing Efficient Heterogeneous Memory Architectures.
IEEE Micro, 2015

Flexible software profiling of GPU architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Unlocking bandwidth for GPUs in CC-NUMA systems.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Page Placement Strategies for GPUs within Heterogeneous Memory Systems.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Improving Operating System and Hardware Interactions Through Co-Design.
PhD thesis, 2014

Scaling the Power Wall: A Path to Exascale.
Proceedings of the International Conference for High Performance Computing, 2014

2013
Linux block IO: introducing multi-queue SSD access on multi-core systems.
Proceedings of the 6th Annual International Systems and Storage Conference, 2013

Better flash access via shape-shifting virtual memory pages.
Proceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems, 2013

2012
Managing Data Placement in Memory Systems with Multiple Memory Controllers.
Int. J. Parallel Program., 2012

2011
Beyond block I/O: Rethinking traditional storage primitives.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Prediction Based DRAM Row-Buffer Management in the Many-Core Era.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Hardware prediction of OS run-length for fine-grained resource customization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Improving Server Performance on Multi-cores via Selective Off-Loading of OS Functionality.
Proceedings of the Computer Architecture, 2010

Micro-pages: increasing DRAM efficiency with locality-aware data placement.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

SWEL: hardware cache coherence protocols to map shared data onto shared caches.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Handling the problems and opportunities posed by multiple on-chip memory controllers.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
OS execution on multi-cores: is out-sourcing worthwhile?
ACM SIGOPS Oper. Syst. Rev., 2009

2004
ARCS: an architectural level communication driven simulator.
Proceedings of the 14th ACM Great Lakes Symposium on VLSI, 2004

