David W. Nellans

Abhishek Bhattacharjee

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Understanding the Future of Energy Efficiency in Multi-Module GPUs.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Nimble Page Management for Tiered Memory Systems.

Zi Yan

Daniel Lustig

Abhishek Bhattacharjee

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2017

Beyond the socket: NUMA-aware GPUs.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016

Towards high performance paged memory for GPUs.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Selective GPU caches to eliminate CPU-GPU HW cache coherence.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015

Designing Efficient Heterogeneous Memory Architectures.

IEEE Micro, 2015

Flexible software profiling of GPU architectures.

Mark Stephenson

Siva Kumar Sastry Hari

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Unlocking bandwidth for GPUs in CC-NUMA systems.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Page Placement Strategies for GPUs within Heterogeneous Memory Systems.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Improving Operating System and Hardware Interactions Through Co-Design.

PhD thesis, 2014

Scaling the Power Wall: A Path to Exascale.

Proceedings of the International Conference for High Performance Computing, 2014

2013

Linux block IO: introducing multi-queue SSD access on multi-core systems.

Proceedings of the 6th Annual International Systems and Storage Conference, 2013

Better flash access via shape-shifting virtual memory pages.

Anirudh Badam

Vivek S. Pai

Proceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems, 2013

2012

Managing Data Placement in Memory Systems with Multiple Memory Controllers.

Manu Awasthi

Int. J. Parallel Program., 2012

2011

Beyond block I/O: Rethinking traditional storage primitives.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Prediction Based DRAM Row-Buffer Management in the Many-Core Era.

Manu Awasthi

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Hardware prediction of OS run-length for fine-grained resource customization.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Improving Server Performance on Multi-cores via Selective Off-Loading of OS Functionality.

Proceedings of the Computer Architecture, 2010

Micro-pages: increasing DRAM efficiency with locality-aware data placement.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

SWEL: hardware cache coherence protocols to map shared data onto shared caches.

Seth H. Pugsley

Josef B. Spjut

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Handling the problems and opportunities posed by multiple on-chip memory controllers.

Manu Awasthi

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

OS execution on multi-cores: is out-sourcing worthwhile?

ACM SIGOPS Oper. Syst. Rev., 2009

2004

ARCS: an architectural level communication driven simulator.

Vamshi Krishna Kadaru