Michael Laurenzano

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Architectural support for convolutional neural networks on modern CPUs.

Animesh Jain

Gilles A. Pokam

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Reining in Long Tails in Warehouse-Scale Computers with Quick Voltage Boosting Using Adrenaline.

Chang-Hong Hsu

ACM Trans. Comput. Syst., 2017

DeftNN: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission.

Scott A. Mahlke

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016

Low-overhead Online Code Transformations.

Michael Laurenzano

PhD thesis, 2016

Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant.

ACM Trans. Comput. Syst., 2016

Sirius Implications for Future Warehouse-Scale Computers.

IEEE Micro, 2016

PMaC's green queue: a framework for selecting energy optimal DVFS configurations in large scale MPI applications.

Concurr. Comput. Pract. Exp., 2016

The case for colocation of high performance computing workloads.

Concurr. Comput. Pract. Exp., 2016

Input responsiveness: using canary inputs to dynamically steer approximation.

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

CrystalBall: Statically analyzing runtime behavior via deep sequence learning.

Stephen Zekany

Daniel Rings

Nathan Harada

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Continuous shape shifting: Enabling loop co-optimization via near-free dynamic code rewriting.

Animesh Jain

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation.

Scott A. Mahlke

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Characterization and bottleneck analysis of a 64-bit ARMv8 platform.

Allyson Cauble-Chantrenne

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

PowerChop: Identifying and Managing Non-critical Units in Hybrid Processor Architectures.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Lightweight, Early Identification of At-Risk CS1 Students.

Soohyun Nam Liao

Daniel Zingaro

William G. Griswold

Leo Porter

Proceedings of the 2016 ACM Conference on International Computing Education Research, 2016

2015

PEBIL: binary instrumentation for practical data-intensive program analysis.

Allyson Cauble-Chantrenne

Clust. Comput., 2015

Performance and energy efficiency analysis of 64-bit ARM using GAMESS.

Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015

Compute bottlenecks on the new 64-bit ARM.

Adam Jundt

Joshua Peraza

Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, 2015

DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers.

Yiping Kang

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers.

Vinicius Petrucci

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting.

Chang-Hong Hsu

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance.

Muneeb Khan

Erik Hagersten

David Black-Schaffer

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Making the Most of SMT in HPC: System- and Application-Level Perspectives.

Leo Porter

ACM Trans. Archit. Code Optim., 2014

SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Modeling the Impact of Reduced Memory Bandwidth on HPC Applications.

Anthony Gamst

Martin Schulz

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation.

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013

Characterizing Large-Scale HPC Applications through Trace Extrapolation.

Michael Laurenzano

Parallel Process. Lett., 2013

Inferring Large-Scale Computation Behavior via Trace Extrapolation.

Michael Laurenzano

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Understanding the performance of stencil computations on Intel's Xeon Phi.

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012

Efficient HPC Data Motion via Scratchpad Memory.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Static Binary Instrumentation Threading Model for Fast Memory Trace Collection.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Modeling Power and Energy Usage of HPC Kernels.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Green Queue: Customized Large-Scale Clock Frequency Scaling.

Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

2011

An idiom-finding tool for increasing productivity of accelerators.
