Sally A. McKee

According to our database, Sally A. McKee
  • authored at least 116 papers between 1993 and 2018.
  • has a "Dijkstra number" of four.

Bibliography

2018
Verifying Reliability Properties Using the Hyperball Abstract Domain.
ACM Trans. Program. Lang. Syst., 2018

2017
Main Memory in HPC: Do We Need More or Could We Live with Less?
TACO, 2017

Do superconducting processors really need cryogenic memories?: the case for cold DRAM.
Proceedings of the International Symposium on Memory Systems, 2017

RAGuard: A Hardware Based Mechanism for Backward-Edge Control-Flow Integrity.
Proceedings of the Computing Frontiers Conference, 2017

2016
Co-DIMM: Inter-Socket Data Sharing via a Common DIMM Channel.
Proceedings of the Second International Symposium on Memory Systems, 2016

Twin-Load: Bridging the Gap between Conventional Direct-Attached and Buffer-on-Board Memory Systems.
Proceedings of the Second International Symposium on Memory Systems, 2016

Agave: A benchmark suite for exploring the complexities of the Android software stack.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

A Methodology for Modeling Dynamic and Static Power Consumption for Multicore Processors.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Extending On-chip Interconnects for rack-level remote resource access.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R).
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Venice: Exploring server architectures for effective resource sharing.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

P-Socket: optimizing a communication library for a PCIe-based intra-rack interconnect.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Redesigning a tagless access buffer to require minimal ISA changes.
Proceedings of the 2016 International Conference on Compilers, 2016

2015
Adapting Memory Hierarchies for Emerging Datacenter Interconnects.
J. Comput. Sci. Technol., 2015

Twin-Load: Building a Scalable Memory System over the Non-Scalable Interface.
CoRR, 2015

Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Exploiting Program Semantics to Place Data in Hybrid Memory.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

2014
Characterizing and Subsetting Big Data Workloads.
CoRR, 2014

QBLESS: A case for QoS-aware bufferless NoCs.
Proceedings of the IEEE 22nd International Symposium of Quality of Service, 2014

Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Understanding the behavior of in-memory computing workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Characterizing and subsetting big data workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

DTail: a flexible approach to DRAM refresh management.
Proceedings of the 2014 International Conference on Supercomputing, 2014

An Automated Performance-Aware Approach to Reliability Transformations.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Digging deeper into cluster system logs for failure prediction and root cause diagnosis.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
Improving data access efficiency by using a tagless access buffer (TAB).
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012
Active memory controller.
The Journal of Supercomputing, 2012

Techniques to Measure, Model, and Manage Power.
Advances in Computers, 2012

An LTE Uplink Receiver PHY benchmark and subframe-based power management.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Parallelizing more Loops with Compiler Guided Refactoring.
Proceedings of the 41st International Conference on Parallel Processing, 2012

ROSE::FTTransform - A source-to-source translation framework for exascale fault-tolerance research.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

Design Principles for Synthesizable Processor Cores.
Proceedings of the Architecture of Computing Systems - ARCS 2012 - 25th International Conference, Munich, Germany, February 28, 2012

2011
Memory Wall.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Guest Editors' Introduction.
International Journal of Parallel Programming, 2011

Power-Aware Resource Scheduling in Base Stations.
Proceedings of the MASCOTS 2011, 2011

SoftBeam: Precise tracking of transient faults and vulnerability analysis at processor design time.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Performance optimization by dynamic code transformation.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
An approach to resource-aware co-scheduling for CMPs.
Proceedings of the 24th International Conference on Supercomputing, 2010

Portable, scalable, per-core power estimation for intelligent resource management.
Proceedings of the International Green Computing Conference 2010, 2010

Comparing Scalability Prediction Strategies on an SMP of CMPs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Designing OS for HPC Applications: Scheduling.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Global management of cache hierarchies.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Data Cache Techniques to Save Power and Deliver High Performance in Embedded Systems.
Trans. HiPEAC, 2009

Real time power estimation and thread scheduling via performance counters.
SIGARCH Computer Architecture News, 2009

Machine learning based online performance prediction for runtime parallelization and task scheduling.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Compiler-enhanced incremental checkpointing for OpenMP applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Understanding PARSEC performance on contemporary CMPs.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

Prediction-based power estimation and scheduling for CMPs.
Proceedings of the 23rd international conference on Supercomputing, 2009

Cancellation of loads that return zero using zero-value caches.
Proceedings of the 23rd international conference on Supercomputing, 2009

PARSEC: hardware profiling of emerging workloads for CMP design.
Proceedings of the 23rd international conference on Supercomputing, 2009

Code density concerns for new architectures.
Proceedings of the 27th International Conference on Computer Design, 2009

Revisiting Cache Block Superloading.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Accommodating Diversity in CMPs with Heterogeneous Frequencies.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Core monitors: monitoring performance in multicore processors.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Efficient architectural design space exploration via predictive modeling.
TACO, 2008

Guest Editor's Introduction.
J. Instruction-Level Parallelism, 2008

Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education.
CoRR, 2008

Augmenting priority rule heuristics with justification and rollout to solve the resource-constrained project scheduling problem.
Computers & OR, 2008

Compiler-enhanced incremental checkpointing for OpenMP applications.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Can hardware performance counters be trusted?
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

A projection-based optimization framework for abstractions with application to the unstructured mesh domain.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Using Dynamic Binary Instrumentation to Generate Multi-platform SimPoints: Methodology and Accuracy.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Architecture Performance Prediction Using Evolutionary Artificial Neural Networks.
Proceedings of the Applications of Evolutionary Computing, 2008

Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education.
Proceedings of the Collaborative Computing: Networking, 2008

Optimizing thread throughput for multithreaded workloads on memory constrained CMPs.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Evolutionary system for prediction and optimization of hardware architecture performance.
Proceedings of the IEEE Congress on Evolutionary Computation, 2008

2007
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies.
ACM Trans. Program. Lang. Syst., 2007

Introduction to Part 3.
Trans. HiPEAC, 2007

Specializing Cache Structures for High Performance and Energy Conservation in Embedded Systems.
Trans. HiPEAC, 2007

Editorial to special issue on reliable computing.
JETC, 2007

Guest Editor's Introduction.
International Journal of Parallel Programming, 2007

Predicting parallel application performance via machine learning approaches.
Concurrency and Computation: Practice and Experience, 2007

Methods of inference and learning for performance modeling of parallel applications.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Leveraging High Performance Data Cache Techniques to Save Power in Embedded Systems.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

Identifying energy-efficient concurrency levels using machine learning.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

A Phase-Adaptive Approach to Increasing Cache Performance.
Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

2006
Dynamic program phase detection in distributed shared-memory multiprocessors.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Rethinking Processor Design: Parameter Correlations.
Proceedings of the 13th IEEE International Conference on Electronics, 2006

Efficiently exploring architectural design spaces via predictive modeling.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
Improving the computational intensity of unstructured mesh applications.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Beyond Basic Region Caching: Specializing Cache Structures for High Performance and Energy Conservation.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

An Approach to Performance Prediction for Parallel Applications.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Owl: next generation system monitoring.
Proceedings of the Second Conference on Computing Frontiers, 2005

Drowsy region-based caches: minimizing both dynamic and static power dissipation.
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
Formal hardware specification languages for protocol compliance verification.
ACM Trans. Design Autom. Electr. Syst., 2004

Reflections on the memory wall.
Proceedings of the First Conference on Computing Frontiers, 2004

SimSnap: Fast-Forwarding via Native Execution and Application-Level Checkpointing.
Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003
A Cost Model For Integrated Restructuring Optimizations.
J. Instruction-Level Parallelism, 2003

Restructuring Computations for Temporal Data Cache Locality.
International Journal of Parallel Programming, 2003

Interactive Locality Optimization on NUMA Architectures.
Proceedings of the ACM 2003 Symposium on Software Visualization, 2003

Identifying and Exploiting Spatial Regularity in Data Memory References.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

An MPEG-4 performance study for non-SIMD, general purpose architectures.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

A Framework for Portable Shared Memory Programming.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

METRIC: Tracking Down Inefficiencies in the Memory Hierarchy via Binary Rewriting.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
Computation regrouping: restructuring programs for temporal data cache locality.
Proceedings of the 16th international conference on Supercomputing, 2002

2001
The Impulse Memory Controller.
IEEE Trans. Computers, 2001

Reevaluating Online Superpage Promotion with Hardware Support.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

A Cost Framework for Evaluating Integrated Restructuring Optimizations.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000
Dynamic Access Ordering for Streamed Computations.
IEEE Trans. Computers, 2000

Algorithmic foundations for a parallel vector access memory system.
SPAA, 2000

Online superpage promotion revisited (poster).
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2000

Profiling I/O Interrupts in Modern Architectures.
Proceedings of the MASCOTS 2000, Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 29 August, 2000

Hardware-only stream prefetching and dynamic access ordering.
Proceedings of the 14th international conference on Supercomputing, 2000

Design of a Parallel Vector Access Unit for SDRAM Memory Systems.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Memory System Support for Image Processing.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
Smarter Memory: Improving Bandwidth for Streamed References.
IEEE Computer, 1998

Caches as Filters: A New Approach to Cache Analysis.
Proceedings of the MASCOTS 1998, 1998

1996
A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors.
Proceedings of IPPS '96, 1996

Design and Evaluation of Dynamic Access Ordering Hardware.
Proceedings of the 10th international conference on Supercomputing, 1996

1995
Hitting the memory wall: implications of the obvious.
SIGARCH Computer Architecture News, 1995

Access Ordering and Memory-Conscious Cache Utilization.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Bounds on Memory Bandwidth in Streamed Computations.
Proceedings of the Euro-Par '95 Parallel Processing, 1995

1994
Increasing Memory Bandwidth for Vector Computations.
Proceedings of the Programming Languages and System Architectures, 1994

Experimental Implementation of Dynamic Access Ordering.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1993
Toward a Steiner engine: enhanced serial and parallel implementations of the iterated 1-Steiner MRST algorithm.
Proceedings of the Third Great Lakes Symposium on Design Automation of High Performance VLSI Systems, 1993

