Mahmut T. Kandemir

According to our database, Mahmut T. Kandemir authored at least 641 papers between 1997 and 2018.

Awards

IEEE Fellow 2016, "For contributions to compiler support for performance and energy optimization of computer architectures".

Bibliography

2018
Stochastic Modeling and Optimization of Stragglers.
IEEE Trans. Cloud Computing, 2018

Performance and Power-Efficient Design of Dense Non-Volatile Cache in CMPs.
IEEE Trans. Computers, 2018

ReveNAND: A Fast-Drift-Aware Resilient 3D NAND Flash Design.
TACO, 2018

IAA: Incidental Approximate Architectures for Extremely Energy-Constrained Energy Harvesting Scenarios using IoT Nonvolatile Processors.
IEEE Micro, 2018

Data access skipping for recursive partitioning methods.
Computer Languages, Systems & Structures, 2018

SimpleSSD: Modeling Solid State Drives for Holistic System Simulation.
Computer Architecture Letters, 2018

Enhancing computation-to-core assignment with physical location information.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

CachedGC: Cache-Assisted Garbage Collection in Modern Solid State Drives.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Quantifying and Optimizing Data Access Parallelism on Manycores.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Tolerating Write Disturbance Errors in PCM: Experimental Characterization, Analysis, and Mechanisms.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Content Popularity-Based Selective Replication for Read Redirection in SSDs.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018

Hybrid-comp: A criticality-aware compressed last-level cache.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018

Parallelizing garbage collection with I/O to improve flash resource utilization.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High Density SSDs.
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018

Soft Error Characterization on Scientific Applications.
Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, 2018

FLOSS: FLOw sensitive scheduling on mobile platforms.
Proceedings of the 55th Annual Design Automation Conference, 2018

The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters.
Proceedings of the ACM Symposium on Cloud Computing, 2018

NEOFog: Nonvolatility-Exploiting Optimizations for Fog Computing.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
HL-PCM: MLC PCM Main Memory with Accelerated Read.
IEEE Trans. Parallel Distrib. Syst., 2017

Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures.
IEEE Trans. Computers, 2017

A selective protection scheme of applications using asymmetrically reliable caches.
Journal of Systems Architecture - Embedded Systems Design, 2017

Optimizing energy consumption in GPUs through feedback-driven CTA scheduling.
Proceedings of the 25th High Performance Computing Symposium, Virginia Beach, VA, USA, April 23, 2017

A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems.
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 05, 2017

Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory.
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 05, 2017

Compiler-Enhanced Reliability for Network-on-Chip Architectures.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Race-to-sleep + content caching + display caching: a recipe for energy-efficient video streaming on handhelds.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Data movement aware computation partitioning.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Incidental computing on IoT nonvolatile processors.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

REMAP: a reliability/endurance mechanism for advancing PCM.
Proceedings of the International Symposium on Memory Systems, 2017

DEMM: A Dynamic Energy-Saving Mechanism for Multicore Memories.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Quantifying the Potential Benefits of On-chip Near-Data Computing in Manycore Processors.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Characterizing diverse handheld apps for customized hardware acceleration.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Exploring the impact of memory block permutation on performance of a crossbar ReRAM main memory.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

TraceTracker: Hardware/software co-evaluation for large-scale I/O workload reconstruction.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Congestion-aware memory management on NUMA platforms: A VMware ESXi case study.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters.
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017

A Scale-Out Enterprise Storage Architecture.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Leveraging value locality for efficient design of a hybrid cache in multicore processors.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Controlled Kernel Launch for Dynamic Parallelism in GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Exploring the Potential for Collaborative Data Compression and Hard-Error Tolerance in PCM Memories.
Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017

Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Exploiting Intra-Request Slack to Improve SSD Performance.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

POSTER: Location-Aware Computation Mapping for Manycore Processors.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
NANDFlashSim: High-Fidelity, Microarchitecture-Aware NAND Flash Memory Simulation.
TOS, 2016

Memory Partitioning in the Limit.
International Journal of Parallel Programming, 2016

Asymmetrically reliable caches for multicore architectures under performance and energy constraints.
Cluster Computing, 2016

Exploiting Core Criticality for Enhanced GPU Performance.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016

Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems.
Proceedings of the International Conference for High Performance Computing, 2016

An in-depth study of next generation interface for emerging non-volatile memories.
Proceedings of the 5th Non-Volatile Memory Systems and Applications Symposium, 2016

Improving bank-level parallelism for irregular applications.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Storage consolidation: Not always a panacea, but can we ease the pain?
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

MLC PCM main memory with accelerated read.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Boosting Access Parallelism to PCM-Based Main Memory.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Cache-Aware Approximate Computing for Decision Tree Learning.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HCW 2016 Keynote Talk.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Trace-based affine reconstruction of codes.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Protecting Code Regions on Asymmetrically Reliable Caches.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

μC-States: Fine-grained GPU Datapath Power Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
IOPro: a parallel I/O profiling and visualization framework for high-performance storage systems.
The Journal of Supercomputing, 2015

EECache: A Comprehensive Study on the Architectural Design for Energy-Efficient Last-Level Caches in Chip Multiprocessors.
TACO, 2015

Thermal-Aware Application Scheduling on Device-Heterogeneous Embedded Architectures.
Proceedings of the 28th International Conference on VLSI Design, 2015

Memory Row Reuse Distance and its Role in Optimizing Application Performance.
Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015

Optimizing off-chip accesses in multicores.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Anatomy of GPU Memory System for Multi-Application Execution.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Performance and energy evaluation of data prefetching on Intel Xeon Phi.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

VIP: virtualizing IP chains on handheld platforms.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Performance and Energy Efficient Asymmetrically Reliable Caches for Multicore Architectures.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Evaluating the Combined Impact of Node Architecture and Cloud Workload Characteristics on Network Traffic and Performance/Cost.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Machine learning techniques for improved data prefetching.
Proceedings of the 5th International Conference on Energy Aware Computing Systems & Applications, 2015

Phase Detection with Hidden Markov Models for DVFS on Many-Core Processors.
Proceedings of the 35th IEEE International Conference on Distributed Computing Systems, 2015

Domain knowledge based energy management in handhelds.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Network footprint reduction through data access and computation placement in NoC-based manycores.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Reactive tiling.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

TaPEr: tackling power emergencies in the dark silicon era by exploiting resource scalability.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain?
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

Exploiting Staleness for Approximating Loads on CMPs.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

2014
Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy.
TACO, 2014

Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique.
Eng. Comput. (Lond.), 2014

GemDroid: a framework to evaluate mobile platforms.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

CApRI: CAche-conscious data reordering for irregular codes.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Short-Circuiting Memory Traffic in Handheld Platforms.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Managing GPU Concurrency in Heterogeneous Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Compiler Support for Optimizing Memory Bank-Level Parallelism.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity.
Proceedings of the IEEE 22nd International Symposium on Modelling, 2014

Quantifying and Optimizing the Impact of Victim Cache Line Selection in Manycore Systems.
Proceedings of the IEEE 22nd International Symposium on Modelling, 2014

EECache: exploiting design choices in energy-efficient last-level caches for chip multiprocessors.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

HIOS: A host interface I/O scheduler for Solid State Disks.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

A cache topology-aware multi-query scheduler for multicore architectures.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

QoS aware dynamic time-slice tuning.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage.
Proceedings of the IEEE 34th International Conference on Distributed Computing Systems, 2014

Sprinkler: Maximizing resource utilization in many-chip solid state disks.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

Trading cache hit rate for memory performance.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Compiler-Directed Energy Reduction Using Dynamic Voltage Scaling and Voltage Islands for Embedded Systems.
IEEE Trans. Computers, 2013

Steep-Slope Devices: From Dark to Dim Silicon.
IEEE Micro, 2013

Examining Thread Vulnerability analysis using fault-injection.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013

Revisiting widely held SSD expectations and rethinking system-level implications.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

Exploring the future of out-of-core computing with compute-local non-volatile memory.
Proceedings of the International Conference for High Performance Computing, 2013

Data layout optimization for GPGPU architectures.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Evaluating STT-RAM as an energy-efficient main memory alternative.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Orchestrated scheduling and prefetching for GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Interference Resolver in Shared Storage Systems to Provide Fairness to I/O Intensive Applications.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Disk-Cache and Parallelism Aware I/O Scheduling to Improve Storage System Performance.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Design of a large-scale storage-class RRAM system.
Proceedings of the International Conference on Supercomputing, 2013

Challenges in Getting Flash Drives Closer to CPU.
Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, 2013

Locality-aware mapping and scheduling for multicores.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

Meeting midway: Improving CMP performance with memory-side prefetching.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Neither more nor less: Optimizing thread-level parallelism for GPGPUs.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Reshaping cache misses to improve row-buffer locality in multicore systems.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Reliability-aware core partitioning in chip multiprocessors.
Journal of Systems Architecture - Embedded Systems Design, 2012

Thread vulnerability in parallel applications.
J. Parallel Distrib. Comput., 2012

Automatic Parallel Code Generation for NUFFT Data Translation on multicores.
Journal of Circuits, Systems, and Computers, 2012

REEact: a customizable virtual execution manager for multicore platforms.
Proceedings of the 8th International Conference on Virtual Execution Environments, 2012

IOPin: Runtime Profiling of Parallel I/O in HPC Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

An Evolutionary Path to Object Storage Access.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Compiler-directed file layout optimization for hierarchical storage systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

A compiler framework for extracting superword level parallelism.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

Locality-Aware Dynamic Mapping for Multithreaded Applications.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

NANDFlashSim: Intrinsic latency variation aware NAND flash memory system modeling and simulation at microarchitecture level.
Proceedings of the IEEE 28th Symposium on Mass Storage Systems and Technologies, 2012

Taking Garbage Collection Overheads Off the Critical Path in SSDs.
Proceedings of the Middleware 2012, 2012

Addressing End-to-End Memory Access Latency in NoC-Based Multicores.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Design space exploration of workload-specific last-level caches.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Physically Addressed Queueing (PAQ): Improving parallelism in Solid State Disks.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Software-Directed Data Access Scheduling for Reducing Disk Energy Consumption.
Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012

Improving last level cache locality by integrating loop and data transformations.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

An Evaluation of Different Page Allocation Strategies on High-Speed SSDs.
Proceedings of the 4th USENIX Workshop on Hot Topics in Storage and File Systems, 2012

Performance-reliability tradeoff analysis for multithreaded applications.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

A hybrid NoC design for cache coherence optimization for chip multiprocessors.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Courteous cache sharing: being nice to others in capacity management.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

Panacea: towards holistic optimization of MapReduce applications.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Reuse distance based performance modeling and workload mapping.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Improving the performance of k-means clustering through computation skipping and data locality optimizations.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

On Urgency of I/O Operations.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

PEPON: performance-aware hierarchical power budgeting for NoC based multicores.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-aware prefetch prioritization in on-chip networks.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Off-chip access localization for NoC-based multicores.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

MROrchestrator: A Fine-Grained Resource Orchestration Framework for MapReduce Clusters.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

2011
BrickX: building hybrid systems for recursive computations.
SIGMETRICS Performance Evaluation Review, 2011

Particle simulation on the Cell BE architecture.
Cluster Computing, 2011

Studying inter-core data reuse in multicores.
Proceedings of the SIGMETRICS 2011, 2011

METE: meeting end-to-end QoS in multicores through system-wide resource management.
Proceedings of the SIGMETRICS 2011, 2011

Virtual I/O caching: dynamic storage cache management for concurrent workloads.
Proceedings of the Conference on High Performance Computing Networking, 2011

QoS aware storage cache management in multi-server environments.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Automatic Feedback Control of Shared Hybrid Caches in 3D Chip Multiprocessors.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

Quantifying Thread Vulnerability for Multicore Architectures.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

A data layout optimization framework for NUCA-based multicores.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Reducing memory interference in multicore systems via application-aware memory channel partitioning.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Exploring performance-power tradeoffs in providing reliability for NoC-based MPSoCs.
Proceedings of the 12th International Symposium on Quality Electronic Design, 2011

Minimizing interference through application mapping in multi-level buffer caches.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Improving energy efficiency of multi-threaded applications using heterogeneous CMOS-TFET multicores.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines.
Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011

Feedback control based cache reliability enhancement for emerging multicores.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Exploring heterogeneous NoC design space.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Cooperative parallelization.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Improving shared cache behavior of multithreaded object-oriented applications in multicores.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Optimizing data locality using array tiling.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Software-directed data access scheduling for reducing disk energy consumption.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Bandwidth Constrained Coordinated HW/SW Prefetching for Multicores.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Multilayer Cache Partitioning for Multiprogram Workloads.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Process variation-aware routing in NoC based multicores.
Proceedings of the 48th Design Automation Conference, 2011

A helper thread based dynamic cache partitioning scheme for multithreaded applications.
Proceedings of the 48th Design Automation Conference, 2011

On-chip cache hierarchy-aware tile scheduling for multicore machines.
Proceedings of the CGO 2011, 2011

Neighborhood-aware data locality optimization for NoC-based multicores.
Proceedings of the CGO 2011, 2011

Adaptive QoS Decomposition and Control for Storage Cache Management in Multi-server Environments.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

APP: Minimizing Interference Using Aggressive Pipelined Prefetching in Multi-level Buffer Caches.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Optimizing Data Layouts for Parallel Computation on Multicores.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Compiler Directed Data Locality Optimization for Multicore Architectures.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Total Power Optimization for Combinational Logic Using Genetic Algorithms.
Signal Processing Systems, 2010

On-chip memory space partitioning for chip multiprocessors using polyhedral algebra.
IET Computers & Digital Techniques, 2010

Exploiting large on-chip memory space through data recomputation.
Proceedings of the Annual IEEE International SoC Conference, SoCC 2010, 2010

Coordinated power management of voltage islands in CMPs.
Proceedings of the SIGMETRICS 2010, 2010

CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors.
Proceedings of the Conference on High Performance Computing Networking, 2010

Automated Tracing of I/O Stack.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Intra-application shared cache partitioning for multithreaded applications.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Cache topology aware computation mapping for multicores.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Compiler directed network-on-chip reliability enhancement for chip multiprocessors.
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, 2010

Intra-application cache partitioning.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

T-NUCA - a novel approach to non-uniform access latency cache architectures for 3D CMPs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Analyzing the soft error resilience of linear solvers on multicore multiprocessors.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Dynamic core partitioning for energy efficiency.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Adaptive multi-level cache allocation in distributed storage architectures.
Proceedings of the 24th International Conference on Supercomputing, 2010

Cashing in on hints for better prefetching and caching in PVFS and MPI-IO.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Computation mapping for multi-level storage cache hierarchies.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Scalable Parallelization Strategies to Accelerate NuFFT Data Translation on Multicores.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Code Scheduling for Optimizing Parallelism and Data Locality.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

A special-purpose compiler for look-up table and code generation for function evaluation.
Proceedings of the Design, Automation and Test in Europe, 2010

Feedback control for providing QoS in NoC based multicores.
Proceedings of the Design, Automation and Test in Europe, 2010

2009
Reducing memory requirements of resource-constrained applications.
ACM Trans. Embedded Comput. Syst., 2009

Compiler-assisted soft error detection under performance and energy constraints in embedded systems.
ACM Trans. Embedded Comput. Syst., 2009

Using Data Compression for Increasing Memory System Utilization.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2009

Process-Variation-Aware Adaptive Cache Architecture and Management.
IEEE Trans. Computers, 2009

An Automated Framework for Accelerating Numerical Algorithms on Reconfigurable Platforms Using Algorithmic/Architectural Optimization.
IEEE Trans. Computers, 2009

Adapting application execution in CMPs using helper threads.
J. Parallel Distrib. Comput., 2009

Shared scratch pad memory space management across applications.
IJES, 2009

Clone Detection in Sensor Networks with Ad Hoc and Grid Topologies.
IJDSN, 2009

A case for integrated processor-cache partitioning in chip multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Dynamic storage cache allocation in multi-server architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A hardware-software codesign strategy for Loop intensive applications.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

A compiler-directed data prefetching scheme for chip multiprocessors.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

SHARP control: controlled shared cache management in chip multiprocessors.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Optimizing shared cache behavior of chip multiprocessors.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

In-Network Caching for Chip Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Communication Based Proactive Link Power Management.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Adapting Application Mapping to Systematic Within-Die Process Variations on Chip Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Hybrid Techniques for Fast Multicore Simulation.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Exploring parallelization strategies for NUFFT data translation.
Proceedings of the 9th ACM & IEEE International conference on Embedded software, 2009

Using dynamic compilation for continuing execution under reduced memory availability.
Proceedings of the Design, Automation and Test in Europe, 2009

Adaptive prefetching for shared cache based chip multiprocessors.
Proceedings of the Design, Automation and Test in Europe, 2009

Process variation aware thread mapping for Chip Multiprocessors.
Proceedings of the Design, Automation and Test in Europe, 2009

Dynamic thread and data mapping for NoC based CMPs.
Proceedings of the 46th Design Automation Conference, 2009

Improving I/O performance using soft-QoS-based dynamic storage cache partitioning.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

MPISec I/O: Providing Data Confidentiality in MPI-I/O.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Markov Model Based Disk Power Management for Data Intensive Workloads.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Slicing based code parallelization for minimizing inter-processor communication.
Proceedings of the 2009 International Conference on Compilers, 2009

Topology-Aware I/O Caching for Shared Storage Systems.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

Power Aware Disk Allocation.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

Dynamic Storage Cache Partitioning Using Feedback Control Theory.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

2008
Designing a 3-D FPGA: Switch Box Architecture and Thermal Issues.
IEEE Trans. VLSI Syst., 2008

Compiler-Directed Code Restructuring for Improving Performance of MPSoCs.
IEEE Trans. Parallel Distrib. Syst., 2008

Access pattern-based code compression for memory-constrained systems.
ACM Trans. Design Autom. Electr. Syst., 2008

ILP-Based energy minimization techniques for banked memories.
ACM Trans. Design Autom. Electr. Syst., 2008

Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO.
Operating Systems Review, 2008

Capturing and optimizing the interactions between prefetching and cache line turnoff.
Microprocessors and Microsystems - Embedded Hardware Design, 2008

Graphical Mission Specification and Partitioning for Unmanned Underwater Vehicles.
JSW, 2008

Preface.
Computer Languages, Systems & Structures, 2008

Implementation and evaluation of a migration-based NUCA design for chip multiprocessors.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

Software-directed combined CPU/link voltage scaling for NoC-based CMPs.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

Prefetch throttling and data pinning for improving performance of shared caches.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

A novel migration-based NUCA design for chip multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Enhancing the performance of MPI-IO applications by overlapping I/O, computation and communication.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A Scratch-Pad Memory Aware Dynamic Loop Scheduling Algorithm.
Proceedings of the 9th International Symposium on Quality of Electronic Design (ISQED 2008), 2008

Evaluating the role of scratchpad memories in chip multiprocessors for sparse matrix computations.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Improving I/O performance through compiler-directed code restructuring and adaptive prefetching.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Managing power, performance and reliability trade-offs.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Towards energy efficient scaling of scientific codes.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A helper thread based EDP reduction scheme for adapting application execution in CMPs.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Ring data location prediction scheme for Non-Uniform Cache Architectures.
Proceedings of the 26th International Conference on Computer Design, 2008

SPM management using Markov chain based data access prediction.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Integrated code and data placement in two-dimensional mesh based chip multiprocessors.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Improving I/O Performance of Applications through Compiler-Directed Code Restructuring.
Proceedings of the 6th USENIX Conference on File and Storage Technologies, 2008

Application mapping for chip multiprocessors.
Proceedings of the 45th Design Automation Conference, 2008

Adaptive set pinning: managing shared caches in chip multiprocessors.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions.
Proceedings of the Reliable Software Technologies, 2008

Profiler and compiler assisted adaptive I/O prefetching for shared storage caches.
Proceedings of the 17th International Conference on Parallel Architecture and Compilation Techniques, 2008

2007
On the Detection of Clones in Sensor Networks Using Random Key Predistribution.
IEEE Trans. Systems, Man, and Cybernetics, Part C, 2007

Compiler-Directed Energy Optimization for Parallel Disk Based Systems.
IEEE Trans. Parallel Distrib. Syst., 2007

Reducing energy consumption of parallel sparse matrix applications through integrated link/CPU voltage scaling.
The Journal of Supercomputing, 2007

A Prefetching Algorithm for Multi-speed Disks.
Trans. HiPEAC, 2007

An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors.
Trans. HiPEAC, 2007

Solving the Register Allocation Problem for Embedded Systems Using a Hybrid Evolutionary Algorithm.
IEEE Trans. Evolutionary Computation, 2007

Reducing Data TLB Power via Compiler-Directed Address Generation.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2007

Design of power-aware FPGA fabrics.
IJES, 2007

Optimising power efficiency in trace cache fetch unit.
IET Computers & Digital Techniques, 2007

Compiler-Directed Code Restructuring for Operating with Compressed Arrays.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Locality-Aware Distributed Loop Scheduling for Chip Multiprocessors.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

A Process Scheduler-Based Approach to NoC Power Management.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Enhancing Locality in Two-Dimensional Space through Integrated Computation and Data Mappings.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Securing Disk-Resident Data through Application Level Encryption.
Proceedings of the Fourth International IEEE Security in Storage Workshop, 2007

Efficient Function Evaluations with Lookup Tables for Structured Matrix Operations.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2007

Modeling and improving data cache reliability.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Profile-driven energy reduction in network-on-chips.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Compiler-directed application mapping for NoC based chip multiprocessors.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

An ILP based approach to reducing energy consumption in NoC-based CMPs.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Phase-aware adaptive hardware selection for power-efficient scientific computations.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Improving disk reuse for reducing power consumption.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Improving MPI Independent Write Performance Using A Two-Stage Write-Behind Buffering Method.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Data locality enhancement for CMPs.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

TANOR: A Tool for Accelerating N-Body Simulations on Reconfigurable Platforms.
Proceedings of the FPL 2007, 2007

Performance aware secure code partitioning.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Memory bank aware dynamic loop scheduling.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

A Memory-Conscious Code Parallelization Scheme.
Proceedings of the 44th Design Automation Conference, 2007

Reducing Off-Chip Memory Access Costs Using Data Recomputation in Embedded Chip Multi-processors.
Proceedings of the 44th Design Automation Conference, 2007

Runtime system support for software-guided disk power management.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Integrated Data Reorganization and Disk Mapping for Reducing Disk Energy Consumption.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms.
Proceedings of the 2007 International Conference on Compilers, 2007

Automated Mission Parallelization for Unmanned Underwater Vehicles.
Proceedings of the Regarding the Intelligence in Distributed Intelligent Systems, 2007

Energy-Optimal Data Collection and Communication Using a Group of UUVs.
Proceedings of the Regarding the Intelligence in Distributed Intelligent Systems, 2007

Reducing Energy Consumption of On-Chip Networks Through a Hybrid Compiler-Runtime Approach.
Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

Ring Prediction for Non-Uniform Cache Architectures.
Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

2006
Estimating and reducing the memory requirements of signal processing codes for embedded systems.
IEEE Trans. Signal Processing, 2006

Multicollective I/O: A technique for exploiting inter-file access patterns.
TOS, 2006

Improving the energy behavior of block buffering using compiler optimizations.
ACM Trans. Design Autom. Electr. Syst., 2006

Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality.
ACM Trans. Design Autom. Electr. Syst., 2006

Reducing dynamic and leakage energy in VLIW architectures.
ACM Trans. Embedded Comput. Syst., 2006

Reducing code size through address register assignment.
ACM Trans. Embedded Comput. Syst., 2006

Reducing memory energy consumption of embedded applications that process dynamically allocated data.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2006

The Sleep Deprivation Attack in Sensor Networks: Analysis and Methods of Defense.
IJDSN, 2006

Geometric Tiling for Reducing Power Consumption in Structured Matrix Operations.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

Energy-Aware Code Replication for Improving Reliability in Embedded Chip Multiprocessors.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

Compiler Support for Voltage Islands.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

Compiler-directed channel allocation for saving power in on-chip networks.
Proceedings of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2006

Reducing NoC energy consumption through compiler-directed channel voltage scaling.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

Compiler-directed thermal management for VLIW functional units.
Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, 2006

An Integer Linear Programming Based Approach to Simultaneous Memory Space Partitioning and Data Allocation for Chip Multiprocessors.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Exploiting Software Pipelining for Network-on-Chip architectures.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Reducing Memory Requirements through Task Recomputation in Embedded Multi-CPU Systems.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Leakage-Aware SPM Management.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Compiler-Directed Management of Leakage Power in Software-Managed Memories.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Shared Scratch-Pad Memory Space Management.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Data Replication in Banked DRAMs for Reducing Energy Consumption.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Minimizing energy consumption of banked memories using data recomputation.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

Reducing power through compiler-directed barrier synchronization elimination.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

An ILP Formulation for Task Scheduling on Heterogeneous Chip Multiprocessors.
Proceedings of the Computer and Information Sciences, 2006

Design and Management of 3D Chip Multiprocessors Using Network-in-Memory.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Integrated link/CPU voltage scaling for reducing energy consumption of parallel sparse matrix applications.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Enhancing L2 organization for CMPs with a center cell.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

SPM Conscious Loop Scheduling for Embedded Chip Multiprocessors.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Multi-Level On-Chip Memory Hierarchy Design for Embedded Chip Multiprocessors.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Cache miss clustering for banked memory systems.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

An ILP based approach to address code generation for digital signal processors.
Proceedings of the 16th ACM Great Lakes Symposium on VLSI 2006, Philadelphia, PA, USA, April 30, 2006

Selective code/data migration for reducing communication energy in embedded MpSoC architectures.
Proceedings of the 16th ACM Great Lakes Symposium on VLSI 2006, Philadelphia, PA, USA, April 30, 2006

Switch Box Architectures for Three-Dimensional FPGAs.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

Memory-Conscious Reliable Execution on Embedded Chip Multiprocessors.
Proceedings of the 2006 International Conference on Dependable Systems and Networks (DSN 2006), 2006

Dynamic partitioning of processing and memory resources in embedded MPSoC architectures.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Activity clustering for leakage management in SPMs.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Dynamic scratch-pad memory management for irregular array access patterns.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Optimizing code parallelization through a constraint network based approach.
Proceedings of the 43rd Design Automation Conference, 2006

A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Energy-aware data prefetching for multi-speed disks.
Proceedings of the Third Conference on Computing Frontiers, 2006

Multi-compilation: capturing interactions among concurrently-executing applications.
Proceedings of the Third Conference on Computing Frontiers, 2006

Using Task Recomputation During Application Mapping in Parallel Embedded Architectures.
Proceedings of the 2006 International Conference on Computer Design & Conference on Computing in Nanotechnology, 2006

Reducing dynamic compilation overhead by overlapping compilation and execution.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Energy savings through embedded processing on disk system.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Optimal topology exploration for application-specific 3D architectures.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Compiler-Guided data compression for reducing memory consumption of embedded applications.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Maximizing data reuse for minimizing memory space requirements and execution cycles.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Prefetching-aware cache line turnoff for saving leakage energy.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Object duplication for improving reliability.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Energy-aware computation duplication for improving reliability in embedded chip multiprocessors.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Secure Execution of Computations in Untrusted Hosts.
Proceedings of the Reliable Software Technologies, 2006

2005
Compiler-guided leakage optimization for banked scratch-pad memories.
IEEE Trans. VLSI Syst., 2005

Soft errors issues in low-power caches.
IEEE Trans. VLSI Syst., 2005

Optimizing Array-Intensive Applications for On-Chip Multiprocessors.
IEEE Trans. Parallel Distrib. Syst., 2005

Optimizing instruction TLB energy using software and hardware techniques.
ACM Trans. Design Autom. Electr. Syst., 2005

Reducing data cache leakage energy using a compiler-based approach.
ACM Trans. Embedded Comput. Syst., 2005

Compiler-directed high-level energy estimation and optimization.
ACM Trans. Embedded Comput. Syst., 2005

Data space-oriented tiling for enhancing locality.
ACM Trans. Embedded Comput. Syst., 2005

Analyzing data reuse for cache reconfiguration.
ACM Trans. Embedded Comput. Syst., 2005

A Holistic Approach to Designing Energy-Efficient Cluster Interconnects.
IEEE Trans. Computers, 2005

Improving whole-program locality using intra-procedural and inter-procedural transformations.
J. Parallel Distrib. Comput., 2005

An integer linear programming-based tool for wireless sensor networks.
J. Parallel Distrib. Comput., 2005

Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation.
J. Parallel Distrib. Comput., 2005

Symmetric encryption in reconfigurable and custom hardware.
IJES, 2005

Exploiting frequent field values in Java objects for reducing heap memory requirements.
Proceedings of the 1st International Conference on Virtual Execution Environments, 2005

Constraint-based Code mapping for heterogeneous Chip multiprocessors.
Proceedings of the 2005 IEEE International SOC Conference, 2005

On-Chip Memory Management for Embedded MpSoC Architectures Based on Data Compression.
Proceedings of the 2005 IEEE International SOC Conference, 2005

Workload Clustering for Increasing Energy Savings on Embedded MPSoCs.
Proceedings of the 2005 IEEE International SOC Conference, 2005

Memory Space Conscious Loop Iteration Duplication for Reliable Execution.
Proceedings of the Static Analysis, 12th International Symposium, 2005

An Adaptive Locality-Conscious Process Scheduler for Embedded Systems.
Proceedings of the 11th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2005), 2005

Exposing disk layout to compiler for reducing energy consumption of parallel disk based systems.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Fault Recovery Designs for Processor-Embedded Distributed Storage Architectures with I/O-Intensive DB Workloads.
Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2005), 2005

Compiling for memory emergency.
Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

Dynamic Compilation for Reducing Energy Consumption of I/O-Intensive Applications.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

A Data-Driven Approach for Embedded Security.
Proceedings of the 2005 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2005), 2005

Increasing Data TLB Resilience to Transient Errors.
Proceedings of the 2005 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2005), 2005

Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs.
Proceedings of the 2005 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2005), 2005

An ILP Formulation for Reliability-Oriented High-Level Synthesis.
Proceedings of the 6th International Symposium on Quality of Electronic Design (ISQED 2005), 2005

Reliability-Centric Hardware/Software Co-Design.
Proceedings of the 6th International Symposium on Quality of Electronic Design (ISQED 2005), 2005

Pro-active Page Replacement for Scientific Applications: A Characterization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Power-aware code scheduling for clusters of active disks.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

An evaluation of code and data optimizations in the context of disk power reduction.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Dataflow analysis for energy-efficient scratch-pad memory management.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Software-Directed Disk Power Management for Scientific Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Exploiting Barriers to Optimize Power Consumption of CMPs.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Power and Performance in I/O for Scientific Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Reducing Power with Performance Constraints for Parallel Sparse Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Reliability-Conscious Process Scheduling under Performance Constraints in FPGA-Based Embedded Systems.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Disk layout optimization for reducing energy consumption.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Temperature-Sensitive Loop Parallelization for Chip Multiprocessors.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Improving scratch-pad memory reliability through compiler-guided data block duplication.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Compiler-directed voltage scaling on communication links for reducing power consumption.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

2D data locality: definition, abstraction, and application.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Integrating loop and data optimizations for locality within a constraint network based framework.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Runtime integrity checking for inter-object connections.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Code restructuring for improving cache performance of MPSoCs.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Using data compression in an MPSoC architecture for improving performance.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Energy management in software-controlled multi-level memory hierarchies.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Integer linear programming based energy optimization for banked DRAMs.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Load elimination for low-power embedded processors.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Exploiting last idle periods of links for network power management.
Proceedings of the EMSOFT 2005, 2005

Optimizing inter-processor data locality on embedded chip multiprocessors.
Proceedings of the EMSOFT 2005, 2005

A Data-Centric Approach to Checksum Reuse for Array-Intensive Applications.
Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June, 2005

Reliability-Centric High-Level Synthesis.
Proceedings of the 2005 Design, 2005

Access Pattern-Based Code Compression for Memory-Constrained Embedded Systems.
Proceedings of the 2005 Design, 2005

BB-GC: Basic-Block Level Garbage Collection.
Proceedings of the 2005 Design, 2005

Nonuniform Banking for Reducing Memory Energy Consumption.
Proceedings of the 2005 Design, 2005

Increasing Register File Immunity to Transient Errors.
Proceedings of the 2005 Design, 2005

Studying Storage-Recomputation Tradeoffs in Memory-Constrained Embedded Processing.
Proceedings of the 2005 Design, 2005

Locality-Aware Process Scheduling for Embedded MPSoCs.
Proceedings of the 2005 Design, 2005

Thermal-Aware Task Allocation and Scheduling for Embedded Systems.
Proceedings of the 2005 Design, 2005

Compiler-Directed Instruction Duplication for Soft Error Detection.
Proceedings of the 2005 Design, 2005

A Constraint Network Based Approach to Memory Layout Optimization.
Proceedings of the 2005 Design, 2005

Locality-conscious workload assignment for array-based computations in MPSOC architectures.
Proceedings of the 42nd Design Automation Conference, 2005

Improving Java virtual machine reliability for memory-constrained embedded systems.
Proceedings of the 42nd Design Automation Conference, 2005

Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Optimizing Address Code Generation for Array-Intensive DSP Applications.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

A Compiler-Based Approach to Data Security.
Proceedings of the Compiler Construction, 14th International Conference, 2005

Compiler-directed proactive power management for networks.
Proceedings of the 2005 International Conference on Compilers, 2005

Verifiable annotations for embedded Java environments.
Proceedings of the 2005 International Conference on Compilers, 2005

Customized on-chip memories for embedded chip multiprocessors.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Using loop invariants to fight soft errors in data caches.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Using data replication to reduce communication energy on chip multiprocessors.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Studying interactions between prefetching and cache line turnoff.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

FD-HGAC: a hybrid heuristic/genetic algorithm hardware/software co-synthesis framework with fault detection.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Increasing FPGA resilience against soft errors using task duplication.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Compiler-directed selective data protection against soft errors.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Optimizing embedded applications using programmer-inserted hints.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Instruction Scheduling for Low Power.
VLSI Signal Processing, 2004

Compiler-directed scratch pad memory optimization for embedded multiprocessors.
IEEE Trans. VLSI Syst., 2004

Quasidynamic Layout Optimizations for Improving Data Locality.
IEEE Trans. Parallel Distrib. Syst., 2004

Access Pattern Restructuring for Memory Energy.
IEEE Trans. Parallel Distrib. Syst., 2004

Studying Energy Trade Offs in Offloading Computation/Compilation in Java-Enabled Mobile Devices.
IEEE Trans. Parallel Distrib. Syst., 2004

A compiler-based approach for dynamically managing scratch-pad memories in embedded systems.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2004

Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications.
IEEE Trans. Computers, 2004

Reducing instruction cache energy consumption using a compiler-based strategy.
TACO, 2004

Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation.
J. Parallel Distrib. Comput., 2004

Optimizing Leakage Energy Consumption in Cache Bitlines.
Design Autom. for Emb. Sys., 2004

On the Performance of the POSIX I/O Interface to PVFS.
Proceedings of the 12th Euromicro Workshop on Parallel, 2004

Optimizing Bus Energy Consumption of On-Chip Multiprocessors Using Frequent Values.
Proceedings of the 12th Euromicro Workshop on Parallel, 2004

Code protection for resource-constrained embedded devices.
Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, 2004

An ILP-Based Approach to Locality Optimization.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Field level analysis for heap space optimization in embedded Java environments.
Proceedings of the 4th International Symposium on Memory Management, 2004

Fault Tolerant Algorithms for Network-On-Chip Interconnect.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Compiler-directed physical address generation for reducing dTLB power.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Soft error and energy consumption interactions: a data cache perspective.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

A Parallel Architecture for Secure FPGA Symmetric Encryption.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Improving Performance of Java Applications Using a Coprocessor.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Improving Memory Performance of Embedded Java Applications by Dynamic Layout Modifications.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Improving Java Performance Using Dynamic Method Migration on FPGAs.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Exploiting Memory Bank Locality in Multiprocessor SoC Architectures.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Window-Based Approach to Retrieving Memory-Resident Data for Query Execution.
Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS 2004), 2004

Improving soft-error tolerance of FPGA configuration bits.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Banked scratch-pad memory management for reducing leakage energy consumption.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Organizing the Last Line of Defense before Hitting the Memory Wall for CMP.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Tuning data replication for improving behavior of MPSoC applications.
Proceedings of the 14th ACM Great Lakes Symposium on VLSI 2004, 2004

A Dual-VDD Low Power FPGA Architecture.
Proceedings of the Field Programmable Logic and Application, 2004

Reducing leakage energy in FPGAs using region-constrained placement.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

A Hybrid Evolutionary Algorithm for Solving the Register Allocation Problem.
Proceedings of the Evolutionary Computation in Combinatorial Optimization, 2004

Exploring the Possibility of Operating in the Compressed Domain.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Using Data Compression to Increase Energy Savings in Multi-bank Memories.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Compiler-Guided Code Restructuring for Improving Instruction TLB Energy Behavior.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Data Windows: A Data-Centric Approach for Query Execution in Memory-Resident Databases.
Proceedings of the 2004 Design, 2004

A Crosstalk Aware Interconnect with Variable Cycle Transmission.
Proceedings of the 2004 Design, 2004

Impact of Data Transformations on Memory Bank Locality.
Proceedings of the 2004 Design, 2004

Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors.
Proceedings of the 2004 Design, 2004

Tuning In-Sensor Data Filtering to Reduce Energy Consumption in Wireless Sensor Networks.
Proceedings of the 2004 Design, 2004

Scheduling Reusable Instructions for Power Reduction.
Proceedings of the 2004 Design, 2004

Configuration-Sensitive Process Scheduling for FPGA-Based Computing Platforms.
Proceedings of the 2004 Design, 2004

Data compression for improving SPM behavior.
Proceedings of the 41st Design Automation Conference, 2004

LODS: locality-oriented dynamic scheduling for on-chip multiprocessors.
Proceedings of the 41st Design Automation Conference, 2004

Compiler-directed code restructuring for reducing data TLB energy.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Analyzing heap error behavior in embedded JVM environments.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Energy management schemes for memory-resident database systems.
Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, 2004

Reducing energy consumption of queries in memory-resident database systems.
Proceedings of the 2004 International Conference on Compilers, 2004

Dynamic on-chip memory management for chip multiprocessors.
Proceedings of the 2004 International Conference on Compilers, 2004

Reliability-Aware Co-Synthesis for Embedded Systems.
Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004

2003
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework.
IEEE Trans. Parallel Distrib. Syst., 2003

Partitioned instruction cache architecture for energy efficiency.
ACM Trans. Embedded Comput. Syst., 2003

Evaluating Integrated Hardware-Software Optimizations Using a Unified Energy Estimation Framework.
IEEE Trans. Computers, 2003

Memory system optimization of embedded software.
Proceedings of the IEEE, 2003

Managing Leakage Energy in Cache Hierarchies.
J. Instruction-Level Parallelism, 2003

Leakage Current: Moore's Law Meets Static Power.
IEEE Computer, 2003

Reducing Disk Power Consumption in Servers with DRPM.
IEEE Computer, 2003

Loop Transformations for Reducing Data Space Requirements of Resource-Constrained Applications.
Proceedings of the Static Analysis, 10th International Symposium, 2003

Heap compression for memory-constrained Java environments.
Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 2003

Adapting instruction level parallelism for optimizing leakage in VLIW architectures.
Proceedings of the 2003 Conference on Languages, 2003

Compiler-Based Code Partitioning for Intelligent Embedded Disk Processing.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Using Dynamic Branch Behavior for Power-Efficient Instruction Fetch.
Proceedings of the 2003 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2003), 2003

Interplay of energy and performance for disk arrays running transaction processing workloads.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Energy optimization techniques in cluster interconnects.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Estimating influence of data layout optimizations on SDRAM energy consumption.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Exploiting program hotspots and code sequentiality for instruction cache leakage management.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

DRPM: Dynamic Speed Control for Power Management in Server Class Disks.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Energy and Performance Considerations in Work Partitioning for Mobile Spatial Queries.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Energy-Aware Compilation and Execution in Java-Enabled Mobile Devices.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A compiler approach for reducing data cache energy.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Reducing dTLB Energy Through Dynamic Resizing.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Adaptive Error Protection for Energy Efficiency.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Array Composition and Decomposition for Optimizing Embedded Applications.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

An Energy-Oriented Evaluation of Communication Optimizations for Microsensor Networks.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Energy-Conscious Memory Allocation and Deallocation for Pointer-Intensive Applications.
Proceedings of the Embedded Software, Third International Conference, 2003

ICR: In-Cache Replication for Enhancing Data Cache Reliability.
Proceedings of the 2003 International Conference on Dependable Systems and Networks (DSN 2003), 2003

CCC: Crossbar Connected Caches for Reducing Energy Consumption of On-Chip Multiprocessors.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Compiler-Directed Management of Instruction Accesses.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Compiler Support for Reducing Leakage Energy Consumption.
Proceedings of the 2003 Design, 2003

Masking the Energy Behavior of DES Encryption.
Proceedings of the 2003 Design, 2003

An Integrated Approach for Improving Cache Behavior.
Proceedings of the 2003 Design, 2003

Generalized Data Transformations for Enhancing Cache Behavior.
Proceedings of the 2003 Design, 2003

Runtime Code Parallelization for On-Chip Multiprocessors.
Proceedings of the 2003 Design, 2003

Implementation and Evaluation of an On-Demand Parameter-Passing Strategy for Reducing Energy.
Proceedings of the 2003 Design, 2003

Data Space Oriented Scheduling in Embedded Systems.
Proceedings of the 2003 Design, 2003

Interprocedural optimizations for improving data cache performance of array-intensive embedded applications.
Proceedings of the 40th Design Automation Conference, 2003

VL-CDRAM: variable line sized cached DRAMs.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Tracking object life cycle for leakage energy optimization.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Discretionary Caching for I/O on Clusters.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

Address Register Assignment for Reducing Code Size.
Proceedings of the Compiler Construction, 12th International Conference, 2003

Performance, energy, and reliability tradeoffs in replicating hot cache lines.
Proceedings of the International Conference on Compilers, 2003

Exploiting bank locality in multi-bank memories.
Proceedings of the International Conference on Compilers, 2003

2002
Energy-performance trade-offs for spatial access methods on memory-resident data.
VLDB J., 2002

An Experimental Evaluation of I/O Optimizations on Different Applications.
IEEE Trans. Parallel Distrib. Syst., 2002

An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets.
The Journal of Supercomputing, 2002

Tuning garbage collection for reducing memory system energy in an embedded Java environment.
ACM Trans. Embedded Comput. Syst., 2002

Using Memory Compression for Energy Reduction in an Embedded Java System.
Journal of Circuits, Systems, and Computers, 2002

Address Code and Arithmetic Optimizations for Embedded Systems.
Proceedings of the ASPDAC 2002 / VLSI Design 2002, 2002

A Heuristic for Clock Selection in High-Level Synthesis.
Proceedings of the ASPDAC 2002 / VLSI Design 2002, 2002

Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories.
Proceedings of the ASPDAC 2002 / VLSI Design 2002, 2002

Strategies for Improving Data Locality in Embedded Applications.
Proceedings of the ASPDAC 2002 / VLSI Design 2002, 2002

Compiler-directed instruction cache leakage optimization.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Generating physical addresses directly for saving instruction TLB energy.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Power protocol: reducing power dissipation on off-chip data buses.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Energy-conscious compilation based on voltage scaling.
Proceedings of the 2002 Joint Conference on Languages, 2002

Compiler-directed cache polymorphism.
Proceedings of the 2002 Joint Conference on Languages, 2002

A Hybrid Strategy Based on Data Distribution and Migration for Optimizing Memory Locality.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Adaptive Garbage Collection for Battery-Operated Environments.
Proceedings of the 2nd Java Virtual Machine Research and Technology Symposium, 2002

Hardware-Software Co-Adaptation for Data-Intensive Embedded Applications.
Proceedings of the 2002 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2002), 2002

Designing Energy-Efficient Software.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Compiler-Directed I/O Optimization.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Dynamic compilation for energy adaptation.
Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, 2002

Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Tuning Garbage Collection in an Embedded Java Environment.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Exploiting Inter-File Access Patterns Using Multi-Collective I/O.
Proceedings of the FAST '02 Conference on File and Storage Technologies, 2002

Data Space Oriented Tiling.
Proceedings of the Programming Languages and Systems, 2002

Enhancing Compiler Techniques for Memory Energy Optimizations.
Proceedings of the Embedded Software, Second International Conference, 2002

Reducing Cache Access Energy in Array-Intensive Application.
Proceedings of the 2002 Design, 2002

A Compiler-Based Approach for Improving Intra-Iteration Data Reuse.
Proceedings of the 2002 Design, 2002

EAC: A Compiler Framework for High-Level Energy Estimation and Optimization.
Proceedings of the 2002 Design, 2002

Power-Efficient Trace Caches.
Proceedings of the 2002 Design, 2002

Automatic data migration for reducing energy consumption in multi-bank memory systems.
Proceedings of the 39th Design Automation Conference, 2002

Exploiting shared scratch pad memory space in embedded multiprocessor systems.
Proceedings of the 39th Design Automation Conference, 2002

Compiler-directed scratch pad memory hierarchy design and management.
Proceedings of the 39th Design Automation Conference, 2002

An integer linear programming based approach for parallelizing applications in on-chip multiprocessors.
Proceedings of the 39th Design Automation Conference, 2002

An energy saving strategy based on adaptive loop parallelization.
Proceedings of the 39th Design Automation Conference, 2002

Scheduler-based DRAM energy management.
Proceedings of the 39th Design Automation Conference, 2002

Locality-conscious process scheduling in embedded systems.
Proceedings of the Tenth International Symposium on Hardware/Software Codesign, 2002

Energy savings through compression in embedded Java environments.
Proceedings of the Tenth International Symposium on Hardware/Software Codesign, 2002

Kernel-Level Caching for Optimizing I/O by Exploiting Inter-Application Data Sharing.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems.
Proceedings of the Compiler Construction, 11th International Conference, 2002

Optimizing inter-nest data locality.
Proceedings of the International Conference on Compilers, 2002

Leakage Energy Management in Cache Hierarchies.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

Compilation for Distributed Memory Architectures.
The Compiler Design Handbook, 2002

2001
Investigating Memory System Energy Behavior Using Software and Hardware Optimizations.
VLSI Design, 2001

Static and Dynamic Locality Optimizations Using Integer Linear Programming.
IEEE Trans. Parallel Distrib. Syst., 2001

Compiler-Directed Collective-I/O.
IEEE Trans. Parallel Distrib. Syst., 2001

A Layout-Conscious Iteration Space Transformation Technique.
IEEE Trans. Computers, 2001

Hardware and Software Techniques for Controlling DRAM Power Modes.
IEEE Trans. Computers, 2001

Design and Evaluation of a Smart Disk Cluster for DSS Commercial Workloads.
J. Parallel Distrib. Comput., 2001

Efficient Synthesis of Array Intensive Computations onto FPGA Based Accelerators.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Formulation and Validation of an Energy Dissipation Model for the Clock Generation Circuitry and Distribution Networks.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Analyzing energy behavior of spatial access methods for memory-resident data.
Proceedings of the VLDB 2001, 2001

A dynamic locality optimization algorithm for linear algebra codes.
Proceedings of the 2001 ACM Symposium on Applied Computing (SAC), 2001

A compiler technique for improving whole-program locality.
Proceedings of the Conference Record of POPL 2001: The 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2001

vEC: virtual energy counters.
Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, 2001

Exploiting VLIW schedule slacks for dynamic and leakage energy reduction.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Morphable Cache Architectures: Potential Benefits.
Proceedings of The Workshop on Languages, 2001

Improving Off-Chip Memory Energy Behavior in a Multi-processor, Multi-bank Environment.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Energy Behavior of Java Applications from the Memory Perspective.
Proceedings of the 1st Java Virtual Machine Research and Technology Symposium, 2001

Exploiting scratch-pad memory using Presburger formulas.
Proceedings of the 14th International Symposium on Systems Synthesis, 2001

Power-aware partitioned cache architectures.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Compiler support for block buffering.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Influence of Array Allocation Mechanisms on Memory System Energy.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Use of Local Memory for Efficient Java Execution.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

A Framework for Energy Estimation of VLIW Architecture.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Improving Memory Energy Using Access Pattern Classification.
Proceedings of the 2001 IEEE/ACM International Conference on Computer-Aided Design, 2001

DRAM Energy Management Using Software and Hardware Directed Power Mode Control.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Reducing Memory Requirements of Nested Loops for Embedded Systems.
Proceedings of the 38th Design Automation Conference, 2001

Dynamic Management of Scratch-Pad Memory Space.
Proceedings of the 38th Design Automation Conference, 2001

Compiler-directed selection of dynamic memory layouts.
Proceedings of the Ninth International Symposium on Hardware/Software Codesign, 2001

Array Unification: A Locality Optimization Technique.
Proceedings of the Compiler Construction, 10th International Conference, 2001

Energy-efficient instruction cache using page-based placement.
Proceedings of the 2001 International Conference on Compilers, 2001

2000
A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations.
IEEE Trans. Parallel Distrib. Syst., 2000

Minimizing Data and Synchronization Costs in One-Way Communication.
IEEE Trans. Parallel Distrib. Syst., 2000

Data management for large-scale scientific computations in high performance distributed systems.
Cluster Computing, 2000

A Holistic Approach to System Level Energy Optimization.
Proceedings of the Integrated Circuit Design, 2000

APRIL: A Run-Time Library for Tape-Resident Data.
Proceedings of the Eighth NASA Goddard Space Flight Center Conference on Mass Storage Systems and Technologies in cooperation with Seventeenth IEEE Symposium on Mass Storage Systems, 2000

Towards Energy-Aware Iteration Space Tiling.
Proceedings of the Languages, 2000

A Collective I/O Scheme Based on Compiler Analysis.
Proceedings of the Languages, 2000

Experimental Evaluation of Energy Behavior of Iteration Space Tiling.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Improving Offset Assignment for Embedded Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Memory system energy (poster session): influence of hardware-software optimizations.
Proceedings of the 2000 International Symposium on Low Power Electronics and Design, 2000

Energy-driven integrated hardware-software optimizations using SimplePower.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

A novel application development environment for large-scale scientific computations.
Proceedings of the 14th International Conference on Supercomputing, 2000

Design and Evaluation of Smart Disk Architecture for DSS Commercial Workloads.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Energy-Aware Instruction Scheduling.
Proceedings of the High Performance Computing, 2000

Improving Offset Assignment on Embedded Processors Using Transformations.
Proceedings of the High Performance Computing, 2000

Design and Evaluation of a Compiler-Directed Collective I/O Technique.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

The design and use of SimplePower: a cycle-accurate energy estimation tool.
Proceedings of the 37th Conference on Design Automation, 2000

Influence of compiler optimizations on system power.
Proceedings of the 37th Conference on Design Automation, 2000

Energy-oriented compiler optimizations for partitioned memory architectures.
Proceedings of the 2000 International Conference on Compilers, 2000

Data Relation Vectors: A New Abstraction for Data Optimizations.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts.
IEEE Trans. Parallel Distrib. Syst., 1999

A global communication optimization technique based on data-flow analysis and linear algebra.
ACM Trans. Program. Lang. Syst., 1999

Improving Cache Locality by a Combination of Loop and Data Transformation.
IEEE Trans. Computers, 1999

A Matrix-Based Approach to Global Locality Optimization.
J. Parallel Distrib. Comput., 1999

Improving Locality Using a Graph-Based Technique for Detecting Memory Layouts of Arrays.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

An integer linear programming approach for optimizing cache locality.
Proceedings of the 13th International Conference on Supercomputing, 1999

A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Compiler Optimizations for I/O-Intensive Computations.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems.
Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, 1999

Restructuring I/O-Intensive Computations for Locality.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

I/O-Conscious Tiling for Disk-Resident Data Sets.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

On Reducing False Sharing while Improving Locality on Shared Memory Multiprocessors.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
Compilation Techniques for Out-of-Core Parallel Computations.
Parallel Computing, 1998

Locality Optimization Algorithms for Compilation of Out-of-Core Codes.
J. Inf. Sci. Eng., 1998

An Experimental Study to Analyze and Optimize Hartree-Fock Application's I/O with PASSION.
IJHPCA, 1998

Improving Locality Using Loop and Data Transformations in an Integrated Framework.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Improving Locality in Out-of-Core Computations Using Data Layout Transformations.
Proceedings of the Languages, 1998

A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality.
Proceedings of the Languages and Compilers for Parallel Computing, 1998

A Generalized Framework for Global Communication Optimization.
IPPS/SPDP, 1998

A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests.
Proceedings of the 12th International Conference on Supercomputing, 1998

Minimizing Data and Synchronization Costs in One-Way Communication.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Performance Implications of Architectural and Software Techniques on I/O-Intensive Applications.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Enhancing Spatial Locality via Data Layout Optimizations.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

A Matrix-Based Approach to the Global Locality Optimization Problem.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997
Changing Interaction of Compiler and Architecture.
IEEE Computer, 1997

Optimization and Evaluation of Hartree-Fock Application's I/O with PASSION.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

I/O Optimizations for Compiling Out-of-Core Programs on Distributed-Memory Machines.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Data Access Reorganizations in Compiling Out-of-Core Data Parallel Programs on Distributed Memory Machines.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

A Unified Compiler Algorithm for Optimizing Locality, Parallelism and Communication in Out-of-core Computations.
IOPADS, 1997

A Compiler Algorithm for Optimizing Locality in Loop Nests.
Proceedings of the 11th International Conference on Supercomputing, 1997

Improving the Performance of Out-of-Core Computations.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Global I/O optimizations for out-of-core computations.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Optimization of Out-of-Core Computations Using Chain Vectors.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed Memory Machines.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

