Donald Yeung

ACM Trans. Archit. Code Optim., 2021

2020

Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors.

ACM Trans. Archit. Code Optim., 2020

Tileable Monolithic ReRAM Memory Design.

Proceedings of the 2020 IEEE Symposium in Low-Power and High-Speed Chips, 2020

2019

Analyzing the Monolithic Integration of a ReRAM-Based Main Memory Into a CPU's Die.

IEEE Micro, 2019

Design for ReRAM-based main-memory architectures.

Proceedings of the International Symposium on Memory Systems, 2019

2018

Memory-systems challenges in realizing monolithic computers.

Proceedings of the International Symposium on Memory Systems, 2018

2017

Using Multicore Reuse Distance to Study Coherence Directories.

Minshu Zhao

ACM Trans. Comput. Syst., 2017

Multi-cache resizing via greedy coordinate descent.

I. Stephen Choi

J. Supercomput., 2017

Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis.

IEEE Comput. Archit. Lett., 2017

Optimizing locality in graph computations using reuse distance profiles.

Proceedings of the 36th IEEE International Performance Computing and Communications Conference, 2017

2016

Unlocking the True Potential of 3-D CPUs With Microfluidic Cooling.

IEEE Trans. Very Large Scale Integr. Syst., 2016

Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis.

ACM Trans. Comput. Syst., 2016

Reducing data movement with approximate computing techniques.

Stephen P. Crago

Proceedings of the IEEE International Conference on Rebooting Computing, 2016

2015

Studying the impact of multicore processor scaling on directory techniques via reuse distance analysis.

Minshu Zhao

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014

Unlocking the true potential of 3D CPUs with micro-fluidic cooling.

Caleb Serafy

Ankur Srivastava

Proceedings of the International Symposium on Low Power Electronics and Design, 2014

2013

Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs.

ACM Trans. Comput. Syst., 2013

Studying multicore processor scaling via reuse distance analysis.

Minshu Zhao

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

High performance 3D stacked DRAM processor architectures with micro-fluidic cooling.

Proceedings of the 2013 IEEE International 3D Systems Integration Conference (3DIC), 2013

2012

Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis.

Proceedings of the 2012 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

2011

Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor.

IEEE Comput. Archit. Lett., 2011

Multicore Performance Optimization Using Partner Cores.

Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2009

Hill-climbing SMT processor resource distribution.

Seungryul Choi

ACM Trans. Comput. Syst., 2009

Enhancing LTP-Driven Cache Management Using Reuse Distance Information.

Wanli Liu

J. Instr. Level Parallelism, 2009

Using Aggressor Thread Information to Improve Shared Cache Management for CMPs.

Wanli Liu

Proceedings of PACT 2009, 2009

2008

Exploiting Application-Level Correctness for Low-Cost Fault Tolerance.

Xuanhua Li

J. Instr. Level Parallelism, 2008

2007

Low Power System Design by Combining Software Prefetching and Dynamic Voltage Scaling.

Sumitkumar N. Pamnani

Deepak N. Agarwal

Gang Qu

J. Circuits Syst. Comput., 2007

Application-Level Correctness and its Impact on Fault Tolerance.

Xuanhua Li

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

2006

Learning-Based SMT Processor Resource Distribution via Hill-Climbing.

Seungryul Choi

Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

2005

BioBench: A Benchmark Suite of Bioinformatics Applications.

Kursad Albayraktaroglu

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

2004

A study of source-level compiler algorithms for automatic construction of pre-execution code.

Dongkeun Kim

ACM Trans. Comput. Syst., 2004

A general framework for prefetch scheduling in linked data structures and its application to multi-chain prefetching.

ACM Trans. Comput. Syst., 2004

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems.

Aneesh Aggarwal

Chau-Wen Tseng

J. Instr. Level Parallelism, 2004

Transferring performance gain from software prefetching to energy reduction.

Deepak N. Agarwal

Sumitkumar N. Pamnani

Gang Qu

Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors.

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2003

Optimizing SMT Processors for High Single-Thread Performance.

Gautham Thambidorai

Seungryul Choi

J. Instr. Level Parallelism, 2003

2002

Design and evaluation of compiler algorithms for pre-execution.

Dongkeun Kim

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance.

Gautham K. Dorai

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001

SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures.

Csaba Andras Moritz

IEEE Trans. Parallel Distributed Syst., 2001

Evaluating the impact of memory system performance on software prefetching and locality optimizations.

Aneesh Aggarwal

Chau-Wen Tseng

Proceedings of the 15th international conference on Supercomputing, 2001

Multi-Chain Prefetching: Effective Exploitation of Inter-Chain Memory Parallelism for Pointer-Chasing Codes.

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Multigrain shared memory.

John Kubiatowicz

ACM Trans. Comput. Syst., 2000

1999

The MIT Alewife Machine.

Proc. IEEE, 1999

The scalability of multigrain systems.

Proceedings of the 13th international conference on Supercomputing, 1999

1998

Multigrain shared memory.

PhD thesis, 1998

Exploring Optimal Cost-Performance Designs for Raw Microprocessors.

Csaba Andras Moritz

Proceedings of the 6th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), 1998

1996

MGS: A Multigrain Shared Memory System.

John Kubiatowicz

Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

1995

The MIT Alewife Machine: Architecture and Performance.

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

1994

Low-Cost Support for Fine-Grain Synchronization in Multiprocessors.

Proceedings of the Multithreaded Computer Architecture, 1994

1993

Sparcle: an evolutionary processor design for large-scale multiprocessors.

IEEE Micro, 1993

Experience with Fine-Grain Synchronization in MIMD Machines for Preconditioned Conjugate Gradient.
