Donald Yeung

Affiliations:
  • University of Maryland, College Park, USA


According to our database, Donald Yeung authored at least 52 papers between 1992 and 2021.

Bibliography

2021
Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache.
ACM Trans. Archit. Code Optim., 2021

2020
Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors.
ACM Trans. Archit. Code Optim., 2020

Tileable Monolithic ReRAM Memory Design.
Proceedings of the 2020 IEEE Symposium in Low-Power and High-Speed Chips, 2020

2019
Analyzing the Monolithic Integration of a ReRAM-Based Main Memory Into a CPU's Die.
IEEE Micro, 2019

Design for ReRAM-based main-memory architectures.
Proceedings of the International Symposium on Memory Systems, 2019

2018
Memory-systems challenges in realizing monolithic computers.
Proceedings of the International Symposium on Memory Systems, 2018

2017
Using Multicore Reuse Distance to Study Coherence Directories.
ACM Trans. Comput. Syst., 2017

Multi-cache resizing via greedy coordinate descent.
J. Supercomput., 2017

Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis.
IEEE Comput. Archit. Lett., 2017

Optimizing locality in graph computations using reuse distance profiles.
Proceedings of the 36th IEEE International Performance Computing and Communications Conference, 2017

2016
Unlocking the True Potential of 3-D CPUs With Microfluidic Cooling.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis.
ACM Trans. Comput. Syst., 2016

Reducing data movement with approximate computing techniques.
Proceedings of the IEEE International Conference on Rebooting Computing, 2016

2015
Studying the impact of multicore processor scaling on directory techniques via reuse distance analysis.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014
Unlocking the true potential of 3D CPUs with micro-fluidic cooling.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

2013
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs.
ACM Trans. Comput. Syst., 2013

Studying multicore processor scaling via reuse distance analysis.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

High performance 3D stacked DRAM processor architectures with micro-fluidic cooling.
Proceedings of the 2013 IEEE International 3D Systems Integration Conference (3DIC), 2013

2012
Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis.
Proceedings of the 2012 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

2011
Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor.
IEEE Comput. Archit. Lett., 2011

Multicore Performance Optimization Using Partner Cores.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2009
Hill-climbing SMT processor resource distribution.
ACM Trans. Comput. Syst., 2009

Enhancing LTP-Driven Cache Management Using Reuse Distance Information.
J. Instr. Level Parallelism, 2009

Using Aggressor Thread Information to Improve Shared Cache Management for CMPs.
Proceedings of the PACT 2009, 2009

2008
Exploiting Application-Level Correctness for Low-Cost Fault Tolerance.
J. Instr. Level Parallelism, 2008

2007
Low Power System Design by Combining Software Prefetching and Dynamic Voltage Scaling.
J. Circuits Syst. Comput., 2007

Application-Level Correctness and its Impact on Fault Tolerance.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

2006
Learning-Based SMT Processor Resource Distribution via Hill-Climbing.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

2005
BioBench: A Benchmark Suite of Bioinformatics Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

2004
A study of source-level compiler algorithms for automatic construction of pre-execution code.
ACM Trans. Comput. Syst., 2004

A general framework for prefetch scheduling in linked data structures and its application to multi-chain prefetching.
ACM Trans. Comput. Syst., 2004

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems.
J. Instr. Level Parallelism, 2004

Transferring performance gain from software prefetching to energy reduction.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2003
Optimizing SMT Processors for High Single-Thread Performance.
J. Instr. Level Parallelism, 2003

2002
Design and evaluation of compiler algorithms for pre-execution.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures.
IEEE Trans. Parallel Distributed Syst., 2001

Evaluating the impact of memory system performance on software prefetching and locality optimizations.
Proceedings of the 15th international conference on Supercomputing, 2001

Multi-Chain Prefetching: Effective Exploitation of Inter-Chain Memory Parallelism for Pointer-Chasing Codes.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000
Multigrain shared memory.
ACM Trans. Comput. Syst., 2000

1999
The MIT Alewife Machine.
Proc. IEEE, 1999

The scalability of multigrain systems.
Proceedings of the 13th international conference on Supercomputing, 1999

1998
Multigrain shared memory.
PhD thesis, 1998

Exploring Optimal Cost-Performance Designs for Raw Microprocessors.
Proceedings of the 6th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), 1998

1996
MGS: A Multigrain Shared Memory System.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

1995
The MIT Alewife Machine: Architecture and Performance.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

1994
Low-Cost Support for Fine-Grain Synchronization in Multiprocessors.
Proceedings of the Multithreaded Computer Architecture, 1994

1993
Sparcle: an evolutionary processor design for large-scale multiprocessors.
IEEE Micro, 1993

Experience with Fine-Grain Synchronization in MIMD Machines for Preconditioned Conjugate Gradient.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

1992
Sparcle: A Multithreaded VLSI Processor for Parallel Computing.
Proceedings of the Parallel Symbolic Computing: Languages, 1992

