Jih-Kwon Peir

Proceedings of the 18th International Conference on Parallel and Distributed Computing, 2017

Content-Aware Non-Volatile Cache Replacement.

Qi Zeng

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Hot Row Identification of DRAM Memory in a Multicore System.

Xi Tao

Qi Zeng

Proceedings of the International Conference on High Performance Compilation, 2017

2016

Runahead Cache Misses Using Bloom Filter.

Proceedings of the 17th International Conference on Parallel and Distributed Computing, 2016

Small cache lookaside table for fast DRAM cache access.

Proceedings of the 35th IEEE International Performance Computing and Communications Conference, 2016

2014

Author retrospective for bloom filtering cache misses for accurate data speculation and prefetching.

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Directory Lookaside Table: Enabling scalable, low-conflict, many-core cache coherence directory.

Xudong Shi

Feiqi Su

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013

Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Guided multiple hashing: Achieving near perfect balance for fast routing lookup.

Proceedings of the 2013 21st IEEE International Conference on Network Protocols, 2013

2012

Miss-Correlation Folding: Encoding Per-Block Miss Correlations in Compressed DRAM for Data Prefetching.

Gang Liu

Victor W. Lee

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

Fit a Compact Spread Estimator in Small High-Speed Memory.

IEEE/ACM Trans. Netw., 2011

Enhancements for Accurate and Timely Streaming Prefetcher.

J. Instr. Level Parallelism, 2011

Approximately-perfect hashing: Improving network throughput through efficient off-chip routing table lookup.

Zhuo Huang

Shigang Chen

Proceedings of INFOCOM 2011, the 30th IEEE International Conference on Computer Communications, 2011

Architecture comparisons between Nvidia and ATI GPUs: Computation parallelism and data communications.

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Tree structured analysis on GPU power study.

Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Statistical GPU power analysis using tree-based methods.

Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

2010

Weak execution ordering - exploiting iterative methods on many-core GPUs.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Semantics-Aware, Timely Prefetching of Linked Data Structure.

Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

Fast routing table lookup based on deterministic multi-hashing.

S. M. Iftekharul Alam

Proceedings of the 18th annual IEEE International Conference on Network Protocols, 2010

2009

Modeling and Stack Simulation of CMP Cache Capacity and Accessibility.

IEEE Trans. Parallel Distributed Syst., 2009

Greedy Prefix Cache for IP Routing Lookups.

Zhuo Huang

Gang Liu

Proceedings of the 10th International Symposium on Pervasive Systems, 2009

Fit a Spread Estimator in Small Memory.

Proceedings of INFOCOM 2009, the 28th IEEE International Conference on Computer Communications, 2009

2008

Memory hierarchy performance measurement of commercial dual-core desktop processors.

J. Syst. Archit., 2008

2007

CMP cache performance projection: accessibility vs. capacity.

SIGARCH Comput. Archit. News, 2007

Modeling and Single-Pass Simulation of CMP Cache Capacity and Accessibility.

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Memory Performance and Scalability of Intel's and AMD's Dual-Core Processors: A Case Study.

Proceedings of the 26th IEEE International Performance Computing and Communications Conference, 2007

Comparative evaluation of multi-core cache occupancy strategies.

Proceedings of the 13th International Conference on Parallel and Distributed Systems, 2007

2006

Coterminous locality and coterminous group data prefetching on chip-multiprocessors.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Overlapping dependent loads with addressless preload.

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2004

A New Address-Free Memory Hierarchy Layer for Zero-Cycle Load.

Lu Peng

Konrad Lai

J. Instr. Level Parallelism, 2004

Signature Buffer: Bridging Performance Gap between Registers and Caches.

Lu Peng

Konrad Lai

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003

Address-free memory access based on program syntax correlation of loads and stores.

IEEE Trans. Very Large Scale Integr. Syst., 2003

2002

Bloom filtering cache misses for accurate data speculation and prefetching.

Proceedings of the 16th international conference on Supercomputing, 2002

Ditto Processor.

Shih-Chang Lai

Shih-Lien Lu

Proceedings of the 2002 International Conference on Dependable Systems and Networks (DSN 2002), 2002

2001

Direct load: dependence-linked dataflow resolution of load address and cache coordinate.

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Symbolic Cache: Fast Memory Access Based on Program Syntax Correlation of Loads and Stores.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

2000

Improving cache performance with Full-Map Block Directory.

J. Syst. Archit., 2000

1999

Functional Implementation Techniques for CPU Cache Memories.

Alan Jay Smith

IEEE Trans. Computers, 1999

A Framework for Matching Applications with Parallel Machines.

Proceedings of the High Performance Computing, 1999

1998

LRU-based column-associative caches.

Byung-Kwon Chung

SIGARCH Comput. Archit. News, 1998

Performance of Shared Caches on Multithreaded Architectures.

Yunn Yen Chen

Chung-Ta King

J. Inf. Sci. Eng., 1998

Capturing Dynamic Memory Reference Behavior with Adaptive Cache Topology.

Yongjoon Lee

Proceedings of ASPLOS-VIII, the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998

1997

Fast Cache Access with Full-Map Block Directory.

Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Busses.

The Computer Science and Engineering Handbook, 1997

1996

Performance of Shared Cache on Multithreaded Architectures.

Yunn Yen Chen

Chung-Ta King

Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), 1996

Improving Cache Performance with Balanced Tag and Data Paths.

Proceedings of ASPLOS-VII, 1996

1993

Cache sampling by sets.

IEEE Trans. Very Large Scale Integr. Syst., 1993

Designing High-Performance Processors Using Real Address Prediction.

Kien A. Hua

IEEE Trans. Computers, 1993

Look-Ahead Routing Switches for Multistage Interconnection Networks.

Yann-Hang Lee

J. Parallel Distributed Comput., 1993

Shared Translation Lookaside Buffers on Multiprocessors and a Performance Study.

J. Inf. Sci. Eng., 1993

A High Performance Hybrid Architecture for Concurrent Query Execution.

Kien A. Hua

Chiang Lee

J. Inf. Sci. Eng., 1993

Techniques to Enhance Cache Performance Across Parallel Program Sections.

Kimming So

Ju-Ho Tang

Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992

Sampling of Cache Congruence Classes.

Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992

1991

Interconnecting Shared-Everything Systems for Efficient Parallel Query Processing.

Kien A. Hua

Chiang Lee

Proceedings of the First International Conference on Parallel and Distributed Information Systems (PDIS 1991), 1991

Inter-Section Locality of Shared Data in Parallel Programs.

Kimming So

Ju-Ho Tang

Proceedings of the International Conference on Parallel Processing, 1991

Consecutive Requests Traffic Model in Multistage Interconnection Networks.

Yann-Hang Lee

Sandra E. Cheung

Proceedings of the International Conference on Parallel Processing, 1991

1990

Improving multistage network performance under uniform and hot-spot traffics.

Yann-Hang Lee

Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, 1990

A Performance Evaluation Methodology for Coupled Multiple Supercomputers.

Proceedings of the 1990 International Conference on Parallel Processing, 1990

1989

Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors.

Ron Cytron

IEEE Trans. Computers, 1989

1987

Performance of an Ocean Circulation Model on LCAP-Abstract.

Hsiao-Ming Hsu

Dale B. Haidvogel

Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, 1987

1986

Program Partitioning and Synchronization on Multiprocessor Systems (Parallel, Computer Architecture, Compiler)

PhD thesis, 1986

CAMP: A Programming Aide for Multiprocessors.

Daniel Gajski

Proceedings of the International Conference on Parallel Processing, 1986

1985

Comparison of five multiprocessor systems.

Daniel Gajski

Parallel Comput., 1985

Essential Issues in Multiprocessor Systems.

Daniel Gajski