Pen-Chung Yew

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

EVeREST-C: An Effective and Versatile Runtime Energy Saving Tool for CPUs.

Anna Yue

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

DeCOS: Data-Efficient Reinforcement Learning for Compiler Optimization Selection Ignited by LLM.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

GPU Stream-Aware Communication for Effective Pipelining.

Proceedings of the 34th International Conference on Parallel Architectures and Compilation Techniques, 2025

2024

JiuJITsu: Removing Gadgets with Safe Register Allocation for JIT Code Generation.

ACM Trans. Archit. Code Optim., March, 2024

A System-Level Dynamic Binary Translator Using Automatically-Learned Translation Rules.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

Non-Fusion Based Coherent Cache Randomization Using Cross-Domain Accesses.

Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 2024

2023

SpecWands: An Efficient Priority-Based Scheduler Against Speculation Contention Attacks.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs.

IEEE Trans. Parallel Distributed Syst., June, 2023

SpecBox: A Label-Based Transparent Speculation Scheme Against Transient Execution Attacks.

IEEE Trans. Dependable Secur. Comput., 2023

ReAPER: Region Aware Power and Energy Regulator.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022

Making Information Hiding Effective Again.

IEEE Trans. Dependable Secur. Comput., 2022

PREDATOR: A Cache Side-Channel Attack Detector Based on Precise Event Monitoring.

Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

2021

Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Enhancing Atomic Instruction Emulation for Cross-ISA Dynamic Binary Translation.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Variable-Sized Blocks for Locality-Aware SpMV.

Naveen Namashivayam

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Regaining Lost Seconds: Efficient Page Preloading for SGX Enclaves.

Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

More with Less - Deriving More Translation Rules with Less Training Data for DBTs Using Parameterization.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

First Time Miss: Low Overhead Mitigation for Shared Memory Cache Side Channels.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Efficient and scalable cross-ISA virtualization of hardware transactional memory.

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

New Attacks and Defenses for Randomized Caches.

CoRR, 2019

A formally verified transformation to unify multiple nested clocks for a Lustre-like language.

Sci. China Inf. Sci., 2019

SafeHidden: An Efficient and Secure Information Hiding Technique Using Re-randomization.

Proceedings of the 28th USENIX Security Symposium, 2019

Unleashing the Power of Learning: An Enhanced Learning-Based Approach for Dynamic Binary Translation.

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

2018

Using Local Clocks to Reproduce Concurrency Bugs.

IEEE Trans. Software Eng., 2018

RARE: An Efficient Static Fault Detection Framework for Definition-Use Faults in Large Programs.

IEEE Access, 2018

Improving Dynamically-Generated Code Performance on Dynamic Binary Translators.

Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2018

Check It Again: Detecting Lacking-Recheck Bugs in OS Kernels.

Wenwen Wang

Kangjie Lu

Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018

Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017

VarCatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-Core.

IEEE Trans. Parallel Distributed Syst., 2017

Prophet: A Parallel Instruction-Oriented Many-Core Simulator.

IEEE Trans. Parallel Distributed Syst., 2017

Enabling Cross-ISA Offloading for COTS Binaries.

Proceedings of the 15th Annual International Conference on Mobile Systems, 2017

A formally verified sequentializer for lustre-like concurrent synchronous data-flow programs.

Proceedings of the 39th International Conference on Software Engineering, 2017

2016

Variable Liberalization.

ACM Trans. Archit. Code Optim., 2016

A General Persistent Code Caching Framework for Dynamic Binary Translation (DBT).

Proceedings of the 2016 USENIX Annual Technical Conference, 2016

TurboTiling: Leveraging Prefetching to Boost Performance of Tiled Codes.

Proceedings of the 2016 International Conference on Supercomputing, 2016

2015

FPS: A Fair-Progress Process Scheduling Policy on Shared-Memory Multiprocessors.

IEEE Trans. Parallel Distributed Syst., 2015

WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers.

J. Supercomput., 2015

Performance-Energy Considerations for Shared Cache Management in a Heterogeneous Multicore Processor.

ACM Trans. Archit. Code Optim., 2015

Adaptive granularity and coordinated management for timely prefetching in multi-core systems.

Proceedings of the VLSI Design, Automation and Test, 2015

Improving compiler scalability: optimizing large programs at small price.

Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

ReCBuLC: Reproducing Concurrency Bugs Using Local Clocks.

Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, 2015

2014

Efficient and Retargetable Dynamic Binary Translation on Multicores.

IEEE Trans. Parallel Distributed Syst., 2014

Measuring Microarchitectural Details of Multi- and Many-Core Memory Systems through Microbenchmarking.

James B. S. G. Greensky

Gautham Beeraka

Binyu Zang

ACM Trans. Archit. Code Optim., 2014

Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms.

J. Comput. Sci. Technol., 2014

DBILL: an efficient and retargetable dynamic binary instrumentation framework using llvm backend.

Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

Efficient memory virtualization for Cross-ISA system mode emulation.

Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

Concurrency bug localization using shared memory access pairs.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Revisiting loop fusion in the polyhedral framework.

Pei-Hung Lin

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Localization of concurrency bugs using shared memory access pairs.

Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, 2014

CFD Builder: A Library Builder for Computational Fluid Dynamics.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Multi-stage coordinated prefetching for present-day processors.

Proceedings of the 2014 International Conference on Supercomputing, 2014

DAPs: Dynamic Adjustment and Partial Sampling for Multithreaded/Multicore Simulation.

Proceedings of the 51st Annual Design Automation Conference, 2014

2013

SEED: A Statically Greedy and Dynamically Adaptive Approach for Speculative Loop Execution.

IEEE Trans. Computers, 2013

Tile size selection revisited.

Gautham Beeraka

ACM Trans. Archit. Code Optim., 2013

Cross-layer dynamic prefetching allocation strategies for high-performance multicores.

Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013

Improving dynamic binary optimization through early-exit guided code region formation.

Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (co-located with ASPLOS 2013), 2013

Selective Profiling for OS Scalability Study on Multicore Systems.

Proceedings of the 2013 IEEE 6th International Conference on Service-Oriented Computing and Applications, 2013

A systematic methodology for OS benchmark characterization.

Proceedings of the Research in Adaptive and Convergent Systems, 2013

Synchronization Identification through On-the-Fly Test.

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Managing shared last-level cache in a heterogeneous multicore processor.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

On-the-fly structure splitting for heap objects.

ACM Trans. Archit. Code Optim., 2012

A Study of Performance Portability Using Piecewise-Parabolic Method (PPM) Gas Dynamics Applications.

Proceedings of the International Conference on Computational Science, 2012

Code Transformations for Enhancing the Performance of Speculatively Parallel Threads.

Shengyue Wang

Antonia Zhai

J. Circuits Syst. Comput., 2012

Providing fairness on shared-memory multiprocessors via process scheduling.

Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores.

Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

2011

Cedar Multiprocessor.

Proceedings of the Encyclopedia of Parallel Computing, 2011

Dynamic Software Updating Using a Relaxed Consistency Model.

IEEE Trans. Software Eng., 2011

ASLOP: A field-access affinity-based structure data layout optimizer.

Sci. China Inf. Sci., 2011

LnQ: Building High Performance Dynamic Binary Translators with Existing Compiler Backends.

Proceedings of the International Conference on Parallel Processing, 2011

SPAS: Scalable Path-Sensitive Pointer Analysis on Full-Sparse SSA.

Proceedings of the Programming Languages and Systems - 9th Asian Symposium, 2011

2010

Boosting the performance of computational fluid dynamics codes for interactive supercomputing.

James B. S. G. Greensky

Anthony Nowatzki

Karl Stoffels

Proceedings of the International Conference on Computational Science, 2010

On improving heap memory layout by dynamic pool allocation.

Zhenjiang Wang

Chenggang Wu

Proceedings of the CGO 2010, 2010

An adaptive task creation strategy for work-stealing scheduling.

Proceedings of the CGO 2010, 2010

On mitigating memory bandwidth contention through bandwidth-aware scheduling.

Di Xu

Chenggang Wu

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

Control flow obfuscation with information flow tracking.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Exploring speculative parallelism in SPEC2006.

Venkatesan Packirisamy

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Detecting and Eliminating Potential Violations of Sequential Consistency for Concurrent C/C++ Programs.

Proceedings of the CGO 2009, 2009

2008

Moving Scientific Codes to Multicore Microprocessor CPUs.

Comput. Sci. Eng., 2008

Compiler optimizations for parallelizing general-purpose applications under thread-level speculation.

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective.

Venkatesan Packirisamy

Proceedings of the 26th International Conference on Computer Design, 2008

2007

CIM: A Reliable Metric for Evaluating Program Phase Classifications.

Sreekumar V. Kodakara

IEEE Comput. Archit. Lett., 2007

Analysis of Statistical Sampling in Microarchitecture Simulation: Metric, Methodology and Program Characterization.

Sreekumar V. Kodakara

Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

POLUS: A POwerful Live Updating System.

Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), 2007

COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications.

Jinpyo Kim

Wei-Chung Hsu

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Mercury: Combining Performance with Dependability Using Self-virtualization.

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

A Compiler Framework for Supporting Speculative Multicore Processors.

Proceedings of the Advances in Computer Systems Architecture, 2007

Runtime Performance Projection Model for Dynamic Power Management.

Hae-Kag Lee

Proceedings of the Advances in Computer Systems Architecture, 2007

Entropy-Based Profile Characterization and Classification for Automatic Profile Management.

Proceedings of the Advances in Computer Systems Architecture, 2007

2006

Editorial: EIC Farewell and New EIC Introduction.

IEEE Trans. Parallel Distributed Syst., 2006

Recovery code generation for general speculative optimizations.

ACM Trans. Archit. Code Optim., 2006

Live updating operating systems using virtualization.

Proceedings of the 2nd International Conference on Virtual Execution Environments, 2006

Exploiting Speculative Thread-Level Parallelism in Data Compression Applications.

Shengyue Wang

Antonia Zhai

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors.

Venkatesan Packirisamy

Proceedings of the High Performance Computing, 2006

2005

Foreword.

Jingling Xue

J. Comput. Sci. Technol., 2005

Minimizing the Directory Size for Large-Scale Shared-Memory Multiprocessors.

Jinseok Kong

IEICE Trans. Inf. Syst., 2005

Loop Selection for Thread-Level Speculation.

Proceedings of the Languages and Compilers for Parallel Computing, 2005

Using Speculative Multithreading for General-Purpose Applications.

Proceedings of the Parallel and Distributed Processing and Applications, 2005

Dynamic Code Region (DCR) Based Program Phase Tracking and Prediction for Dynamic Optimizations.

Jinpyo Kim

Sreekumar V. Kodakara

Wei-Chung Hsu

Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Performance of Runtime Optimization on BLAST.

Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion.

Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

2004

Editor's Note.

IEEE Trans. Parallel Distributed Syst., 2004

A compiler framework for speculative optimizations.

ACM Trans. Archit. Code Optim., 2004

Design and Implementation of a Lightweight Dynamic Optimization System.

J. Instr. Level Parallelism, 2004

Data Dependence Profiling for Speculative Optimizations.

Proceedings of the Compiler Construction, 13th International Conference, 2004

Continuous Adaptive Object-Code Re-optimization Framework.

Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

A Compiler Framework for Recovery Code Generation in General Speculative Optimizations.

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003

A compiler framework for speculative analysis and optimizations.

Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, 2003

The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Is There Exploitable Thread-Level Parallelism in General-Purpose Application Programs?

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Speculative Register Promotion Using Advanced Load Address Table (ALAT).

Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002

Editorial.

IEEE Trans. Parallel Distributed Syst., 2002

On Augmenting Trace Cache for High-Bandwidth Value Prediction.

IEEE Trans. Computers, 2002

An Empirical Study on the Granularity of Pointer Analysis in C Programs.

Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Interprocedural Induction Variable Analysis.

Proceedings of the International Symposium on Parallel Architectures, 2002

On the Impact of Naming Methods for Heap-Oriented Pointers in C Programs.

Proceedings of the International Symposium on Parallel Architectures, 2002

On the Predictability of Program Behavior Using Different Input Data Sets.

Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-6 2002), 2002

2001

On Table Bandwidth and Its Update Delay for Value Prediction on Wide-Issue ILP Processors.

IEEE Trans. Computers, 2001

A High-Bandwidth Memory Pipeline for Wide Issue Processors.

Sangyeun Cho

IEEE Trans. Computers, 2001

2000

Compiler Analysis for Cache Coherence: Interprocedural Array Data-Flow Analysis and Its Impact on Cache Performance.

IEEE Trans. Parallel Distributed Syst., 2000

Hardware and Compiler-Directed Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Study.

IEEE Trans. Parallel Distributed Syst., 2000

JaViz: A client/server Java profiling tool.

IBM Syst. J., 2000

Efficient Integration of Compiler-Directed Cache Coherence and Data Prefetching.

Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Decoupled Value Prediction on Trace Processors.

Yuan Wang

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

On Some Implementation Issues for Value Prediction on Wide-Issue ILP Processors.

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

The Superthreaded Processor Architecture.

IEEE Trans. Computers, 1999

Enhancing multiple-path speculative execution with predicate window shifting.

J. Syst. Archit., 1999

Compiler Techniques for the Superthreaded Architectures.

Zhenzhen Jiang

Int. J. Parallel Program., 1999

Access Region Locality for High-Bandwidth Processor Memory System Design.

Sangyeun Cho

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Designing the Agassiz Compiler for Concurrent Multithreaded Architectures.

Proceedings of the Languages and Compilers for Parallel Computing, 1999

Decoupling Local Variable Accesses in a Wide-Issue Superscalar Processor.

Sangyeun Cho

Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

1998

Maintaining Cache Coherence through Compiler-Directed Data Prefetching.

J. Parallel Distributed Comput., 1998

Integrating Parallelizing Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading.

J. Inf. Sci. Eng., 1998

Introduction.

Int. J. Parallel Program., 1998

An Integrated Framework for Compiler-Directed Cache Coherence and Data Prefetching.

Proceedings of the Languages and Compilers for Parallel Computing, 1998

Retrospective: The Cedar System.

Alexander V. Veidenbaum

Constantine D. Polychronopoulos

David J. Kuck

David A. Padua

Edward S. Davidson

Kyle A. Gallivan

Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

High-Level Information - An Approach for Integrating Front-End and Back-End Compilers.

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Performance Study of a Concurrent Multithreaded Processor.

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

1997

Performance Evaluation of Wire-Limited Hierarchical Networks.

J. Parallel Distributed Comput., 1997

Changing Interaction of Compiler and Architecture.

Computer, 1997

Program Optimization for Concurrent Multithreaded Architectures.

Zhenzhen Jiang

Proceedings of the Languages and Compilers for Parallel Computing, 1997

A Compiler-Directed Cache Coherence Scheme Using Data Prefetching.

Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

1996

On Effective Execution of Nonuniform DOACROSS Loops.

IEEE Trans. Parallel Distributed Syst., 1996

Integrating Fine-Grained Message Passing in Cache Coherent Shared Memory Multiprocessors.

David K. Poulsen

J. Parallel Distributed Comput., 1996

Chief: A Simulation Environment for Studying Parallel Systems.

Int. J. Comput. Simul., 1996

Techniques for Compiler-Directed Cache Coherence.

IEEE Parallel Distributed Technol. Syst. Appl., 1996

Compiler Support for Maintaining Cache Coherence Using Data Prefetching (Extended Abstract).

Proceedings of the Languages and Compilers for Parallel Computing, 1996

Compiler Techniques for Concurrent Multithreading with Hardware Speculation Support.

Proceedings of the Languages and Compilers for Parallel Computing, 1996

Compiler and Hardware Support for Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Study.

Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Eliminating Stale Data References through Array Data-Flow Analysis.

Proceedings of IPPS '96, 1996

Let Us Build System-Friendly Networks - Build Them Hierarchically.

Proceedings of the 1996 International Conference on Parallel Processing Workshop, 1996

Program Analysis for Cache Coherence: Beyond Procedural Boundaries.

Proceedings of the 1996 International Conference on Parallel Processing, 1996

The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation.

Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995

Special Issues on Distributed Shared Memory Systems: Guest Editor's Introduction.

J. Parallel Distributed Comput., 1995

Processor Self-Scheduling in Parallel Discrete Event Simulation.

Proceedings of the 27th conference on Winter simulation, 1995

Partitioning for synchronous parallel simulation.

Proceedings of the Ninth Workshop on Parallel and Distributed Simulation, 1995

Interprocedural Array Data Flow Analysis for Cache Coherence.

Proceedings of the Languages and Compilers for Parallel Computing, 1995

1994

A compiler-directed cache coherence scheme with improved intertask locality.

Proceedings of Supercomputing '94, 1994

An efficient algorithm for the run-time parallelization of DOACROSS loops.

Josep Torrellas

Proceedings of Supercomputing '94, 1994

Improved parallel architectural simulations on shared-memory multiprocessors.

Proceedings of the Eighth Workshop on Parallel and Distributed Simulation, 1994

Redundant Synchronization Elimination for DOACROSS Loops.

Proceedings of the 8th International Symposium on Parallel Processing, 1994

Data Prefetching and Data Forwarding in Shared Memory Multiprocessors.

David K. Poulsen

Proceedings of the 1994 International Conference on Parallel Processing, 1994

Statement Re-ordering for DOACROSS Loops.

Proceedings of the 1994 International Conference on Parallel Processing, 1994

1993

Improving Memory Utilization in Cache Coherence Directories.

IEEE Trans. Parallel Distributed Syst., 1993

Execution-driven tools for parallel simulation of parallel architectures and applications.

David K. Poulsen

Proceedings of Supercomputing '93, 1993

The Cedar System and an Initial Performance Study.

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

1992

An Effective Synchronization Network for Hot-Spot Accesses.

ACM Trans. Comput. Syst., 1992

The Impact of Wiring Constraints on Hierarchical Network Performance.

Proceedings of the 6th International Parallel Processing Symposium, 1992

A Scheme for Effective Execution of Irregular Doacross Loops.

Proceedings of the 1992 International Conference on Parallel Processing, 1992

1991

Guest Editor's Introduction.

David A. Padua

Benjamin W. Wah

IEEE Trans. Parallel Distributed Syst., 1991

Special Issue on Shared-Memory Multiprocessors.

Benjamin W. Wah

J. Parallel Distributed Comput., 1991

Efficient Doacross execution on distributed shared-memory multiprocessors.

Proceedings of Supercomputing '91, 1991

An Effective Synchronization Network for Large Multiprocessor Systems.

Proceedings of the Fifth International Parallel Processing Symposium, Anaheim, California, USA, April 30, 1991

Chief: A Parallel Simulation Environment for Parallel Systems.

John D. Bruner

Hoichi Cheong

Alexander V. Veidenbaum

Proceedings of the Fifth International Parallel Processing Symposium, Anaheim, California, USA, April 30, 1991

Parallel program behavioral study on a shared-memory multiprocessor.

Proceedings of the 5th international conference on Supercomputing, 1991

Combining hardware and software cache coherence strategies.

Proceedings of the 5th international conference on Supercomputing, 1991

Efficient Interprocessor Communication on Distributed Shared-Memory Multiprocessors.

Proceedings of the International Conference on Parallel Processing, 1991

The Organization of the Cedar System.

Jeff Konicek

Tracy Tilton

Alexander V. Veidenbaum

Proceedings of the International Conference on Parallel Processing, 1991

The Performance of Hierarchical Systems with Wiring Constraints.

Proceedings of the International Conference on Parallel Processing, 1991

Parallel discrete event simulation on shared-memory multiprocessors.

Proceedings of the 24th Annual Simulation Symposium (ANSS-24 1991), 1991

1990

An Empirical Study of Fortran Programs for Parallelizing Compilers.

Zhiyu Shen

IEEE Trans. Parallel Distributed Syst., 1990

An Efficient Data Dependence Analysis for Parallelizing Compilers.

IEEE Trans. Parallel Distributed Syst., 1990

Software Combining Algorithms for Distributing Hot-Spot Addressing.

J. Parallel Distributed Comput., 1990

The Impact of Synchronization and Granularity on Parallel Systems.

Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990

Compiler techniques for data synchronization in nested parallel loops.

Proceedings of the 4th international conference on Supercomputing, 1990

Comparing Parallelism Extraction Techniques: Superscalar Processors, Pipelined Processors, and Multiprocessors.

Proceedings of the 1990 International Conference on Parallel Processing, 1990

1989

On Data Synchronization for Multiprocessors.

Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, 1989

Data dependence analysis on multi-dimensional array references.

Proceedings of the 3rd international conference on Supercomputing, 1989

An Empirical Study on Array Subscripts and Data Dependencies.

Zhiyu Shen

Proceedings of the International Conference on Parallel Processing, 1989

A parallel linked list for shared-memory multiprocessors.

Proceedings of the 13th Annual International Computer Software and Applications Conference, 1989

1988

Program parallelization with interprocedural analysis.

J. Supercomput., 1988

Realizing Fault-Tolerant Interconnection Networks via Chaining.

IEEE Trans. Computers, 1988

Efficient Interprocedural Analysis for Program Parallelization and Restructuring.

Proceedings of the ACM/SIGPLAN PPEALS 1988, 1988

Impact of self-scheduling order on performance on multiprocessor systems.

Proceedings of the 2nd international conference on Supercomputing, 1988

Interprocedural Analysis for Parallel Programs.

Proceedings of the International Conference on Parallel Processing, 1988

1987

A Scheme to Enforce Data Dependence on Large Multiprocessor Systems.

IEEE Trans. Software Eng., 1987

Distributing Hot-Spot Addressing in Large-Scale Multiprocessors.

IEEE Trans. Computers, 1987

Multiprocessor Cache Design Considerations.

Roland L. Lee

Proceedings of the 14th Annual International Symposium on Computer Architecture, Pittsburgh, 1987

Data Prefetching In Shared Memory Multiprocessors.

Roland L. Lee

Proceedings of the International Conference on Parallel Processing, 1987

An Enhancement Scheme for Hypercube Interconnection Networks.

Proceedings of the International Conference on Parallel Processing, 1987

Deadlock Prevention in Processor Self-Scheduling for Parallel Nested Loops.

Proceedings of the International Conference on Parallel Processing, 1987

Dynamic Processor Self-Scheduling for General Parallel Nested Loops.

Proceedings of the International Conference on Parallel Processing, 1987

1986

Distributing Hot-Spot Addressing in Large Scale Multiprocessor.

Proceedings of the International Conference on Parallel Processing, 1986

Processor Self-Scheduling for Multiple-Nested Parallel Loops.

Proceedings of the International Conference on Parallel Processing, 1986

1985

Fault-Tolerant Scheme for Multistage Interconnection Networks.

Proceedings of the 12th Annual Symposium on Computer Architecture, 1985

The Performance of a Fault-Tolerant Multistage Interconnection Network.

Proceedings of the International Conference on Parallel Processing, 1985

1984

A Synchronization Scheme and Its Applications for Large Multiprocessor Systems.

Proceedings of the 4th International Conference on Distributed Computing Systems, 1984

1982

A fault tolerant interconnection network using error correcting codes.

J. Edward Lilienkamp

Proceedings of the International Conference on Parallel Processing, 1982

Performance of packet switching in buffered single-stage shuffle-exchange networks.

Pin-Yee Chen

Proceedings of the 3rd International Conference on Distributed Computing Systems, 1982

1981

On the Design of Interconnection Networks for Parallel and Multiprocessor Systems

PhD thesis, 1981

An Easily Controlled Network for Frequently Used Permutation.
