Santosh G. Abraham

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

2005

Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor.

Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Store Memory-Level Parallelism Optimizations for Commercial Applications.

Yuan Chou

Lawrence Spracklen

Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Accurate Modeling of Aggressive Speculation in Modern Microprocessor Architectures.

Proceedings of the 13th International Symposium on Modeling, 2005

Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications.

Lawrence Spracklen

Yuan Chou

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Chip Multithreading: Opportunities and Challenges.

Lawrence Spracklen

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004

Microarchitecture Optimizations for Exploiting Memory-Level Parallelism.

Yuan Chou

Brian Fahs

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Effective stream-based and execution-based data prefetching.

Proceedings of the 18th Annual International Conference on Supercomputing, 2004

2002

Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots.

Ivan D. Baev

Waleed Meleis

Int. J. Parallel Program., 2002

2000

Efficient design space exploration in PICO.

B. Ramakrishna Rau

Proceedings of the 2000 International Conference on Compilers, 2000

High-Level Synthesis of Nonprogrammable Hardware Accelerators.

Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000

Efficient Backtracking Instruction Schedulers.

Waleed Meleis

Ivan D. Baev

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

Automatic and Efficient Evaluation of Memory Hierarchies for Embedded Systems.

Scott A. Mahlke

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

1998

Meld Scheduling: A Technique for Relaxing Scheduling Constraints.

Vinod Kathail

Brian L. Deitrich

Int. J. Parallel Program., 1998

1996

Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling.

Int. J. Parallel Program., 1996

Meld Scheduling: Relaxing Scheduling Constraints Across Region Boundaries.

Vinod Kathail

Brian L. Deitrich

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

1995

Set-Associative Cache Simulation Using Generalized Binomial Trees.

ACM Trans. Comput. Syst., 1995

Partitioning regular grid applications with irregular boundaries for cache-coherent multiprocessors.

Yang Zeng

Proceedings of IPPS '95, 1995

Optimum Modulo Schedules for Minimum Register Requirements.

Proceedings of the 9th International Conference on Supercomputing, 1995

Impact of Load Imbalance on the Design of Software Barriers.

Proceedings of the 1995 International Conference on Parallel Processing, 1995

Modeling load imbalance and fuzzy barriers for scalable shared-memory multiprocessors.

Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995

1994

Minimum register requirements for a modulo schedule.

Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

Data and program restructuring of irregular applications for cache-coherent multiprocessors.

Karen A. Tomko

Proceedings of the 8th International Conference on Supercomputing, 1994

Fast Efficient Simulation of Write-Buffer Configurations.

Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1993

Approaching a machine-application bound in delivered performance on scientific code.

Tien-Pao Shih

Proc. IEEE, 1993

Utilizing Global Simulation Information in Conservative Parallel Simulation on Shared Memory Multiprocessors.

Jiajen M. Lin

J. Parallel Distributed Comput., 1993

Efficient Simulation of Caches under Optimal Replacement with Applications to Miss Characterization.

Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1993

Predictability of load/store instruction latencies.

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

KSR 1 Multiprocessor: Analysis of Latency Hiding Techniques in a Sparse Solver.

Proceedings of the Seventh International Parallel Processing Symposium, 1993

Evaluating the Communication Performance of MPPs Using Synthetic Sparse Matrix Multiplication Workloads.

Proceedings of the 7th International Conference on Supercomputing, 1993

Iteration Partitioning for Resolving Stride Conflicts on Cache-Coherent Multiprocessors.

Karen A. Tomko

Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992

Compile-Time Optimization of Near-Neighbor Communication for Scalable Shared Memory Architecture.

J. Parallel Distributed Comput., 1992

Fast High-Level Simulation of Shared-Memory Multiprocessor Systems.

Jiajen M. Lin

Int. J. Comput. Simul., 1992

Proceedings of the 6th International Conference on Supercomputing, 1992

Discrete Event Simulation on Shared Memory Multiprocessors Using Global Simulation Information.

Jiajen M. Lin

Proceedings of the 1992 International Conference on Parallel Processing, 1992

Computing Radiosity Solution on a High Performance Workstation LAN.

Gautam B. Singh

Franklin H. Westervelt

Proceedings of the First International Symposium on High Performance Distributed Computing, 1992

1991

Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic.

IEEE Trans. Parallel Distributed Syst., 1991

A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1.

Computer, 1991

Beyond loop partitioning: data assignment and overlap to reduce communication overhead.

Proceedings of the 5th International Conference on Supercomputing, 1991

Parallel Simulation of Fully Associative Caches.

Proceedings of the International Conference on Parallel Processing, 1991

Vector Register Design for Polycyclic Vector Scheduling.

Proceedings of ASPLOS-IV, 1991

1990

Compiler techniques for data partitioning of sequentially iterated parallel loops.

Proceedings of the 4th International Conference on Supercomputing, 1990

1988

Reducing Interprocessor Communication in Parallel Architectures: System Configuration and Task Assignment.

PhD thesis, 1988

Blocking for Parallel Sparse Linear System Solvers.

Timothy A. Davis

Proceedings of the International Conference on Parallel Processing, 1988

1987

Parallel Garbage Collection on a Virtual Memory System.

Janak H. Patel

Proceedings of the International Conference on Parallel Processing, 1987

1986

A Communication Model for Optimizing Hierarchical Multiprocessor Systems.
