William Jalby

Proceedings of the International Green Computing Conference, 2013

Topic 11: Multicore and Manycore Programming - (Introduction).

Luiz De Rose

Jan Treibig

Alba Cristina Magalhaes Alves de Melo

David Abramson

Alastair F. Donaldson

Tomàs Margalef

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012

QMC=Chem: A Quantum Monte Carlo Program for Large-Scale Simulations in Chemistry at the Petascale Level and beyond.

Proceedings of the High Performance Computing for Computational Science, 2012

Improving MPI Communication Overlap with Collaborative Polling.

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Compiler Optimizations: Machine Learning versus O3.

Yuriy Kashnikov

Proceedings of the Languages and Compilers for Parallel Computing, 2012

Adaptive OpenMP for Large NUMA Nodes.

Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

MicroTools: Automating Program Generation and Performance Measurement.

Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

ASK: Adaptive Sampling Kit for Performance Characterization.

Pablo de Oliveira Castro

Eric Petit

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Measuring Computer Performance.

David C. Wong

David J. Kuck

Proceedings of the High-Performance Scientific Computing - Algorithms and Applications, 2012

2011

Hardware Performance Monitoring for the Rest of Us: A Position and Survey.

Tipp Moseley

Neil Vachharajani

Proceedings of the Network and Parallel Computing - 8th IFIP International Conference, 2011

Software prefetch on core micro-architecture applied to irregular codes.

Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

2010

Tackling Cache-Line Stealing Effects Using Run-Time Adaptation.

Stéphane Zuckerman

Proceedings of the Languages and Compilers for Parallel Computing, 2010

2009

Performance Tuning of x86 OpenMP Codes with MAQAO.

Proceedings of the Tools for High Performance Computing 2009, 2009

An Approach to Application Performance Tuning.

Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

How to Accelerate an Application: a Practical Case Study in Combustion Modelling.

Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

A Balanced Approach to Application Performance Tuning.

Proceedings of the Languages and Compilers for Parallel Computing, 2009

Hybrid intelligent system for performance analysis and optimization.

Lamia Djoudi

Vasil Khachidze

Proceedings of the 2009 International Conference on High Performance Computing & Simulation, 2009

KBS-MAQAO: A Knowledge Based System for MAQAO Tool.

Lamia Djoudi

Vasil Khachidze

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors.

Samir Ammenouche

Sid Ahmed Ali Touati

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

2008

Fine Tuning Matrix Multiplications on Multicore.

Stéphane Zuckerman

Marc Pérache

Proceedings of the High Performance Computing, 2008

The Design and Architecture of MAQAOAdvisor: A Live Tuning Guide.

Lamia Djoudi

Jose Noudohouenou

Proceedings of the High Performance Computing, 2008

2007

Deep Jam: Conversion of Coarse-Grain Parallelism to Fine-Grain and Vector Parallelism.

J. Instr. Level Parallelism, 2007

Loop Optimization using Hierarchical Compilation and Kernel Decomposition.

Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006

An efficient memory operations optimization technique for vector loops on Itanium 2 processors.

Sid Ahmed Ali Touati

Concurr. Comput. Pract. Exp., 2006

Iterative Compilation with Kernel Exploration.

Proceedings of the Languages and Compilers for Parallel Computing, 2006

Topic 4: Compilers for High Performance.

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005

Collisions of SHA-0 and Reduced SHA-1.

Proceedings of the Advances in Cryptology, 2005

Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications.

Patrick Carribault

Albert Cohen

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004

WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing.

X. Le Pasteur

Int. J. High Perform. Comput. Appl., 2004

Branch Strategies to Optimize Decision Trees for Wide-Issue Architectures.

Patrick Carribault

Albert Cohen

Proceedings of the Languages and Compilers for High Performance Computing, 2004

Improving Load/Store Queues Usage in Scientific Computing.

Sid Ahmed Ali Touati

Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2000

Hardware Prediction for Data Coherency of Scientific Codes on DSM.

Proceedings of Supercomputing 2000, 2000

Experimental Analysis of Coherency Behavior of Shared Memory Scientific Applications.

Proceedings of MASCOTS 2000, 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 29 August, 2000

Coherency Behavior on DSM: A Case Study (Research Note).

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999

OCEANS - Optimising Compilers for Embedded Applications.

Peter M. W. Knijnenburg

Paul van der Mark

Andy Nisbet

Michael F. P. O'Boyle

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998

OCEANS: Optimising Compilers for Embedded Applications.

Peter M. W. Knijnenburg

Michael F. P. O'Boyle

Proceedings of the Euro-Par '98 Parallel Processing, 1998

1997

OCEANS: Optimizing Compilers for Embedded Applications.

Proceedings of the Euro-Par '97 Parallel Processing, 1997

1995

Influence of Cross-Interferences on Blocked Loops: A Case Study with Matrix-Vector Multiply.

ACM Trans. Program. Lang. Syst., 1995

1994

A strategy for array management in local memory.

Math. Program., 1994

Cache Interference Phenomena.

Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1994

1993

Impact of cache interferences on usual numerical dense loop nests.

Olivier Temam

Proc. IEEE, 1993

To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts.

Elana D. Granston

Proceedings of Supercomputing '93, 1993

The Cedar System and an Initial Performance Study.

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

Evaluating the Impact of Cache Interferences on Numerical Codes.

Proceedings of the 1993 International Conference on Parallel Processing, 1993

Impact of Memory Contention on Dynamic Scheduling on NUMA Multiprocessors.

Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992

Characterizing the Behavior of Sparse Algorithms on Caches.

Proceedings of Supercomputing '92, 1992

1991

Stability Analysis and Improvement of the Block Gram-Schmidt Algorithm.

Bernard Philippe

SIAM J. Sci. Comput., 1991

Performance Prediction for Parallel Numerical Algorithms.

Int. J. High Speed Comput., 1991

Behavioral characterization of decoupled access/execute architecture.

Daniel Windheiser

Proceedings of the 5th international conference on Supercomputing, 1991

Preliminary Performance Analysis of the Cedar Multiprocessor Memory System.

Kyle A. Gallivan

Stephen W. Turner

Alexander V. Veidenbaum

Harry A. G. Wijshoff

Proceedings of the International Conference on Parallel Processing, 1991

A Quantitative Algorithm for Data Locality Optimization.

Proceedings of the Code Generation, 1991

1990

Experimentally Characterizing the Behavior of Multiprocessor Memory Systems. A Case Study.

IEEE Trans. Software Eng., 1990

Compiler Techniques for Optimizing Memory and Register Usage on the Cray 2.

Christine Eisenbeis

Alain Lichnewsky

Int. J. High Speed Comput., 1990

Performance evaluation and prediction for parallel algorithms on the BBN GP1000.

Proceedings of the 4th international conference on Supercomputing, 1990

1989

Behavioral Characterization of Multiprocessor Memory Systems: A Case Study.

Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 1989

Performance prediction of loop constructs on multiprocessor hierarchical-memory systems.

Proceedings of the 3rd international conference on Supercomputing, 1989

1988

Squeezing more CPU performance out of a Cray-2 by Vector block scheduling.

Christine Eisenbeis

Alain Lichnewsky

Proceedings of Supercomputing '88, Orlando, FL, USA, November 12-17, 1988, 1988

On the problem of optimizing data transfers for complex memory systems.

Kyle A. Gallivan

Dennis Gannon

Proceedings of the 2nd international conference on Supercomputing, 1988

1987

Strategies for Cache and Local Memory Management by Global Program Transformation.

Dennis Gannon

Kyle A. Gallivan

Proceedings of the Supercomputing, 1987

1986

Optimizing Matrix Operations on a Parallel Multiprocessor with a Memory Hierarchical System.

Ulrike Meier

Proceedings of the International Conference on Parallel Processing, 1986

Parallel Algorithms on the CEDAR System.

Proceedings of the CONPAR 86: Conference on Algorithms and Hardware for Parallel Processing, 1986

1985

XOR-Schemes: A Flexible Data Organization in Parallel Memories.

Jean Marc Frailong