William Jalby

ORCID: 0000-0002-4975-5469

According to our database, William Jalby authored at least 96 papers between 1985 and 2019.

On csauthors.net:


Combining static and dynamic analysis to guide PGO for HPC applications: a case study on real-world applications.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Scalable Fast Multipole Method for Electromagnetic Simulations.
Proceedings of the Computational Science - ICCS 2019, 2019

The Long and Winding Road Toward Efficient High-Performance Computing.
Proc. IEEE, 2018

Power-Constrained Optimal Quality for High Performance Servers.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Piecewise holistic autotuning of parallel programs with CERE.
Concurr. Comput. Pract. Exp., 2017

An Incremental Methodology for Energy Measurement and Modeling.
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017

MALT: a Malloc tracker.
Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems, 2017

Piecewise Holistic Autotuning of Compiler and Runtime Parameters.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

CERE: LLVM-Based Codelet Extractor and REplayer for Piecewise Benchmarking and Optimization.
ACM Trans. Archit. Code Optim., 2015

Minimizing Energy Consumption of MPI Programs in Realistic Environment.
CoRR, 2015

PCERE: Fine-Grained Parallel Benchmark Decomposition for Scalability Prediction.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Energy-centric dynamic fan control.
Comput. Sci. Res. Dev., 2014

Evaluation of CPU frequency transition latency.
Comput. Sci. Res. Dev., 2014

Computer using too much power? Give it a REST (Runtime Energy Saving Technology).
Comput. Sci. Res. Dev., 2014

Improving MPI communication overlap with collaborative polling.
Computing, 2014

Optimizing Collective Operations in Hybrid Applications.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Using static analysis data for performance modeling and prediction.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Task-Based Parallelization of Unstructured Meshes Assembly Using D&C Strategy.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

CQA: A code quality analyzer tool at binary level.
Proceedings of the 21st International Conference on High Performance Computing, 2014

FoREST-mn: Runtime DVFS beyond communication slack.
Proceedings of the International Green Computing Conference, 2014

Statistical Validation Methodology of CPU Power Probes.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Fine-grained Benchmark Subsetting for System Selection.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond.
J. Comput. Chem., 2013

Adaptive sampling for performance characterization of application kernels.
Concurr. Comput. Pract. Exp., 2013

Simsys: a performance simulation framework.
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2013

PAMDA: Performance Assessment Using MAQAO Toolset and Differential Analysis.
Proceedings of the Tools for High Performance Computing 2013, 2013

Introducing kernel-level page reuse for high performance computing.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Divide and Conquer Parallelization of Finite Element Method Assembly.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Binary Instrumentation for Scalable Performance Measurement of OpenMP Applications.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Evaluating architecture and compiler design through static loop analysis.
Proceedings of the International Conference on High Performance Computing & Simulation, 2013

Quantifying performance bottleneck cost through differential analysis.
Proceedings of the International Conference on Supercomputing, 2013

Event Streaming for Online Performance Measurements Reduction.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Is Source-Code Isolation Viable for Performance Characterization?
Proceedings of the 42nd International Conference on Parallel Processing, 2013

MIL: A language to build program analysis tools through static binary instrumentation.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Reactive DVFS Control for Multicore Processors.
Proceedings of the 2013 IEEE International Conference on Green Computing and Communications (GreenCom) and IEEE Internet of Things (iThings) and IEEE Cyber, 2013

Automatic estimation of DVFS potential.
Proceedings of the International Green Computing Conference, 2013

Topic 11: Multicore and Manycore Programming - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

QMC=Chem: A Quantum Monte Carlo Program for Large-Scale Simulations in Chemistry at the Petascale Level and beyond.
Proceedings of the High Performance Computing for Computational Science, 2012

Compiler Optimizations: Machine Learning versus O3.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

Adaptive OpenMP for Large NUMA Nodes.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

MicroTools: Automating Program Generation and Performance Measurement.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

ASK: Adaptive Sampling Kit for Performance Characterization.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Measuring Computer Performance.
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications, 2012

Hardware Performance Monitoring for the Rest of Us: A Position and Survey.
Proceedings of the Network and Parallel Computing - 8th IFIP International Conference, 2011

Software prefetch on core micro-architecture applied to irregular codes.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Tackling Cache-Line Stealing Effects Using Run-Time Adaptation.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Performance Tuning of x86 OpenMP Codes with MAQAO.
Proceedings of the Tools for High Performance Computing 2009, 2009

An Approach to Application Performance Tuning.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

How to Accelerate an Application: a Practical Case Study in Combustion Modelling.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

A Balanced Approach to Application Performance Tuning.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Hybrid intelligent system for performance analysis and optimization.
Proceedings of the 2009 International Conference on High Performance Computing & Simulation, 2009

KBS-MAQAO: A Knowledge Based System for MAQAO Tool.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

Fine Tuning Matrix Multiplications on Multicore.
Proceedings of the High Performance Computing, 2008

The Design and Architecture of MAQAOAdvisor: A Live Tuning Guide.
Proceedings of the High Performance Computing, 2008

Deep Jam: Conversion of Coarse-Grain Parallelism to Fine-Grain and Vector Parallelism.
J. Instr. Level Parallelism, 2007

Loop Optimization using Hierarchical Compilation and Kernel Decomposition.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

An efficient memory operations optimization technique for vector loops on Itanium 2 processors.
Concurr. Comput. Pract. Exp., 2006

Iterative Compilation with Kernel Exploration.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Topic 4: Compilers for High Performance.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Collisions of SHA-0 and Reduced SHA-1.
Proceedings of the Advances in Cryptology, 2005

Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing.
Int. J. High Perform. Comput. Appl., 2004

Branch Strategies to Optimize Decision Trees for Wide-Issue Architectures.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Improving Load/Store Queues Usage in Scientific Computing.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Hardware Prediction for Data Coherency of Scientific Codes on DSM.
Proceedings of Supercomputing 2000, 2000

Experimental Analysis of Coherency Behavior of Shared Memory Scientific Applications.
Proceedings of MASCOTS 2000, the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 29 August, 2000

Coherency Behavior on DSM: A Case Study (Research Note).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

OCEANS - Optimising Compilers for Embedded Applications.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999



Impact of Memory Contention on Dynamic Scheduling on NUMA Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1996

Influence of Cross-Interferences on Blocked Loops: A Case Study with Matrix-Vector Multiply.
ACM Trans. Program. Lang. Syst., 1995

A strategy for array management in local memory.
Math. Program., 1994

Cache Interference Phenomena.
Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1994

Impact of cache interferences on usual numerical dense loop nests.
Proc. IEEE, 1993

To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts.
Proceedings of Supercomputing '93, 1993

Evaluating the Impact of Cache Interferences on Numerical Codes.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

Characterizing the Behavior of Sparse Algorithms on Caches.
Proceedings of Supercomputing '92, 1992

Stability Analysis and Improvement of the Block Gram-Schmidt Algorithm.
SIAM J. Sci. Comput., 1991

Performance Prediction for Parallel Numerical Algorithms.
Int. J. High Speed Comput., 1991

Behavioral characterization of decoupled access/execute architecture.
Proceedings of the 5th international conference on Supercomputing, 1991

Preliminary Performance Analysis of the Cedar Multiprocessor Memory System.
Proceedings of the International Conference on Parallel Processing, 1991

A Quantitative Algorithm for Data Locality Optimization.
Proceedings of the Code Generation, 1991

Experimentally Characterizing the Behavior of Multiprocessor Memory Systems: A Case Study.
IEEE Trans. Software Eng., 1990

Compiler Techniques for Optimizing Memory and Register Usage on the Cray 2.
Int. J. High Speed Comput., 1990

Performance evaluation and prediction for parallel algorithms on the BBN GP1000.
Proceedings of the 4th international conference on Supercomputing, 1990

Behavioral Characterization of Multiprocessor Memory Systems: A Case Study.
Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 1989

Performance prediction of loop constructs on multiprocessor hierarchical-memory systems.
Proceedings of the 3rd international conference on Supercomputing, 1989

Strategies for Cache and Local Memory Management by Global Program Transformation.
J. Parallel Distributed Comput., 1988

Squeezing more CPU performance out of a Cray-2 by Vector block scheduling.
Proceedings of Supercomputing '88, Orlando, FL, USA, November 12-17, 1988

On the problem of optimizing data transfers for complex memory systems.
Proceedings of the 2nd international conference on Supercomputing, 1988

Optimizing Matrix Operations on a Parallel Multiprocessor with a Memory Hierarchical System.
Proceedings of the International Conference on Parallel Processing, 1986

Parallel Algorithms on the CEDAR System.
Proceedings of the CONPAR 86: Conference on Algorithms and Hardware for Parallel Processing, 1986

XOR-Schemes: A Flexible Data Organization in Parallel Memories.
Proceedings of the International Conference on Parallel Processing, 1985
