Julien Langou

Orcid: 0000-0002-7803-1822

Affiliations:
  • University of Colorado Denver


According to our database, Julien Langou authored at least 68 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A new deflation criterion for the QZ algorithm.
Numer. Linear Algebra Appl., January, 2024

2022
Low-synch Gram-Schmidt with delayed reorthogonalization for Krylov solvers.
Parallel Comput., 2022

Numerical analysis of Givens rotation.
CoRR, 2022

I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels.
Proceedings of the SPAA '22: 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 11, 2022

Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Proposed Consistent Exception Handling for the BLAS and LAPACK.
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

2021
Low synchronization Gram-Schmidt and generalized minimal residual algorithms.
Numer. Linear Algebra Appl., 2021

2020
Automated derivation of parametric data movement lower bounds for affine programs.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

A Comparison of Several Fault-Tolerance Methods for the Detection and Correction of Floating-Point Errors in Matrix-Matrix Multiplication.
Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

A Makespan Lower Bound for the Tiled Cholesky Factorization Based on ALAP Schedule.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2018
Low synchronization GMRES algorithms.
CoRR, 2018

2017
Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Low-Rank Matrix Approximations.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016
A backward/forward recovery approach for the preconditioned conjugate gradient method.
J. Comput. Sci., 2016

Bidiagonalization with Parallel Tiled Algorithms.
CoRR, 2016

2015
Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers.
J. Parallel Distributed Comput., 2015

A Makespan Lower Bound for the Scheduling of the Tiled Cholesky Factorization based on ALAP scheduling.
CoRR, 2015

2014
Designing LU-QR Hybrid Solvers for Performance and Stability.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms.
ACM Trans. Math. Softw., 2013

Hierarchical QR factorization algorithms for multi-core clusters.
Parallel Comput., 2013

A Greedy Algorithm for Optimally Pipelining a Reduction.
CoRR, 2013

Topic 10: Parallel Numerical Algorithms - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
Communication-optimal Parallel and Sequential QR and LU Factorizations.
SIAM J. Sci. Comput., 2012

Flexible Variants of Block Restarted GMRES Methods with Application to Geophysics.
SIAM J. Sci. Comput., 2012



Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
Any admissible cycle-convergence behavior is possible for restarted GMRES at its initial cycles.
Numer. Linear Algebra Appl., 2011

QCG-OMPI: MPI applications on grids.
Future Gener. Comput. Syst., 2011

Tiled QR factorization algorithms.
Proceedings of the Conference on High Performance Computing Networking, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

LU factorization for accelerator-based systems.
Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011

2010
Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion.
ACM Trans. Math. Softw., 2010

The Cycle-Convergence of Restarted GMRES for Normal Matrices Is Sublinear.
SIAM J. Sci. Comput., 2010

A Critical Path Approach to Analyzing Parallelism of Algorithmic Variants. Application to Cholesky Inversion.
CoRR, 2010

Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

QR factorization of tall and skinny matrices in a grid computing environment.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

2009
A class of parallel tiled linear algebra algorithms for multicore architectures.
Parallel Comput., 2009

Computing the conditioning of the components of a linear least-squares solution.
Numer. Linear Algebra Appl., 2009

Algorithm-based fault tolerance applied to high performance computing.
J. Parallel Distributed Comput., 2009

The Problem With the Linpack Benchmark 1.0 Matrix Generator.
Int. J. High Perform. Comput. Appl., 2009

Accelerating scientific computations with mixed precision algorithms.
Comput. Phys. Commun., 2009

2008
Algorithmic Based Fault Tolerance Applied to High Performance Computing.
CoRR, 2008

Communication-avoiding parallel and sequential QR factorizations.
CoRR, 2008

Parallel tiled QR factorization for multicore architectures.
Concurr. Comput. Pract. Exp., 2008

2007
Prospectus for a Dense Linear Algebra Software Library.
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications, 2007

Recovery Patterns for Iterative Methods in a Parallel Unstable Environment.
SIAM J. Sci. Comput., 2007

Convergence in Backward Error of Relaxed GMRES.
SIAM J. Sci. Comput., 2007

Performance Optimization and Modeling of Blocked Sparse Kernels.
Int. J. High Perform. Comput. Appl., 2007

Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems.
Int. J. High Perform. Comput. Appl., 2007

A distributed packed storage for large dense parallel in-core calculations.
Concurr. Comput. Pract. Exp., 2007

Advanced MPI Programming.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

2006
A note on the error analysis of classical Gram-Schmidt.
Numerische Mathematik, 2006

Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures.
Int. J. Comput. Sci. Eng., 2006

Self-adapting numerical software (SANS) effort.
IBM J. Res. Dev., 2006

Tools and techniques for performance - Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems).
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Recent Advances in Dense Linear Algebra: Minisymposium Abstract.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Prospectus for the Next LAPACK and ScaLAPACK Libraries.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

The Impact of Multicore on Math Software.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006

Parallel Linear Algebra Software.
Proceedings of the Parallel Processing for Scientific Computing, 2006

2005
Algorithm 842: A set of GMRES routines for real and complex arithmetics on high performance computers.
ACM Trans. Math. Softw., 2005

Rounding error analysis of the classical Gram-Schmidt orthogonalization process.
Numerische Mathematik, 2005

Hash Functions for Datatype Signatures in MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Fault tolerant high performance computing by a coding approach.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Comparison of Nonlinear Conjugate-Gradient Methods for Computing the Electronic Properties of Nanostructure Architectures.
Proceedings of the Computational Science, 2005

2004
A Rank-k Update Procedure for Reorthogonalizing the Orthogonal Factor from Modified Gram-Schmidt.
SIAM J. Matrix Anal. Appl., 2004

2003
A Robust Criterion for the Modified Gram-Schmidt Algorithm with Selective Reorthogonalization.
SIAM J. Sci. Comput., 2003

