Stanimire Tomov

Orcid: 0000-0002-5937-7959

Affiliations:
  • University of Tennessee, Knoxville, TN, USA


According to our database, Stanimire Tomov authored at least 176 papers between 2004 and 2023.

Bibliography

2023
GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

PAQR: Pivoting Avoiding QR factorization.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
Extending MAGMA Portability with OneAPI.
Proceedings of the 9th Workshop on Accelerator Programming Using Directives, 2022

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Performance Analysis of Parallel FFT on Large Multi-GPU Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Batch QR Factorization on GPUs: Design, Optimization, and Tuning.
Proceedings of the Computational Science - ICCS 2022, 2022

GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines.
ACM Trans. Math. Softw., 2021

GPU algorithms for Efficient Exascale Discretizations.
Parallel Comput., 2021

libCEED: Fast algebra for high-order element-based discretizations.
J. Open Source Softw., 2021

Translational process: Mathematical software perspective.
J. Comput. Sci., 2021

Efficient exascale discretizations: High-order finite element methods.
Int. J. High Perform. Comput. Appl., 2021

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.
Int. J. High Perform. Comput. Appl., 2021

Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems.
IEEE Access, 2021

Scalability Issues in FFT Computation.
Proceedings of the Parallel Computing Technologies, 2021

A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Accelerating Multi-Process Communication for Parallel 3-D FFT.
Proceedings of the Workshop on Exascale MPI, 2021

2020
Load-balancing Sparse Matrix Vector Product Kernels on GPUs.
ACM Trans. Parallel Comput., 2020

Matrix multiplication on batches of small matrices in half and half-complex precisions.
J. Parallel Distributed Comput., 2020

MAGMA templates for scalable linear algebra on emerging architectures.
Int. J. High Perform. Comput. Appl., 2020

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.
CoRR, 2020

Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD.
Concurr. Comput. Pract. Exp., 2020

Integrating Deep Learning in Domain Sciences at Exascale.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs.
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020

Asynchronous SGD for DNN training on Shared-memory Parallel Architectures.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

heFFTe: Highly Efficient FFT for Exascale.
Proceedings of the Computational Science - ICCS 2020, 2020

Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs.
Proceedings of the Computational Science - ICCS 2020, 2020

Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019
Solving Linear Diophantine Systems on Parallel Architectures.
IEEE Trans. Parallel Distributed Syst., 2019

Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Comput., 2019

Evaluation of directive-based performance portable programming models.
Int. J. High Perform. Comput. Netw., 2019

Investigating power capping toward energy-efficient scientific applications.
Concurr. Comput. Pract. Exp., 2019

MagmaDNN: Accelerated Deep Learning Using MAGMA.
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019

openDIEL: A Parallel Workflow Engine and Data Analytics Framework.
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019

Hands-On Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments.
Proceedings of the High Performance Computing, 2019

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing.
Proceedings of the High Performance Computing, 2019

Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019

Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Progressive Optimization of Batched LU Factorization on GPUs.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.
IEEE Trans. Parallel Distributed Syst., 2018

Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.
IEEE Trans. Parallel Distributed Syst., 2018

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018

Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs.
Parallel Comput., 2018

Accelerating the SVD bi-diagonalization of a batch of small matrices using GPUs.
J. Comput. Sci., 2018

Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.
J. Comput. Sci., 2018

On Deep Neural Networks for Detecting Heart Disease.
CoRR, 2018

Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers.
Proceedings of the International Conference for High Performance Computing, 2018

Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018

Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware.
Proceedings of the 25th IEEE International Conference on High Performance Computing Workshops, 2018

2017
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA.
J. Comput. Sci., 2017

On the performance and energy efficiency of sparse linear algebra on GPUs.
Int. J. High Perform. Comput. Appl., 2017

Structure-Aware Linear Solver for Realtime Convex Optimization for Embedded Systems.
IEEE Embed. Syst. Lett., 2017

With Extreme Computing, the Rules Have Changed.
Comput. Sci. Eng., 2017

Non-GPU-resident symmetric indefinite factorization.
Concurr. Comput. Pract. Exp., 2017

Solving dense symmetric indefinite systems using GPUs.
Concurr. Comput. Pract. Exp., 2017

A Framework for Out of Memory SVD Algorithms.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Investigating half precision arithmetic to accelerate dense linear system solvers.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

High-performance Cholesky factorization for GPU-only execution.
Proceedings of the General Purpose GPUs, 2017

Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.
Proceedings of the International Conference on Supercomputing, 2017

Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices.
Proceedings of the International Conference on Computational Science, 2017

Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.
Proceedings of the International Conference on Computational Science, 2017

Out of memory SVD solver for big data.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Sampling algorithms to update truncated SVD.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Bringing High Performance Computing to Big Data Algorithms.
Proceedings of the Handbook of Big Data Technologies, 2017

2016
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU.
ACM Trans. Math. Softw., 2016

Linear algebra software for large-scale accelerated multicore computing.
Acta Numer., 2016

Performance, Design, and Autotuning of Batched GEMM for GPUs.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Towards Achieving Performance Portability Using Directives for Accelerators.
Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016

Heterogeneous Streaming.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016

High-Performance Tensor Contractions for GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016

LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

High-Performance Matrix-Matrix Multiplications of Very Small Matrices.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems.
Supercomput. Front. Innov., 2015

Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations.
Sci. Program., 2015

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Sci. Program., 2015

Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs.
SIAM J. Sci. Comput., 2015

Batched matrix computations on hardware accelerators based on GPUs.
Int. J. High Perform. Comput. Appl., 2015

Acceleration of GPU-based Krylov solvers via data transfer reduction.
Int. J. High Perform. Comput. Appl., 2015

On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors.
Proceedings of the High Performance Computing - 30th International Conference, 2015

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Performance analysis and design of a Hessenberg reduction using stabilized blocked elementary transformations for new architectures.
Proceedings of the Symposium on High Performance Computing, 2015

Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product.
Proceedings of the Symposium on High Performance Computing, 2015

Mixed-precision block Gram-Schmidt orthogonalization.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Efficient implementation of quantum materials simulations on distributed CPU-GPU systems.
Proceedings of the International Conference for High Performance Computing, 2015

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015

Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Optimization for performance and energy for batched matrix computations on GPUs.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Towards batched linear solvers on accelerated hardware platforms.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Energy efficiency and performance frontiers for sparse computations on GPU supercomputers.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform.
Proceedings of the International Conference on Computational Science, 2015

MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

2014
Model-Driven One-Sided Factorizations on Multicore Accelerated Systems.
Supercomput. Front. Innov., 2014

A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks.
Int. J. High Perform. Comput. Appl., 2014

Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems.
Concurr. Comput. Pract. Exp., 2014

Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Deflation strategies to improve the convergence of communication-avoiding GMRES.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster.
Proceedings of the International Conference for High Performance Computing, 2014

Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

clMAGMA: high performance dense linear algebra with OpenCL.
Proceedings of the International Workshop on OpenCL, 2014

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Hybrid Multi-elimination ILU Preconditioners on GPUs.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Optimizing Krylov Subspace Solvers on Graphics Processing Units.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

A Fast Batched Cholesky Factorization on a GPU.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Access-averse framework for computing low-rank matrix approximations.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

Accelerating Numerical Dense Linear Algebra Calculations with GPUs.
Proceedings of the Numerical Computations with GPUs, 2014

2013
Accelerating Linear System Solutions Using Randomization Techniques.
ACM Trans. Math. Softw., 2013

A block-asynchronous relaxation method for graphics processing units.
J. Parallel Distributed Comput., 2013

Soft error resilient QR factorization for hybrid system with GPGPU.
J. Comput. Sci., 2013

Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.
Proceedings of the International Conference on Supercomputing, 2013

2012
Autotuning GEMM Kernels for the Fermi GPU.
IEEE Trans. Parallel Distributed Syst., 2012

Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems.
SIAM J. Sci. Comput., 2012

One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators.
Proceedings of the International Conference on Computational Science, 2012

A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines.
Proceedings of the International Conference on Computational Science, 2012

Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems.
Proceedings of the International Conference on Computational Science, 2012

From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming.
Parallel Comput., 2012

A hybrid Hermitian general eigenvalue solver.
CoRR, 2012

Poster: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012



Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems.
Proceedings of the International Conference on Supercomputing, 2012

Scalable Dense Linear Algebra on Heterogeneous Hardware.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Dense Linear Algebra on Accelerated Multicore Hardware.
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications., 2012

2011
Fully Empirical Autotuned QR Factorization For Multicore Architectures.
CoRR, 2011

Optimizing symmetric dense matrix-vector multiplication on GPUs.
Proceedings of the Conference on High Performance Computing Networking, 2011

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs.
Proceedings of the International Conference on Parallel Processing, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

LU factorization for accelerator-based systems.
Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011

2010
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing.
Parallel Comput., 2010

Towards dense linear algebra for hybrid GPU accelerated manycore systems.
Parallel Comput., 2010

An Improved MAGMA GEMM For Fermi Graphics Processing Units.
Int. J. High Perform. Comput. Appl., 2010

Accelerating GPU Kernels for Dense Linear Algebra.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Dense linear algebra solvers for multicore with GPU accelerators.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Mixed-Tool Performance Analysis on Hybrid Multicore Architectures.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Dense Linear Algebra for Hybrid GPU-Based Systems.
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

BLAS for GPUs.
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

2009
Accelerating scientific computations with mixed precision algorithms.
Comput. Phys. Commun., 2009

Bulk based preconditioning for quantum dot computations.
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), 2009

A Note on Auto-tuning GEMM for GPUs.
Proceedings of the Computational Science, 2009

2008
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy.
ACM Trans. Math. Softw., 2008

State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems.
J. Comput. Phys., 2008

2007
Prospectus for a Dense Linear Algebra Software Library.
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications., 2007

The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot.
J. Comput. Phys., 2007

2006
Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures.
Int. J. Comput. Sci. Eng., 2006

Prospectus for the Next LAPACK and ScaLAPACK Libraries.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

The Impact of Multicore on Math Software.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006

2005
Explicit and Averaging A Posteriori Error Estimates for Adaptive Finite Volume Methods.
SIAM J. Numer. Anal., 2005

Benchmarking and implementation of probability-based simulations on programmable graphics cards.
Comput. Graph., 2005

Comparison of Nonlinear Conjugate-Gradient Methods for Computing the Electronic Properties of Nanostructure Architectures.
Proceedings of the Computational Science, 2005

2004
Interactive visualization of higher dimensional data in a multiview environment.
CoRR, 2004

Application of interactive parallel visualization for commodity-based clusters using visualization APIs.
Comput. Graph., 2004

Toward a Systems Biology Software Toolkit.
Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems (CBMS 2004), 2004

