Tan Nguyen

Orcid: 0000-0003-3748-403X

Affiliations:

Lawrence Berkeley National Laboratory, Berkeley, CA, USA
University of California San Diego, La Jolla, CA, USA (PhD 2014)

According to our database, Tan Nguyen authored at least 23 papers between 2012 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

CAC: An asynchronous non-blocking consistency model with bounded staleness for distributed machine learning.

[DOI]

,

,

Future Gener. Comput. Syst., 2026

A Hierarchical Methodology for Hardware Design Comparison in HPC Workloads.

[DOI]

Doru-Thom Popovici

,

,

Angelos Ioannou

,

,

Dania Susanne Mosuli

,

,

,

,

Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2026

2025

Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads.

[DOI]

Doru-Thom Popovici

,

,

Angelos Ioannou

,

,

Dania Susanne Mosuli

,

,

,

,

CoRR, September, 2025

Uniconn: A Uniform High-Level Communication Library for Portable Multi-GPU Programming.

[DOI]

,

Sinan Ekmekçibasi

,

Khaled Z. Ibrahim

,

,

Proceedings of the IEEE International Conference on Cluster Computing, 2025

2024

Devastator: A Scalable Parallel Discrete Event Simulation Framework for Modern C++.

[DOI]

,

,

,

,

Mahesh Natarajan

,

Maximilian H. Bremer

,

Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2024

2023

Benefits of Optimistic Parallel Discrete Event Simulation for Network-on-Chip Simulation.

[DOI]

Maximilian H. Bremer

,

Nirmalendu Patra

,

,

Dilip Vasudevan

,

Proceedings of the 27th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, 2023

2022

FPGA-based HPC accelerators: An evaluation on performance and energy efficiency.

[DOI]

,

,

,

Douglas Doerfler

,

Nicholas J. Wright

,

Samuel Williams

Concurr. Comput. Pract. Exp., 2022

2021

Architectural Requirements for Deep Learning Workloads in HPC Environments.

[DOI]

Khaled Z. Ibrahim

,

,

,

,

,

,

,

Nicholas J. Wright

,

Samuel Williams

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Facilitating CoDesign with Automatic Code Similarity Learning.

[DOI]

,

Erich Strohmaier

,

Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs.

[DOI]

Douglas Doerfler

,

Farzad Fatollahi-Fard

,

,

,

Samuel Williams

,

Nicholas J. Wright

,

Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich Germany, April, 2021, 2021

2020

The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing.

[DOI]

,

Samuel Williams

,

,

,

Douglas Doerfler

,

Nicholas J. Wright

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

2019

AMReX: a framework for block-structured adaptive mesh refinement.

[DOI]

,

,

Vincent E. Beckner

,

,

Johannes P. Blaschke

,

,

,

,

,

Daniel T. Graves

,

Maximilian Katz

,

,

,

,

,

Samuel Williams

,

Michael Zingale

J. Open Source Softw., 2019

Asynchronous AMR on Multi-GPUs.

[DOI]

Muhammed Nufail Farooqi

,

,

,

,

,

Proceedings of the High Performance Computing, 2019

2018

Phase asynchronous AMR execution for productive and performant astrophysical flows.

[DOI]

Muhammed Nufail Farooqi

,

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2018

2017

Automatic translation of MPI source into a latency-tolerant, data-driven form.

[DOI]

,

,

Eric J. Bylaska

,

Daniel J. Quinlan

,

J. Parallel Distributed Comput., 2017

Nonintrusive AMR Asynchrony for Communication Optimization.

[DOI]

Muhammed Nufail Farooqi

,

,

,

,

,

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016

BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework.

[DOI]

,

,

,

,

,

SIAM J. Sci. Comput., 2016

BoxLib with Tiling: An AMR Software Framework.

[DOI]

,

,

,

,

,

CoRR, 2016

TiDA: High-Level Programming Abstractions for Data Locality Management.

[DOI]

,

,

,

Muhammed Nufail Farooqi

,

,

George Michelogiannakis

,

,

Proceedings of the High Performance Computing - 31st International Conference, 2016

Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement.

[DOI]

,

,

,

,

Muhammed Nufail Farooqi

,

Proceedings of the International Conference for High Performance Computing, 2016

2015

LU Factorization: Towards Hiding Communication Overheads with a Lookahead-Free Algorithm.

[DOI]

,

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2013

A software-based dynamic-warp scheduling approach for load-balancing the Viola-Jones face detection algorithm on GPUs.

[DOI]

,

Daniel Hefenbrock

,

,

,

J. Parallel Distributed Comput., 2013

2012

Bamboo: translating MPI applications to a latency-tolerant, data-driven form.

[DOI]

,

,

Eric J. Bylaska

,

,

Proceedings of the SC Conference on High Performance Computing Networking, 2012
