Dan Huang

Orcid: 0000-0001-5582-1031

Affiliations:
  • Sun Yat-sen University, Guangzhou, China


According to our database, Dan Huang authored at least 48 papers between 2015 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
HPC-AI Coupling Methodology for Scientific Applications.
CoRR, July, 2025

Critique of "Productivity, Portability, Performance Data-Centric Python" by SCC Team From Sun Yat-sen University.
IEEE Trans. Parallel Distributed Syst., May, 2025

GPU acceleration for DNA sequence alignment algorithm and its application.
CCF Trans. High Perform. Comput., April, 2025

2024
Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform.
IEEE Trans. Parallel Distributed Syst., November, 2024

Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method.
Int. J. Parallel Program., June, 2024

SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems.
J. Comput. Sci. Technol., March, 2024

AdaNAS: Adaptively Postprocessing With Self-Supervised Neural Architecture Search for Ensemble Rainfall Forecasts.
IEEE Trans. Geosci. Remote. Sens., 2024

Topo: Towards a fine-grained topological data processing framework on Tianhe-3 supercomputer.
J. Parallel Distributed Comput., 2024

HTDcr: a job execution framework for high-throughput computing on supercomputers.
Sci. China Inf. Sci., 2024

APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes.
Proceedings of the International Conference for High Performance Computing, 2024

Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Understanding the Inference Performance of Spatial Temporal Diffusion Transformer.
Proceedings of the Network and Parallel Computing, 2024

Equivariant Diffusion for Crystal Structure Prediction.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

Communication-Efficient Model Parallelism for Distributed In-Situ Transformer Inference.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023
Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs.
ACM Trans. Archit. Code Optim., December, 2023

Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure.
ACM Trans. Archit. Code Optim., September, 2023

Optimizing massively parallel sparse matrix computing on ARM many-core processor.
Parallel Comput., September, 2023

Full-Stack Optimizing Transformer Inference on ARM Many-Core CPU.
IEEE Trans. Parallel Distributed Syst., July, 2023

A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data.
IEEE Trans. Big Data, June, 2023

AdaNAS: Adaptively Post-processing with Self-supervised Neural Architecture Search for Ensemble Rainfall Forecasts.
CoRR, 2023

MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

Accelerating Inference of 3D-CNN on ARM Many-core CPU via Hierarchical Model Partition.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Enhancing Multi-physics Coupling on ARM Many-Core Cluster.
Proceedings of the Advanced Parallel Processing Technologies, 2023

2022
Optimizing small channel 3D convolution on GPU with tensor core.
Parallel Comput., 2022

Identifying challenges and opportunities of in-memory computing on large HPC systems.
J. Parallel Distributed Comput., 2022

Enhancing Distributed In-Situ CNN Inference in the Internet of Things.
IEEE Internet Things J., 2022

Handling heavy-tailed input of transformer inference on GPUs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Characterizing and Optimizing Transformer Inference on ARM Many-core Processor.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Enhancing Proportional IO Sharing on Containerized Big Data File Systems.
IEEE Trans. Computers, 2021

A Fine-grained Optimization to Winograd Convolution Based on Micro-architectural Features of CPU.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

Optimizing Massively Parallel Winograd Convolution on ARM Processor.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2020
Improving the efficiency of HPC data movement on container-based virtual cluster.
CCF Trans. High Perform. Comput., 2020

A Comprehensive Study of In-Memory Computing on Large HPC Systems.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

2019
Harnessing Data Movement in Virtual Clusters for In-Situ Execution.
IEEE Trans. Parallel Distributed Syst., 2019

Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems?
IEEE Trans. Computers, 2019

Identifying Latent Reduced Models to Precondition Lossy Compression.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

2018
Achieving Load Balance for Parallel Data Access on Distributed File Systems.
IEEE Trans. Computers, 2018

Performance Evaluation and Analysis for MPI-Based Data Movement in Virtual Switch Network.
Proceedings of the 2018 IEEE International Conference on Networking, 2018

2017
Energy-Aware Adaptive Restore Schemes for MLC STT-RAM Cache.
IEEE Trans. Computers, 2017

Deister: A light-weight autonomous block management in data-intensive file systems using deterministic declustering distribution.
J. Parallel Distributed Comput., 2017

SideIO: A Side I/O system framework for hybrid scientific workflow.
J. Parallel Distributed Comput., 2017

DFS-container: achieving containerized block I/O for distributed file systems.
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

2016
AOS: adaptive overwrite scheme for energy-efficient MLC STT-RAM cache.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
Deister: A Light-Weight Autonomous Block Management in Data-Intensive File Systems Using Deterministic Declustering Distribution.
Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom/DataCom/SC2 2015, 2015

Experiences in using os-level virtualization for block I/O.
Proceedings of the 10th Parallel Data Storage Workshop, 2015

Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

UNIO: A Unified I/O System Framework for Hybrid Scientific Workflow.
Proceedings of the Cloud Computing and Big Data, 2015

