Zhiquan Lai

ORCID: 0000-0002-3458-4732

According to our database, Zhiquan Lai authored at least 41 papers between 2012 and 2024.

Bibliography

2024
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training.
IEEE Trans. Parallel Distributed Syst., April 2024

2023
Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures.
IEEE Trans. Computers, December 2023

A Survey on Auto-Parallelism of Large-Scale Deep Learning Training.
IEEE Trans. Parallel Distributed Syst., August 2023

Merak: An Efficient Distributed DNN Training Framework With Automated 3D Parallelism for Giant Foundation Models.
IEEE Trans. Parallel Distributed Syst., May 2023

Hierarchical Adaptive Pooling by Capturing High-Order Dependency for Graph Representation Learning.
IEEE Trans. Knowl. Data Eng., April 2023

Compressed Collective Sparse-Sketch for Distributed Data-Parallel Training of Deep Learning Models.
IEEE J. Sel. Areas Commun., April 2023

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training.
CoRR, 2023

CD-Sched: An Automated Scheduling Framework for Accelerating Neural Network Training on Shared Memory CPU-DSP Platforms.
Proceedings of the 2023 International Conference on Power, 2023

Efficient Large Models Fine-tuning on Commodity Servers via Memory-balanced Pipeline Parallelism.
Proceedings of the IEEE International Conference on High Performance Computing & Communications, 2023

Rethinking the Distributed DNN Training Cluster Design from the Cost-effectiveness View.
Proceedings of the IEEE International Conference on High Performance Computing & Communications, 2023

Communication Analysis for Multidimensional Parallel Training of Large-scale DNN Models.
Proceedings of the IEEE International Conference on High Performance Computing & Communications, 2023

Auto-Divide GNN: Accelerating GNN Training with Subgraph Division.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation.
CoRR, 2022

BRGraph: An efficient graph neural network training system by reusing batch data on GPU.
Concurr. Comput. Pract. Exp., 2022

Accelerating Sample-based GNN Training by Feature Caching on GPUs.
Proceedings of the 7th IEEE International Conference on Smart Cloud (SmartCloud), 2022

SCGraph: Accelerating Sample-based GNN Training by Staged Caching of Features on GPUs.
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2022

EmbRace: Accelerating Sparse Communication for Distributed Training of Deep Neural Networks.
Proceedings of the 51st International Conference on Parallel Processing, 2022

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Coordinative Scheduling of Computation and Communication in Data-Parallel Systems.
IEEE Trans. Computers, 2021

EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks.
CoRR, 2021

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning.
CoRR, 2021

PCGraph: Accelerating GNN Inference on Large Graphs via Partition Caching.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

Accelerate Graph Neural Network Training by Reusing Batch Data on GPUs.
Proceedings of the IEEE International Performance, Computing, and Communications Conference (IPCCC), 2021

Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

HMA: An Efficient Training Method for NLP Models.
Proceedings of the ICIAI 2021: 5th International Conference on Innovation in Artificial Intelligence, 2021

Prediction of the Cyanobacteria Coverage in Time-series Images based on Convolutional Neural Network.
Proceedings of the ICCCV 2021: 4th International Conference on Control and Computer Vision, Macau, SAR, China, August 13, 2021

2PGraph: Accelerating GNN Training over Large Graphs on GPU Clusters.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
ADMMiRNN: Training RNN with Stable Convergence via an Efficient ADMM Approach.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2020

Poster Abstract: Model Average-based Distributed Training for Sparse Deep Neural Networks.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020

2019
HPDL: Towards a General Framework for High-performance Distributed Deep Learning.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

2017
PoweRock: Power Modeling and Flexible Dynamic Power Management for Many-Core Architectures.
IEEE Syst. J., 2017

A Two-Tiered Defence of Techniques to Prevent SQL Injection Attacks.
Proceedings of the Innovative Mobile and Internet Services in Ubiquitous Computing, 2017

2015
Latency-aware DVFS for efficient power state transitions on many-core architectures.
J. Supercomput., 2015

2014
A Power Modelling Approach for Many-Core Architectures.
Proceedings of the 2014 10th International Conference on Semantics, Knowledge and Grids (SKG), 2014

Efficient DVFS to Prevent Hard Faults for Many-Core Architectures.
Proceedings of the Information and Communication Technology, 2014

Rhymes: A shared virtual memory system for non-coherent tiled many-core architectures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2012
Mining of Attack Models in IDS Alerts from Network Backbone by a Two-stage Clustering Method.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
