Shenggan Cheng

Orcid: 0000-0002-7966-2941

According to our database¹, Shenggan Cheng authored at least 27 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction.

CoRR, May, 2026

DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster.

CoRR, January, 2026

HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration.

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

2025

Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving.

CoRR, September, 2025

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation.

CoRR, May, 2025

StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SeedLoRA: A Fusion Approach to Efficient LLM Fine-Tuning.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem.

CoRR, 2024

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers.

CoRR, 2024

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.

CoRR, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference.

CoRR, 2024

Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Deep Learning Inference.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

ATP: Adaptive Tensor Parallelism for Foundation Models.

CoRR, 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023

2022

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours.

CoRR, 2022

2021

tcFFT: Accelerating Half-Precision FFT through Tensor Cores.

CoRR, 2021

tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores.

Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020

HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion.

IEEE Trans. Image Process., 2020

FTL: A Universal Framework for Training Low-Bit DNNs via Feature Transfer.

Proceedings of the Computer Vision - ECCV 2020, 2020

CUBE - Towards an Optimal Scaling of Cosmological N-body Simulations.

Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 2020