We stand with Ukraine

Shengen Yan

According to our database, Shengen Yan authored at least 40 papers between 2012 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Proteus: Simulating the Performance of Distributed DNN Training.

Co-authors include Xingcheng Zhang.

IEEE Trans. Parallel Distributed Syst., October, 2024

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs.

Co-authors include Matthew B. Blaschko.

CoRR, 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression.

CoRR, 2024

DiTFastAttn: Attention Compression for Diffusion Transformer Models.

CoRR, 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation.

Co-authors include Widyadewi Soedarmadji.

CoRR, 2024

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.

CoRR, 2024

HetHub: A Heterogeneous distributed hybrid training system for large-scale models.

CoRR, 2024

A Survey on Efficient Inference for Large Language Models.

Co-authors include Xiao-Ping Zhang.

CoRR, 2024

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better.

Co-authors include Matthew B. Blaschko and Sergey Yekhanin.

CoRR, 2024

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K.

CoRR, 2024

Evaluating Quantized Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.

IEEE Trans. Parallel Distributed Syst., 2022

Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters.

IEEE Trans. Parallel Distributed Syst., 2022

DIESEL+: Accelerating Distributed Deep Learning Tasks on Image Datasets.

IEEE Trans. Parallel Distributed Syst., 2022

GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training.

IEEE Trans. Big Data, 2022

A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs.

CoRR, 2022

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

LongTail-Bench: A Benchmark Suite for Domain-Specific Operators in Deep Learning.

Co-authors include Xingcheng Zhang.

Proceedings of the IEEE International Symposium on Workload Characterization, 2022

EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers.

Co-authors include Xingcheng Zhang.

Proceedings of the 51st International Conference on Parallel Processing, 2022

2021

Characterization and prediction of deep learning workloads in large-scale GPU datacenters.

Proceedings of the International Conference for High Performance Computing, 2021

2020

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels.

IEEE Trans. Computers, 2020

DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding.

Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020

Elan: Towards Generic and Efficient Elastic Training for Deep Learning.

Co-authors include Xingcheng Zhang.

Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

2019

面向GPU计算平台的归约算法的性能优化研究 (Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform).

计算机科学 (Computer Science), 2019

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes.

CoRR, 2019

A coordinated tiling and batching framework for efficient GEMM on GPUs.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

2017

Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach.

Co-authors include Ta Nguyen Binh Duong.

Proceedings of the 2017 IEEE International Conference on Smart Computing, 2017

Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs.

Proceedings of the 54th Annual Design Automation Conference, 2017

2016

A Cross-Platform SpMV Framework on Many-Core Architectures.

ACM Trans. Archit. Code Optim., 2016

Timed Dataflow: Reducing Communication Overhead for Distributed Machine Learning Systems.

Co-authors include Ta Nguyen Binh Duong.

Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2014

yaSpMV: yet another SpMV framework on GPUs.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs.

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A fast integral image generation algorithm on GPUs.

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013

StreamScan: fast scan algorithms for GPUs without global barrier synchronization.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

CLSIFT: An Optimization Study of the Scale Invariance Feature Transform on GPUs.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012

An Insightful Program Performance Tuning Chain for GPU Computing.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

GPURoofline: A Model for Guiding Performance Optimizations on GPUs.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
