Shengen Yan
Orcid: 0009-0005-3858-7972
According to our database, Shengen Yan authored at least 42 papers between 2012 and 2024.
Bibliography
2024
IEEE Trans. Parallel Distributed Syst., October, 2024
CoRR, 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs.
CoRR, 2024
CoRR, 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation.
CoRR, 2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.
CoRR, 2024
CoRR, 2024
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better.
CoRR, 2024
CoRR, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
2023
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
2022
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Parallel Distributed Syst., 2022
GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training.
IEEE Trans. Big Data, 2022
A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs.
CoRR, 2022
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the IEEE International Symposium on Workload Characterization, 2022
Proceedings of the 51st International Conference on Parallel Processing, 2022
2021
Characterization and prediction of deep learning workloads in large-scale GPU datacenters.
Proceedings of the International Conference for High Performance Computing, 2021
2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
IEEE Trans. Computers, 2020
DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020
Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020
2019
面向GPU计算平台的归约算法的性能优化研究 (Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform).
计算机科学 (Computer Science), 2019
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes.
CoRR, 2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
2017
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach.
Proceedings of the 2017 IEEE International Conference on Smart Computing, 2017
Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017
2016
ACM Trans. Archit. Code Optim., 2016
Timed Dataflow: Reducing Communication Overhead for Distributed Machine Learning Systems.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
2014
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
2013
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
2012
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012