Yida Wang

Orcid: 0000-0001-8165-840X

Affiliations:

Amazon Web Services, Inc., East Palo Alto, CA, USA
Intel Corporation, Parallel Computing Lab, Santa Clara, CA, USA
Princeton University, Department of Computer Science, NJ, USA

According to our database, Yida Wang authored at least 48 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

UCCL-Zip: Lossless Compression Supercharged GPU Communication.

CoRR, April, 2026

StreamFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs.

CoRR, January, 2026

SAS: Sparse Attention Synthesizer for Efficient Language Model Inference.

Proceedings of the 21st European Conference on Computer Systems, 2026

SwiftFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs.

Proceedings of the ACM Conference on AI and Agentic Systems, 2026

Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving.

Matthäus Kleindessner

CoRR, October, 2025

Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving.

CoRR, April, 2025

DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

KDD 2025 Workshop on Inference Optimization for Generative AI.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

2024

Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.

ACM Trans. Archit. Code Optim., March, 2024

DISTMM: Accelerating Distributed Multimodal Model Training.

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Inference Optimization of Foundation Models on AI Accelerators.

Matthäus Kleindessner

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Distributed Training of Large Language Models on AWS Trainium.

Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium.

Proceedings of the IEEE International Conference on Big Data, 2024

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Perception and memory retrieval states are reflected in distributed patterns of background functional connectivity.

Y. Peeta Li

Yida Wang

Nicholas B. Turk-Browne

Brice A. Kuhl

J. Benjamin Hutchinson

NeuroImage, August, 2023

RAF: Holistic Compilation for Deep Learning Model Training.

CoRR, 2023

Decoupled Model Schedule for Deep Learning Training.

CoRR, 2023

GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Training Large-scale Foundation Models on Emerging AI Chips.

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud.

Proc. VLDB Endow., 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.

CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

DietCode: Automatic Optimization for Dynamic Tensor Programs.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021

Bring Your Own Codegen to Deep Learning Compiler.

CoRR, 2021

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Lorien: Efficient Deep Learning Workloads Delivery.

Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

UNIT: Unifying Tensorized Instruction Compilation.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Efficient Execution of Quantized Deep Learning Models: A Compiler Approach.

Animesh Jain

Shoubhik Bhattacharya

Masahiro Masuda

Vin Sharma

Yida Wang

CoRR, 2020

Is Network the Bottleneck of Distributed Training?

Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

FeatGraph: a flexible and efficient backend for graph neural network systems.

Proceedings of the International Conference for High Performance Computing, 2020

Ansor: Generating High-Performance Tensor Programs for Deep Learning.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

2019

Optimizing CNN Model Inference on CPUs.

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs.

CoRR, 2018

2017

BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods.

PLoS Comput. Biol., 2017

High-Performance Incremental SVM Learning on Intel® Xeon Phi™ Processors.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016

Large-scale analyses of functional interactions in the human brain.

Yida Wang

PhD thesis, 2016

Real-time full correlation matrix analysis of fMRI data.

Nicholas B. Turk-Browne

Theodore L. Willke

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Enabling factor analysis on thousand-subject neuroimaging datasets.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015

Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors.

Nicholas B. Turk-Browne

Theodore L. Willke

Proceedings of the International Conference for High Performance Computing, 2015
