Yang You

ORCID: 0000-0003-2816-4384

Affiliations:
  • National University of Singapore
  • UC Berkeley, USA (PhD 2020)


According to our database, Yang You authored at least 110 papers between 2013 and 2024.

Bibliography

2024
Self-filling evidential clustering for partial multi-view data.
Expert Syst. Appl., March, 2024

Sparse Reconstructive Evidential Clustering for Multi-View Data.
IEEE CAA J. Autom. Sinica, February, 2024

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers.
CoRR, 2024

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.
CoRR, 2024

Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization.
CoRR, 2024

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning.
CoRR, 2024

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents.
CoRR, 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models.
CoRR, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference.
CoRR, 2024

FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
Adaptive evidential K-NN classification: Integrating neighborhood search and feature weighting.
Inf. Sci., November, 2023

A Sparse Reconstructive Evidential K-Nearest Neighbor Classifier for High-Dimensional Data.
IEEE Trans. Knowl. Data Eng., June, 2023

Multitask Learning for Visual Question Answering.
IEEE Trans. Neural Networks Learn. Syst., March, 2023

Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management.
IEEE Trans. Parallel Distributed Syst., 2023

Efficient Dataset Distillation via Minimax Diffusion.
CoRR, 2023

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching.
CoRR, 2023

LoBaSS: Gauging Learnability in Supervised Fine-tuning Data.
CoRR, 2023

Let's reward step by step: Step-Level reward model as the Navigators for Reasoning.
CoRR, 2023

Can pre-trained models assist in dataset distillation?
CoRR, 2023

Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification.
CoRR, 2023

Dataset Quantization.
CoRR, 2023

Learning Referring Video Object Segmentation from Weak Annotation.
CoRR, 2023

Summarizing Stream Data for Memory-Restricted Online Continual Learning.
CoRR, 2023

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline.
CoRR, 2023

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models.
CoRR, 2023

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning.
CoRR, 2023

DiM: Distilling Dataset into Generative Model.
CoRR, 2023

DREAM: Efficient Dataset Distillation by Representative Matching.
CoRR, 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models.
CoRR, 2023

ATP: Adaptive Tensor Parallelism for Foundation Models.
CoRR, 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency.
Proceedings of the International Conference for High Performance Computing, 2023

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

An Efficient 2D Method for Training Super-Large Deep Learning Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Adaptive Computation with Elastic Input Sequence.
Proceedings of the International Conference on Machine Learning, 2023

A Study on Transformer Configuration and Training Objective.
Proceedings of the International Conference on Machine Learning, 2023

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

One Student Knows All Experts Know: From Sparse to Dense.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Sequence Parallelism: Long Sequence Training from System Perspective.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

CowClip: Reducing CTR Prediction Model Training Time from 12 Hours to 10 Minutes on 1 GPU.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Weakly Supervised Learning for Textbook Question Answering.
IEEE Trans. Image Process., 2022

Distributed evidential clustering toward time series with big data issue.
Expert Syst. Appl., 2022

Elixir: Train a Large Language Model on a Small GPU Cluster.
CoRR, 2022

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models.
CoRR, 2022

Prompt Vision Transformer for Domain Generalization.
CoRR, 2022

A Frequency-aware Software Cache for Large Recommendation System Embeddings.
CoRR, 2022

FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders.
CoRR, 2022

Deeper vs Wider: A Revisit of Transformer Configuration.
CoRR, 2022

Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels.
CoRR, 2022

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU.
CoRR, 2022

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours.
CoRR, 2022

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning.
CoRR, 2022

Crafting Better Contrastive Views for Siamese Representation Learning.
CoRR, 2022

Random Sharpness-Aware Minimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Handling heavy-tailed input of transformer inference on GPUs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, 2022

Tesseract: Parallelize the Tensor Parallelism Efficiently.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Concurrent Adversarial Learning for Large-Batch Training.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Joint Evidential K-Nearest Neighbor Classification.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Self-reconstructive evidential clustering for high-dimensional data.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CAFE: Learning to Condense Dataset by Aligning Features.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Efficient Training Approach for Very Large Scale Face Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Crafting Better Contrastive Views for Siamese Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Efficient and Scalable Sharpness-Aware Minimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Distributed EK-NN Classification.
Proceedings of the Belief Functions: Theory and Applications, 2022

Go Wider Instead of Deeper.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Evidential instance selection for K-nearest neighbor classification of big data.
Int. J. Approx. Reason., 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey.
CoRR, 2021

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training.
CoRR, 2021

Sparse-MLP: A Fully-MLP Architecture with Conditional Computation.
CoRR, 2021

PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management.
CoRR, 2021

2.5-dimensional distributed model training.
CoRR, 2021

Maximizing Parallelism in Distributed Training for Huge Neural Networks.
CoRR, 2021

Sequence Parallelism: Making 4D Parallelism Possible.
CoRR, 2021

An Efficient Training Approach for Very Large Scale Face Recognition.
CoRR, 2021

An Efficient 2D Method for Training Super-Large Deep Learning Models.
CoRR, 2021

Communication-avoiding kernel ridge regression on parallel and distributed systems.
CCF Trans. High Perform. Comput., 2021

Auto-Precision Scaling for Distributed Deep Learning.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters.
Proceedings of the International Conference for High Performance Computing, 2021

Dynamic scaling for low-precision learning.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Mask Aware Network for Masked Face Recognition in the Wild.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

2020
Fast LSTM by dynamic decomposition on cloud and distributed systems.
Knowl. Inf. Syst., 2020

How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers.
CoRR, 2020

The Limit of the Batch Size.
CoRR, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Proceedings of the 8th International Conference on Learning Representations, 2020

Rethinking the Value of Asynchronous Solvers for Distributed Deep Learning.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019
Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Reducing BERT Pre-Training Time from 3 Days to 76 Minutes.
CoRR, 2019

Large-batch training for LSTM and beyond.
Proceedings of the International Conference for High Performance Computing, 2019

Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

2018
Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

ImageNet Training in Minutes.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines.
IEEE Trans. Parallel Distributed Syst., 2017

Parallel Multiclass Support Vector Machine for Remote Sensing Data Classification on Multicore and Many-Core Architectures.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2017

Designing and implementing a heuristic cross-architecture combination for graph traversal.
J. Parallel Distributed Comput., 2017

100-epoch ImageNet Training with AlexNet in 24 Minutes.
CoRR, 2017

Scaling deep learning on GPU and knights landing clusters.
Proceedings of the International Conference for High Performance Computing, 2017

Runtime Data Layout Scheduling for Machine Learning Dataset.
Proceedings of the 46th International Conference on Parallel Processing, 2017

2016
Asynchronous Parallel Greedy Coordinate Descent.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
Scaling Support Vector Machines on modern HPC platforms.
J. Parallel Distributed Comput., 2015

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.
Int. J. High Perform. Comput. Appl., 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Designing a Heuristic Cross-Architecture Combination for Breadth-First Search.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Scaling and analyzing the stencil performance on multi-core and many-core architectures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013
Accelerating the 3D Elastic Wave Forward Modeling on GPU and MIC.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
