Yang You

ORCID: 0000-0003-2816-4384

Affiliations:
  • National University of Singapore
  • UC Berkeley, USA (PhD 2020)


According to our database, Yang You authored at least 110 papers between 2013 and 2024.

Bibliography

2024
Self-filling evidential clustering for partial multi-view data.
Expert Syst. Appl., March, 2024

Sparse Reconstructive Evidential Clustering for Multi-View Data.
IEEE CAA J. Autom. Sinica, February, 2024

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers.
CoRR, 2024

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.
CoRR, 2024

Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization.
CoRR, 2024

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning.
CoRR, 2024

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents.
CoRR, 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models.
CoRR, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference.
CoRR, 2024

FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
Adaptive evidential K-NN classification: Integrating neighborhood search and feature weighting.
Inf. Sci., November, 2023

A Sparse Reconstructive Evidential K-Nearest Neighbor Classifier for High-Dimensional Data.
IEEE Trans. Knowl. Data Eng., June, 2023

Multitask Learning for Visual Question Answering.
IEEE Trans. Neural Networks Learn. Syst., March, 2023

Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management.
IEEE Trans. Parallel Distributed Syst., 2023

Efficient Dataset Distillation via Minimax Diffusion.
CoRR, 2023

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching.
CoRR, 2023

LoBaSS: Gauging Learnability in Supervised Fine-tuning Data.
CoRR, 2023

Let's reward step by step: Step-Level reward model as the Navigators for Reasoning.
CoRR, 2023

Can pre-trained models assist in dataset distillation?
CoRR, 2023

Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification.
CoRR, 2023

Dataset Quantization.
CoRR, 2023

Learning Referring Video Object Segmentation from Weak Annotation.
CoRR, 2023

Summarizing Stream Data for Memory-Restricted Online Continual Learning.
CoRR, 2023

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline.
CoRR, 2023

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models.
CoRR, 2023

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning.
CoRR, 2023

DiM: Distilling Dataset into Generative Model.
CoRR, 2023

DREAM: Efficient Dataset Distillation by Representative Matching.
CoRR, 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models.
CoRR, 2023

ATP: Adaptive Tensor Parallelism for Foundation Models.
CoRR, 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency.
Proceedings of the International Conference for High Performance Computing, 2023

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

An Efficient 2D Method for Training Super-Large Deep Learning Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Adaptive Computation with Elastic Input Sequence.
Proceedings of the International Conference on Machine Learning, 2023

A Study on Transformer Configuration and Training Objective.
Proceedings of the International Conference on Machine Learning, 2023

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

One Student Knows All Experts Know: From Sparse to Dense.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Sequence Parallelism: Long Sequence Training from System Perspective.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

CowClip: Reducing CTR Prediction Model Training Time from 12 Hours to 10 Minutes on 1 GPU.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Weakly Supervised Learning for Textbook Question Answering.
IEEE Trans. Image Process., 2022

Distributed evidential clustering toward time series with big data issue.
Expert Syst. Appl., 2022

Elixir: Train a Large Language Model on a Small GPU Cluster.
CoRR, 2022

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models.
CoRR, 2022

Prompt Vision Transformer for Domain Generalization.
CoRR, 2022

A Frequency-aware Software Cache for Large Recommendation System Embeddings.
CoRR, 2022

FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders.
CoRR, 2022

Deeper vs Wider: A Revisit of Transformer Configuration.
CoRR, 2022

Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels.
CoRR, 2022

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU.
CoRR, 2022

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours.
CoRR, 2022

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning.
CoRR, 2022

Crafting Better Contrastive Views for Siamese Representation Learning.
CoRR, 2022

Random Sharpness-Aware Minimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Handling heavy-tailed input of transformer inference on GPUs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, 2022

Tesseract: Parallelize the Tensor Parallelism Efficiently.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Concurrent Adversarial Learning for Large-Batch Training.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Joint Evidential K-Nearest Neighbor Classification.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Self-reconstructive evidential clustering for high-dimensional data.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CAFE: Learning to Condense Dataset by Aligning Features.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Efficient Training Approach for Very Large Scale Face Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Crafting Better Contrastive Views for Siamese Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Efficient and Scalable Sharpness-Aware Minimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Distributed EK-NN Classification.
Proceedings of the Belief Functions: Theory and Applications, 2022

Go Wider Instead of Deeper.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Evidential instance selection for K-nearest neighbor classification of big data.
Int. J. Approx. Reason., 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey.
CoRR, 2021

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training.
CoRR, 2021

Sparse-MLP: A Fully-MLP Architecture with Conditional Computation.
CoRR, 2021

PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management.
CoRR, 2021

2.5-dimensional distributed model training.
CoRR, 2021

Maximizing Parallelism in Distributed Training for Huge Neural Networks.
CoRR, 2021

Sequence Parallelism: Making 4D Parallelism Possible.
CoRR, 2021

An Efficient Training Approach for Very Large Scale Face Recognition.
CoRR, 2021

An Efficient 2D Method for Training Super-Large Deep Learning Models.
CoRR, 2021

Communication-avoiding kernel ridge regression on parallel and distributed systems.
CCF Trans. High Perform. Comput., 2021

Auto-Precision Scaling for Distributed Deep Learning.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters.
Proceedings of the International Conference for High Performance Computing, 2021

Dynamic scaling for low-precision learning.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Mask Aware Network for Masked Face Recognition in the Wild.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

2020
Fast LSTM by dynamic decomposition on cloud and distributed systems.
Knowl. Inf. Syst., 2020

How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers.
CoRR, 2020

The Limit of the Batch Size.
CoRR, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Proceedings of the 8th International Conference on Learning Representations, 2020

Rethinking the Value of Asynchronous Solvers for Distributed Deep Learning.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019
Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Reducing BERT Pre-Training Time from 3 Days to 76 Minutes.
CoRR, 2019

Large-batch training for LSTM and beyond.
Proceedings of the International Conference for High Performance Computing, 2019

Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

2018
Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

ImageNet Training in Minutes.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines.
IEEE Trans. Parallel Distributed Syst., 2017

Parallel Multiclass Support Vector Machine for Remote Sensing Data Classification on Multicore and Many-Core Architectures.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2017

Designing and implementing a heuristic cross-architecture combination for graph traversal.
J. Parallel Distributed Comput., 2017

100-epoch ImageNet Training with AlexNet in 24 Minutes.
CoRR, 2017

Scaling deep learning on GPU and knights landing clusters.
Proceedings of the International Conference for High Performance Computing, 2017

Runtime Data Layout Scheduling for Machine Learning Dataset.
Proceedings of the 46th International Conference on Parallel Processing, 2017

2016
Asynchronous Parallel Greedy Coordinate Descent.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
Scaling Support Vector Machines on modern HPC platforms.
J. Parallel Distributed Comput., 2015

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.
Int. J. High Perform. Comput. Appl., 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Designing a Heuristic Cross-Architecture Combination for Breadth-First Search.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Scaling and analyzing the stencil performance on multi-core and many-core architectures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013
Accelerating the 3D Elastic Wave Forward Modeling on GPU and MIC.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
