Wei Lin

ORCID: 0000-0002-3003-0150

Affiliations:
  • Alibaba Group, China
  • Microsoft, Redmond, WA, USA (former)


According to our database, Wei Lin authored at least 80 papers between 2009 and 2024.

Bibliography

2024
Boosting the Convergence of Reinforcement Learning-Based Auto-Pruning Using Historical Data.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2024

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis.
CoRR, 2024

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache.
CoRR, 2024

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach.
Proc. ACM Manag. Data, September, 2023

Expediting Distributed DNN Training With Device Topology-Aware Graph Deployment.
IEEE Trans. Parallel Distributed Syst., April, 2023

Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.
Proc. VLDB Endow., 2023

GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning.
Proc. ACM Manag. Data, 2023

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity.
CoRR, 2023

Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches.
CoRR, 2023

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform.
CoRR, 2023

TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation.
CoRR, 2023

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs.
Proceedings of the International Conference for High Performance Computing, 2023

RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

uGrapher: High-Performance Graph Operator Computation via Unified Abstraction for Graph Neural Networks.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion.
IEEE Trans. Parallel Distributed Syst., 2022

An Efficient Hardware Design for Accelerating Sparse CNNs With NAS-Based Models.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

EasyScale: Accuracy-consistent Elastic Training for Deep Learning.
CoRR, 2022

Whale: Efficient Giant Model Training over Heterogeneous GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

Optimizing Federated Unsupervised Person Re-identification via Camera-aware Clustering.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

Efficient Pipeline Planning for Expedited Distributed DNN Training.
Proceedings of the IEEE INFOCOM 2022, 2022

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Accelerating large-scale distributed neural network training with SPMD parallelism.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters.
IEEE Trans. Parallel Distributed Syst., 2021

Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters.
Proc. VLDB Endow., 2021

M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining.
CoRR, 2021

Boosting the Convergence of Reinforcement Learning-based Auto-pruning Using Historical Data.
CoRR, 2021

Exploring Sparse Expert Models and Beyond.
CoRR, 2021

M6: A Chinese Multimodal Pretrainer.
CoRR, 2021

DAPPLE: a pipelined data parallel approach for training large models.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

MeLL: Large-scale Extensible User Intent Classification for Dialogue Systems with Meta Lifelong Learning.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

DISC: A Dynamic Shape Compiler for Machine Learning Workloads.
Proceedings of the EuroMLSys@EuroSys 2021, 2021

EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training.
CoRR, 2020

EasyTransfer - A Simple and Scalable Deep Transfer Learning Platform for NLP Applications.
CoRR, 2020

Whale: A Unified Distributed Training Framework.
CoRR, 2020

INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices.
CoRR, 2020

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads.
CoRR, 2020

Interactive Feature Generation via Learning Adjacency Tensor of Feature Graph.
CoRR, 2020

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads.
CoRR, 2020

SwapText: Image Based Texts Transfer in Scenes.
CoRR, 2020

AntMan: Dynamic Scaling on GPU Clusters for Deep Learning.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Fast Training of Deep Learning Models over Multiple GPUs.
Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A History-Based Auto-Tuning Framework for Fast and High-Performance DNN Design on GPU.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

SwapText: Image Based Texts Transfer in Scenes.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

CEFS: compute-efficient flow scheduling for iterative synchronous applications.
Proceedings of the CoNEXT '20: The 16th International Conference on emerging Networking EXperiments and Technologies, 2020

Optimizing distributed training deployment in heterogeneous GPU clusters.
Proceedings of the CoNEXT '20: The 16th International Conference on emerging Networking EXperiments and Technologies, 2020

2019
AliGraph: A Comprehensive Graph Neural Network Platform.
Proc. VLDB Endow., 2019

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads.
CoRR, 2019

Characterizing Deep Learning Training Workloads on Alibaba-PAI.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Scene Text Recognition with Auto-Aligned Feature Generator.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Ouroboros: An Inference Engine for Deep Learning Based TTS on Embedded Devices.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

PAI-FCNN: FPGA Based CNN Inference System.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

PAI-FCNN: FPGA Based Inference System for Complex CNN Models.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

2018
Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks.
CoRR, 2018

FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs.
CoRR, 2018

Transfer Learning for Context-Aware Question Matching in Information-seeking Conversations in E-commerce.
CoRR, 2018

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection.
CoRR, 2018

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Efficient Deep Learning Inference Based on Model Compression.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Transfer Learning for Context-Aware Question Matching in Information-seeking Conversations in E-commerce.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2016
StreamScope: Continuous Reliable Distributed Processing of Big Data Streams.
Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation, 2016

2015
Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE.
IEEE Trans. Parallel Distributed Syst., 2015

2014
Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing.
Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, 2014

Cybertron: pushing the limit on I/O reduction in data-parallel programs.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs.
Proceedings of the 36th International Conference on Software Engineering, 2014

2013
A characteristic study on failures of production distributed data-parallel programs.
Proceedings of the 35th International Conference on Software Engineering, 2013

2012
Advanced partitioning techniques for massively distributed computation.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE.
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012

Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions.
Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, 2012

2010
Comet: batched stream processing for data intensive distributed computing.
Proceedings of the 1st ACM Symposium on Cloud Computing, 2010

2009
Wave Computing in the Cloud.
Proceedings of HotOS'09: 12th Workshop on Hot Topics in Operating Systems, 2009

