How to Trade Off the Quantity and Capacity of Teacher Ensemble: Learning Categorical Distribution to Stochastically Employ a Teacher for Distillation.

Zixiang Ding

Guoqing Jiang

Shuai Zhang

Lin Guo

Wei Lin

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis.

Dataset, November, 2023

BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach.

Proc. ACM Manag. Data, September, 2023

Expediting Distributed DNN Training With Device Topology-Aware Graph Deployment.

IEEE Trans. Parallel Distributed Syst., April, 2023

Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.

Proc. VLDB Endow., 2023

GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning.

Proc. ACM Manag. Data, 2023

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity.

CoRR, 2023

Heterogeneous Knowledge Fusion: A Novel Approach for Personalized Recommendation via LLM.

CoRR, 2023

Modeling Dual Period-Varying Preferences for Takeaway Recommendation.

CoRR, 2023

Dual Intent Enhanced Graph Neural Network for Session-based New Item Recommendation.

CoRR, 2023

Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches.

CoRR, 2023

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform.

CoRR, 2023

TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation.

CoRR, 2023

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs.

Proceedings of the International Conference for High Performance Computing, 2023

GDsmith: Detecting Bugs in Cypher Graph Database Engines.

Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

uGrapher: High-Performance Graph Operator Computation via Unified Abstraction for Graph Neural Networks.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion.

IEEE Trans. Parallel Distributed Syst., 2022

An Efficient Hardware Design for Accelerating Sparse CNNs With NAS-Based Models.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

EasyScale: Accuracy-consistent Elastic Training for Deep Learning.

CoRR, 2022

GDsmith: Detecting Bugs in Graph Database Engines.

CoRR, 2022

Whale: Efficient Giant Model Training over Heterogeneous GPUs.

Proceedings of the 2022 USENIX Annual Technical Conference, 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.

Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

Optimizing Federated Unsupervised Person Re-identification via Camera-aware Clustering.

Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

TaintSQL: Dynamically Tracking Fine-Grained Implicit Flows for SQL Statements.

Proceedings of the IEEE 33rd International Symposium on Software Reliability Engineering, 2022

Efficient Pipeline Planning for Expedited Distributed DNN Training.

Proceedings of the IEEE INFOCOM 2022, 2022

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems.

Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Accelerating large-scale distributed neural network training with SPMD parallelism.

Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures.

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters.

IEEE Trans. Parallel Distributed Syst., 2021

Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters.

Proc. VLDB Endow., 2021

M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining.

CoRR, 2021

Exploring Sparse Expert Models and Beyond.

CoRR, 2021

M6: A Chinese Multimodal Pretrainer.

CoRR, 2021

Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach.

Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Explicit Semantic Cross Feature Learning via Pre-trained Graph Neural Networks for CTR Prediction.

Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

DAPPLE: a pipelined data parallel approach for training large models.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data.

Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

MeLL: Large-scale Extensible User Intent Classification for Dialogue Systems with Meta Lifelong Learning.

Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

DISC: A Dynamic Shape Compiler for Machine Learning Workloads.

Proceedings of the EuroMLSys@EuroSys 2021, 2021

Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer.

Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

Binary Code based Hash Embedding for Web-scale Applications.

Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications.

Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020

Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training.

CoRR, 2020

EasyTransfer - A Simple and Scalable Deep Transfer Learning Platform for NLP Applications.

CoRR, 2020

Whale: A Unified Distributed Training Framework.

CoRR, 2020

INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices.

CoRR, 2020

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads.

CoRR, 2020

Interactive Feature Generation via Learning Adjacency Tensor of Feature Graph.

CoRR, 2020

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads.

CoRR, 2020

SwapText: Image Based Texts Transfer in Scenes.

CoRR, 2020

AntMan: Dynamic Scaling on GPU Clusters for Deep Learning.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Fast Training of Deep Learning Models over Multiple GPUs.

Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A History-Based Auto-Tuning Framework for Fast and High-Performance DNN Design on GPU.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

SwapText: Image Based Texts Transfer in Scenes.

Qiangpeng Yang

Jun Huang

Wei Lin

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

CEFS: compute-efficient flow scheduling for iterative synchronous applications.

Proceedings of the CoNEXT '20: The 16th International Conference on emerging Networking EXperiments and Technologies, 2020

Optimizing distributed training deployment in heterogeneous GPU clusters.

Proceedings of the CoNEXT '20: The 16th International Conference on emerging Networking EXperiments and Technologies, 2020

2019

AliGraph: A Comprehensive Graph Neural Network Platform.

Proc. VLDB Endow., 2019

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads.

Guoping Long

Jun Yang

Wei Lin

CoRR, 2019

Characterizing Deep Learning Training Workloads on Alibaba-PAI.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Scene Text Recognition with Auto-Aligned Feature Generator.

Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Ouroboros: An Inference Engine for Deep Learning Based TTS on Embedded Devices.

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Speedy: An Accelerator for Sparse Convolutional Neural Networks on FPGAs.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

PAI-FCNN: FPGA Based CNN Inference System.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs.

Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

PAI-FCNN: FPGA Based Inference System for Complex CNN Models.

Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

2018

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks.

CoRR, 2018

FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs.

CoRR, 2018

Transfer Learning for Context-Aware Question Matching in Information-seeking Conversations in E-commerce.

CoRR, 2018

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection.

CoRR, 2018

IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Efficient Deep Learning Inference Based on Model Compression.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Transfer Learning for Context-Aware Question Matching in Information-seeking Conversations in E-commerce.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2016

StreamScope: Continuous Reliable Distributed Processing of Big Data Streams.

Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation, 2016

2015

Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE.

IEEE Trans. Parallel Distributed Syst., 2015

2014

Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing.

Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, 2014

Cybertron: pushing the limit on I/O reduction in data-parallel programs.

Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs.

Proceedings of the 36th International Conference on Software Engineering, 2014

2013

A characteristic study on failures of production distributed data-parallel programs.

Proceedings of the 35th International Conference on Software Engineering, 2013

2012

Advanced partitioning techniques for massively distributed computation.

Jingren Zhou

Nicolas Bruno

Wei Lin

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE.

Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012

Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions.

Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, 2012

2010

Comet: batched stream processing for data intensive distributed computing.

Proceedings of the 1st ACM Symposium on Cloud Computing, 2010

2009

Wave Computing in the Cloud.

Proceedings of HotOS'09: 12th Workshop on Hot Topics in Operating Systems, 2009

Wei Lin
