Yuxiong He

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.

Zhewei Yao

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.

Conglong Li

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.

Ammar Ahmad Awan

Jeff Rasley

Proceedings of the International Conference on Machine Learning, 2022

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Task Offloading Based on GRU Model in IoT.

Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, 2022

Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers.

Uma-Naresh Niranjan

Andrés Felipe Cruz-Salinas

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Scalable and Efficient MoE Training for Multitask Multilingual Models.

Young Jin Kim

Ammar Ahmad Awan

Alexandre Muzio

CoRR, 2021

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.

Conglong Li

CoRR, 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training.

Jie Ren

Samyam Rajbhandari

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

ZeRO-infinity: breaking the GPU memory wall for extreme scale deep learning.

Proceedings of the International Conference for High Performance Computing, 2021

SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Fast LSTM by dynamic decomposition on cloud and distributed systems.

Knowl. Inf. Syst., 2020

Local trend discovery on real-time microblogs with uncertain locations in tight memory environments.

GeoInformatica, 2020

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm.

CoRR, 2020

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.

Proceedings of the 2020 International Conference on Management of Data, 2020

ZeRO: memory optimizations toward training trillion parameter models.

Proceedings of the International Conference for High Performance Computing, 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019

Communication-Aware Scheduling of Precedence-Constrained Tasks.

SIGMETRICS Perform. Evaluation Rev., 2019

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.

CoRR, 2019

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models.

CoRR, 2019

AntMan: Sparse Low-Rank Compression to Accelerate RNN inference.

Samyam Rajbhandari

Harsh Shrivastava

CoRR, 2019

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft.

Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

Deep Learning Inference Service at Microsoft.

Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.

Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

GRNN: Low-Latency and Scalable RNN Inference on GPUs.

Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019

GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine.

Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018

Efficient Deep Neural Network Serving: Fast and Furious.

IEEE Trans. Netw. Serv. Manag., 2018

Stochastic Modeling and Optimization of Stragglers.

Farshid Farhat

Diman Zad Tootaghaj

Anand Sivasubramaniam

Mahmut T. Kandemir

Chita R. Das

IEEE Trans. Cloud Comput., 2018

Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory.

CoRR, 2018

Better Caching in Search Advertising Systems with Rapid Refresh Predictions.

Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster.

Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Learning Intrinsic Sparse Structures within Long Short-Term Memory.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Obtaining and Managing Answer Quality for Online Data-Intensive Services.

ACM Trans. Model. Perform. Evaluation Comput. Syst., 2017

Learning Intrinsic Sparse Structures within Long Short-term Memory.

CoRR, 2017

Optimal Reissue Policies for Reducing Tail Latency.

Tim Kaler

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

BitFunnel: Revisiting Signatures for Search.

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

HyperDrive: exploring hyperparameters with POP scheduling.

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017

Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency.

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017

Exploiting heterogeneity for tail latency and energy efficiency.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

When Good Enough Is Better: Energy-Aware Scheduling for Multicore Servers.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Workload analysis and caching strategies for search advertising systems.

Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

Optimizing CNNs on Multicores for Scalability, Performance and Goodput.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

Prediction and Predictability for Search Query Acceleration.

ACM Trans. Web, 2016

Venus: Scalable Real-Time Spatial Queries on Microblogs with Adaptive Load Shedding.

IEEE Trans. Knowl. Data Eng., 2016

SERF: efficient scheduling for fast deep neural network serving via judicious parallelism.

Proceedings of the International Conference for High Performance Computing, 2016

Work stealing for interactive services to meet target latency.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

GeoTrend: spatial trending queries on real-time microblogs.

Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31, 2016

TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

Online Resource Management for Carbon-Neutral Cloud Computing.

Kishwar Ahmed

Athanasios V. Vasilakos

Proceedings of the Handbook on Data Centers, 2015

Processing and Optimizing Main Memory Spatial-Keyword Queries.

Proc. VLDB Endow., 2015

Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search.

Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Optimal Aggregation Policy for Reducing Tail Latency of Web Search.

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

BATS: Budget-Constrained Autoscaling for Cloud Performance Optimization.

A. Hasan Mahmud

Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems.

Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015

Measuring and Managing Answer Quality for Online Data-Intensive Services.

Proceedings of the 2015 IEEE International Conference on Autonomic Computing, 2015

Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Energy-efficient multiprocessor scheduling for flow time and makespan.

Theor. Comput. Sci., 2014

Hybrid query execution engine for large attributed graphs.

Sherif Sakr

Inf. Syst., 2014

A Theoretical Foundation for Scheduling and Designing Heterogeneous Processors for Interactive Applications.

Kathryn S. McKinley

Proceedings of the Distributed Computing - 28th International Symposium, 2014

Predictive parallelization: taming tail latencies in web search.

Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Mercury: A memory-constrained spatio-temporal real-time search on microblogs.

Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

Mars: Real-time spatio-temporal queries on microblogs.

Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

2013

Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs.

Proc. VLDB Endow., 2013

Solving Graph Isomorphism Using Parameterized Matching.

Proceedings of the String Processing and Information Retrieval, 2013

COCA: online distributed resource management for cost minimization and carbon neutrality in data centers.

Proceedings of the International Conference for High Performance Computing, 2013

Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Power-efficient resource allocation in MapReduce clusters.

Kaiqi Xiong

Proceedings of the 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013

Performance Inconsistency in Large Scale Data Processing Clusters.

Proceedings of the 10th International Conference on Autonomic Computing, 2013

Exploiting Processor Heterogeneity in Interactive Services.

Proceedings of the 10th International Conference on Autonomic Computing, 2013

Adaptive parallelism for web search.

Proceedings of the Eighth Eurosys Conference 2013, 2013

Topic 3: Scheduling and Load Balancing - (Introduction).

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

QACO: exploiting partial execution in web servers.

Proceedings of the ACM Cloud and Autonomic Computing Conference, 2013

2012

Horton: Online Query Execution Engine for Large Distributed Graphs.

Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Provably-Efficient Job Scheduling for Energy and Fairness in Geographically Distributed Data Centers.

Fei Xu

Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012

Budget-based control for interactive services with adaptive execution.

Proceedings of the 9th International Conference on Autonomic Computing, 2012

Zeta: scheduling interactive services with partial execution.

Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

G-SPARQL: a hybrid engine for querying large attributed graphs.

Sherif Sakr

Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011

Speed Scaling for Energy and Performance with Instantaneous Parallelism.

Proceedings of the Theory and Practice of Algorithms in (Computer) Systems, 2011

Scheduling Functionally Heterogeneous Systems with Utilization Balancing.

Jie Liu

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Tians Scheduling: Using Partial Processing in Best-Effort Applications.

Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011

Scheduling for data center interactive services.

Proceedings of the 49th Annual Allerton Conference on Communication, 2011

Position Paper: Embracing Heterogeneity - Improving Energy Efficiency for Interactive Services on Heterogeneous Data Center Hardware.

Proceedings of the AI for Data Center Management and Cloud Computing, 2011

2010

Improved results for scheduling batched parallel jobs by using a generalized analysis framework.

J. Parallel Distributed Comput., 2010

Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan.

CoRR, 2010

The Cilkview scalability analyzer.

William M. Leiserson

Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010

2008

Provably Efficient Online Nonclairvoyant Adaptive Scheduling.

IEEE Trans. Parallel Distributed Syst., 2008

Adaptive work-stealing with parallelism feedback.

ACM Trans. Comput. Syst., 2008

2007

Adaptive work stealing with parallelism feedback.

Kunal Agrawal

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Provably Efficient Online Non-clairvoyant Adaptive Scheduling.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Adaptive Scheduling with Parallelism Feedback.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Adaptive Scheduling of Parallel Jobs on Functionally Heterogeneous Resources.

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

2006

Provably Efficient Two-Level Adaptive Scheduling.

Proceedings of the Job Scheduling Strategies for Parallel Processing, 2006

An Empirical Evaluation of Work Stealing with Parallelism Feedback.

Kunal Agrawal