Yuxiong He

According to our database, Yuxiong He authored at least 87 papers between 2004 and 2021.

Bibliography

2021
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed.
CoRR, 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training.
CoRR, 2021

2020
Fast LSTM by dynamic decomposition on cloud and distributed systems.
Knowl. Inf. Syst., 2020

Local trend discovery on real-time microblogs with uncertain locations in tight memory environments.
GeoInformatica, 2020

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm.
CoRR, 2020

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
Proceedings of the 2020 International Conference on Management of Data, 2020

ZeRO: memory optimizations toward training trillion parameter models.
Proceedings of the SC '20: The International Conference for High Performance Computing, 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
Communication-Aware Scheduling of Precedence-Constrained Tasks.
SIGMETRICS Perform. Evaluation Rev., 2019

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models.
CoRR, 2019

AntMan: Sparse Low-Rank Compression to Accelerate RNN inference.
CoRR, 2019

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft.
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

Deep Learning Inference Service at Microsoft.
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

GRNN: Low-Latency and Scalable RNN Inference on GPUs.
Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019

GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018
Efficient Deep Neural Network Serving: Fast and Furious.
IEEE Trans. Netw. Serv. Manag., 2018

Stochastic Modeling and Optimization of Stragglers.
IEEE Trans. Cloud Comput., 2018

Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory.
CoRR, 2018

Better Caching in Search Advertising Systems with Rapid Refresh Predictions.
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Learning Intrinsic Sparse Structures within Long Short-Term Memory.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Obtaining and Managing Answer Quality for Online Data-Intensive Services.
ACM Trans. Model. Perform. Evaluation Comput. Syst., 2017

Learning Intrinsic Sparse Structures within Long Short-term Memory.
CoRR, 2017

Optimal Reissue Policies for Reducing Tail Latency.
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

BitFunnel: Revisiting Signatures for Search.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

HyperDrive: exploring hyperparameters with POP scheduling.
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017

Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency.
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017

Exploiting heterogeneity for tail latency and energy efficiency.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

When Good Enough Is Better: Energy-Aware Scheduling for Multicore Servers.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Workload analysis and caching strategies for search advertising systems.
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

Optimizing CNNs on Multicores for Scalability, Performance and Goodput.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Prediction and Predictability for Search Query Acceleration.
ACM Trans. Web, 2016

Venus: Scalable Real-Time Spatial Queries on Microblogs with Adaptive Load Shedding.
IEEE Trans. Knowl. Data Eng., 2016

SERF: efficient scheduling for fast deep neural network serving via judicious parallelism.
Proceedings of the International Conference for High Performance Computing, 2016

Work stealing for interactive services to meet target latency.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

GeoTrend: spatial trending queries on real-time microblogs.
Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31, 2016

TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
Online Resource Management for Carbon-Neutral Cloud Computing.
Proceedings of the Handbook on Data Centers, 2015

Processing and Optimizing Main Memory Spatial-Keyword Queries.
Proc. VLDB Endow., 2015

Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Optimal Aggregation Policy for Reducing Tail Latency of Web Search.
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

BATS: Budget-Constrained Autoscaling for Cloud Performance Optimization.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems.
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015

Measuring and Managing Answer Quality for Online Data-Intensive Services.
Proceedings of the 2015 IEEE International Conference on Autonomic Computing, 2015

Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Energy-efficient multiprocessor scheduling for flow time and makespan.
Theor. Comput. Sci., 2014

Hybrid query execution engine for large attributed graphs.
Inf. Syst., 2014

A Theoretical Foundation for Scheduling and Designing Heterogeneous Processors for Interactive Applications.
Proceedings of the Distributed Computing - 28th International Symposium, 2014

Predictive parallelization: taming tail latencies in web search.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Mercury: A memory-constrained spatio-temporal real-time search on microblogs.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

Mars: Real-time spatio-temporal queries on microblogs.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

2013
Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs.
Proc. VLDB Endow., 2013

Solving Graph Isomorphism Using Parameterized Matching.
Proceedings of the String Processing and Information Retrieval, 2013

COCA: online distributed resource management for cost minimization and carbon neutrality in data centers.
Proceedings of the International Conference for High Performance Computing, 2013

Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Power-efficient resource allocation in MapReduce clusters.
Proceedings of the 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013

Performance Inconsistency in Large Scale Data Processing Clusters.
Proceedings of the 10th International Conference on Autonomic Computing, 2013

Exploiting Processor Heterogeneity in Interactive Services.
Proceedings of the 10th International Conference on Autonomic Computing, 2013

Adaptive parallelism for web search.
Proceedings of the Eighth Eurosys Conference 2013, 2013

Topic 3: Scheduling and Load Balancing - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

QACO: exploiting partial execution in web servers.
Proceedings of the ACM Cloud and Autonomic Computing Conference, 2013

2012
Horton: Online Query Execution Engine for Large Distributed Graphs.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Provably-Efficient Job Scheduling for Energy and Fairness in Geographically Distributed Data Centers.
Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012

Budget-based control for interactive services with adaptive execution.
Proceedings of the 9th International Conference on Autonomic Computing, 2012

Zeta: scheduling interactive services with partial execution.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

G-SPARQL: a hybrid engine for querying large attributed graphs.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
Speed Scaling for Energy and Performance with Instantaneous Parallelism.
Proceedings of the Theory and Practice of Algorithms in (Computer) Systems, 2011

Scheduling Functionally Heterogeneous Systems with Utilization Balancing.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Tians Scheduling: Using Partial Processing in Best-Effort Applications.
Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011

Scheduling for data center interactive services.
Proceedings of the 49th Annual Allerton Conference on Communication, 2011

Position Paper: Embracing Heterogeneity - Improving Energy Efficiency for Interactive Services on Heterogeneous Data Center Hardware.
Proceedings of the AI for Data Center Management and Cloud Computing, 2011

2010
Improved results for scheduling batched parallel jobs by using a generalized analysis framework.
J. Parallel Distributed Comput., 2010

Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan.
CoRR, 2010

The Cilkview scalability analyzer.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010

2008
Provably Efficient Online Nonclairvoyant Adaptive Scheduling.
IEEE Trans. Parallel Distributed Syst., 2008

Adaptive work-stealing with parallelism feedback.
ACM Trans. Comput. Syst., 2008

2007
Adaptive work stealing with parallelism feedback.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Provably Efficient Online Non-clairvoyant Adaptive Scheduling.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Adaptive Scheduling with Parallelism Feedback.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Adaptive Scheduling of Parallel Jobs on Functionally Heterogeneous Resources.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

2006
Provably Efficient Two-Level Adaptive Scheduling.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2006

An Empirical Evaluation of Work Stealing with Parallelism Feedback.
Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), 2006

2004
Secure communications between bandwidth brokers.
ACM SIGOPS Oper. Syst. Rev., 2004

