Li Zhang

Affiliations:
  • Amazon, Seattle, WA, USA
  • IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
  • Columbia University, New York, NY, USA (PhD)


According to our database, Li Zhang authored at least 109 papers between 1998 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2022
Performance prediction of deep learning applications training in GPU as a service systems.
Clust. Comput., 2022

2020
Providing Performance Guarantees for Cloud-Deployed Applications.
IEEE Trans. Cloud Comput., 2020

2019
Optimizing on-demand GPUs in the Cloud for Deep Learning Applications Training.
Proceedings of the 2019 4th International Conference on Computing, 2019

Performance Prediction of GPU-based Deep Learning Applications.
Proceedings of the 9th International Conference on Cloud Computing and Services Science, 2019

2018
Model-driven optimal resource scaling in cloud.
Softw. Syst. Model., 2018

2017
IBM Deep Learning Service.
IBM J. Res. Dev., 2017

IBM Deep Learning Service.
CoRR, 2017

SparkBench: a spark benchmarking suite characterizing large-scale in-memory data analytics.
Clust. Comput., 2017

Nexus: Bringing Efficient and Scalable Training to Deep Learning Frameworks.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Lightweight Replication Through Remote Backup Memory Sharing for In-memory Key-Value Stores.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

GaDei: On Scale-Up Training as a Service for Deep Learning.
Proceedings of the 2017 IEEE International Conference on Data Mining, 2017

iRDMA: Efficient Use of RDMA in Distributed Deep Learning Systems.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

2016
MapTask Scheduling in MapReduce With Data Locality: Throughput and Heavy-Traffic Optimality.
IEEE/ACM Trans. Netw., 2016

GaDei: On Scale-up Training As A Service For Deep Learning.
CoRR, 2016

MEMTUNE: Dynamic Memory Management for In-Memory Data Analytic Platforms.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Autoscaling for Hadoop Clusters.
Proceedings of the 2016 IEEE International Conference on Cloud Engineering, 2016

zExpander: a key-value cache with both high performance and fewer misses.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

NVMcached: An NVM-based Key-Value Cache.
Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems, 2016

Stage Aware Performance Modeling of DAG Based in Memory Analytic Platforms.
Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016

2015
Skewless Network Clock Synchronization Without Discontinuity: Convergence and Performance.
IEEE/ACM Trans. Netw., 2015

Miss behavior for caching with lease.
SIGMETRICS Perform. Evaluation Rev., 2015

Multi-resource Fair Sharing for Multiclass Workflows.
SIGMETRICS Perform. Evaluation Rev., 2015

HydraDB: a resilient RDMA-driven key-value middleware for in-memory cluster computing.
Proceedings of the International Conference for High Performance Computing, 2015

Information sharing in distributed stochastic bandits.
Proceedings of the 2015 IEEE Conference on Computer Communications, 2015

Model-Driven Autoscaling for Hadoop Clusters.
Proceedings of the 2015 IEEE International Conference on Autonomic Computing, 2015

SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

2014
Estimating life-time distribution by observing population continuously.
Perform. Evaluation, 2014

Non-work-conserving effects in MapReduce: diffusion limit and criticality.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Modeling the Impact of Workload on Cloud Resource Scaling.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Adaptive, Model-driven Autoscaling for Cloud Applications.
Proceedings of the 11th International Conference on Autonomic Computing, 2014

MRONLINE: MapReduce online performance tuning.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

DynMR: dynamic MapReduce with ReduceTask interleaving and MapTask backfilling.
Proceedings of the Ninth Eurosys Conference 2014, 2014

C-Hint: An Effective and Reliable Cache Management for RDMA-Accelerated Key-Value Stores.
Proceedings of the ACM Symposium on Cloud Computing, 2014

2013
A Hierarchical Approach for the Resource Management of Very Large Cloud Platforms.
IEEE Trans. Dependable Secur. Comput., 2013

A throughput optimal algorithm for map task scheduling in mapreduce with data locality.
SIGMETRICS Perform. Evaluation Rev., 2013

Joint optimization of overlapping phases in MapReduce.
SIGMETRICS Perform. Evaluation Rev., 2013

Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality.
Proceedings of the IEEE INFOCOM 2013, Turin, Italy, April 14-19, 2013, 2013

Coupling task progress for MapReduce resource-aware scheduling.
Proceedings of the IEEE INFOCOM 2013, Turin, Italy, April 14-19, 2013, 2013

Improving ReduceTask data locality for sequential MapReduce jobs.
Proceedings of the IEEE INFOCOM 2013, Turin, Italy, April 14-19, 2013, 2013

Skewless network clock synchronization.
Proceedings of the 2013 21st IEEE International Conference on Network Protocols, 2013

Preemptive ReduceTask Scheduling for Fair and Fast Job Completion.
Proceedings of the 10th International Conference on Autonomic Computing, 2013

K-Scope: Online Performance Tracking for Dynamic Cloud Applications.
Proceedings of the 10th International Conference on Autonomic Computing, 2013

Improving Multi-job MapReduce Scheduling in an Opportunistic Environment.
Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, Santa Clara, CA, USA, June 28, 2013

2012
Energy-Aware Autonomic Resource Allocation in Multitier Virtualized Environments.
IEEE Trans. Serv. Comput., 2012

Experiences in building and scaling an enterprise application on multicore systems.
Concurr. Comput. Pract. Exp., 2012

Heavy-traffic analysis of cloud provisioning.
Proceedings of the 24th International Teletraffic Congress, 2012

Delay tails in MapReduce scheduling.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Performance Modeling and Characterization of Large Last Level Caches.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Performance analysis of Coupling Scheduler for MapReduce/Hadoop.
Proceedings of the IEEE INFOCOM 2012, Orlando, FL, USA, March 25-30, 2012, 2012

Coupling scheduler for MapReduce/Hadoop.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

Delay asymptotics for heavy-tailed MapReduce jobs.
Proceedings of the 50th Annual Allerton Conference on Communication, 2012

2011
On estimation problems for the G/G/∞ Queue.
SIGMETRICS Perform. Evaluation Rev., 2011

Identification and approximations for systems with multi-stage workflows.
Proceedings of the Winter Simulation Conference 2011, 2011

A Tool for Scalable Profiling and Tracing of Java and Native Code Interactions.
Proceedings of the Eighth International Conference on Quantitative Evaluation of Systems, 2011

A Hybrid Approach for Large Cache Performance Studies.
Proceedings of the Eighth International Conference on Quantitative Evaluation of Systems, 2011

Consolidating virtual machines with dynamic bandwidth demand in data centers.
Proceedings of INFOCOM 2011, the 30th IEEE International Conference on Computer Communications, 2011

Exploiting Resource Usage Patterns for Better Utilization Prediction.
Proceedings of the 31st IEEE International Conference on Distributed Computing Systems Workshops (ICDCS 2011 Workshops), 2011

2010
Performance of large low-associativity caches.
SIGMETRICS Perform. Evaluation Rev., 2010

Resiliency of distributed clock synchronization networks.
SIGMETRICS Perform. Evaluation Rev., 2010

Linear-speed interior-path algorithms for distributed control of information networks.
Perform. Evaluation, 2010

Black-box performance models for virtualized web service applications.
Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

Program behavior characterization in large memory systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement.
Proceedings of INFOCOM 2010, the 29th IEEE International Conference on Computer Communications, 2010

Efficient resource provisioning in compute clouds via VM multiplexing.
Proceedings of the 7th International Conference on Autonomic Computing, 2010

Autonomic Management of Cloud Service Centers with Availability Guarantees.
Proceedings of the IEEE International Conference on Cloud Computing, 2010

2009
A decentralized control mechanism for stream processing networks.
Ann. Oper. Res., 2009

Real-time performance modeling for adaptive software systems.
Proceedings of the 4th International Conference on Performance Evaluation Methodologies and Tools, 2009

Enhanced inferencing: estimation of a workload dependent performance model.
Proceedings of the 4th International Conference on Performance Evaluation Methodologies and Tools, 2009

Run-time resource management in SOA virtualized environments.
Proceedings of the 1st international workshop on Quality of service-oriented software systems, 2009

Real-time performance modeling for adaptive software systems with multi-class workload.
Proceedings of the 17th Annual Meeting of the IEEE/ACM International Symposium on Modelling, 2009

Connection and performance model driven optimization of pageview response time.
Proceedings of the 17th Annual Meeting of the IEEE/ACM International Symposium on Modelling, 2009

2008
Distributed multi-layered workload synthesis for testing stream processing systems.
Proceedings of the 2008 Winter Simulation Conference, Global Gateway to Discovery, 2008

Model Identification for Energy-Aware Management of Web Service Systems.
Proceedings of Service-Oriented Computing, 2008

2007
Load shedding and distributed resource control of stream processing networks.
Perform. Evaluation, 2007

SLA based resource allocation policies in autonomic environments.
J. Parallel Distributed Comput., 2007

Performance Evaluation of a Commercial Application, Trade, in Scale-out Environments.
Proceedings of the 15th International Symposium on Modeling, 2007

Performance Studies of a WebSphere Application, Trade, in Scale-out and Scale-up Environments.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Almost Peer-to-Peer Clock Synchronization.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Scalability of the Nutch search engine.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006
Distributed Resource Allocation in Stream Processing Systems.
Proceedings of Distributed Computing, 20th International Symposium, 2006

A multicommodity flow model for distributed stream processing.
Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2006

Distributed Resource Allocation for Stream Data Processing.
Proceedings of High Performance Computing and Communications, 2006

2005
Research on an iterative algorithm of LS channel estimation in MIMO OFDM systems.
IEEE Trans. Broadcast., 2005

Web traffic modeling at finer time scales and performance implications.
Perform. Evaluation, 2005

Optimal capacity allocation for Web systems with end-to-end delay guarantees.
Perform. Evaluation, 2005

SLA Based Profit Optimization in Multi-tier Systems.
Proceedings of the Fourth IEEE International Symposium on Network Computing and Applications (NCA 2005), 2005

2004
Efficiently serving dynamic data at highly accessed web sites.
IEEE/ACM Trans. Netw., 2004

Cost minimization of multi-tiered e-business infrastructure with end-to-end delay guarantees.
SIGMETRICS Perform. Evaluation Rev., 2004

SLA based profit optimization in web systems.
Proceedings of the 13th international conference on World Wide Web, 2004

A smart hill-climbing algorithm for application server configuration.
Proceedings of the 13th international conference on World Wide Web, 2004

SLA based profit optimization in autonomic computing systems.
Proceedings of Service-Oriented Computing, 2004

Overlay Multicast Trees of Minimal Delay.
Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), 2004

2003
New Algorithms for Content-Based Publication-Subscription Systems.
Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS 2003), 2003

A Comprehensive Toolset for Workload Characterization, Performance Modeling, and Online Control.
Proceedings of Computer Performance Evaluations, 2003

2002
Traffic modeling and performance analysis of commercial web sites.
SIGMETRICS Perform. Evaluation Rev., 2002

Optimal scheduling in queuing network models of high-volume commercial web sites.
Perform. Evaluation, 2002

Workload Service Requirements Analysis: A Queueing Network Optimization Approach.
Proceedings of the 10th International Workshop on Modeling, 2002

Clock Synchronization Algorithms for Network Measurements.
Proceedings of IEEE INFOCOM 2002, 2002

Clustering Algorithms for Content-Based Publication-Subscription Systems.
Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), 2002

Analysis of measurement data from sporting event Web sites.
Proceedings of the Global Telecommunications Conference, 2002

Analysis of Caching Mechanisms from Sporting Event Web Sites.
Proceedings of Advances in Computing Science, 2002

2001
Optimal scheduling in queueing network models of high-volume commercial web sites.
SIGMETRICS Perform. Evaluation Rev., 2001

Analysis of queues under correlated arrivals with applications to web server performance.
SIGMETRICS Perform. Evaluation Rev., 2001

Threshold-based priority policies for parallel-server systems with affinity scheduling.
Proceedings of the American Control Conference, 2001

1999
Analysis and Characterization of Large-Scale Web Server Access Patterns and Performance.
World Wide Web, 1999

The impact of job arrival patterns on parallel scheduling.
SIGMETRICS Perform. Evaluation Rev., 1999

Web traffic modeling and Web server performance analysis.
SIGMETRICS Perform. Evaluation Rev., 1999

Analysis of Job Arrival Patterns and Parallel Scheduling Performance.
Perform. Evaluation, 1999

1998
A General Methodology for Characterizing Access Patterns and Analyzing Web Server Performance.
Proceedings of MASCOTS 1998, 1998
