Cheng Li

ORCID: 0000-0001-7064-6120

Affiliations:
  • University of Science and Technology of China (USTC), China
  • Max Planck Institute for Software Systems, Kaiserslautern / Saarbrücken, Germany (former)


According to our database, Cheng Li authored at least 54 papers between 2010 and 2024.

Bibliography

2024
Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems.
ACM Trans. Storage, August, 2024

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Noctua: Towards Automated and Practical Fine-grained Consistency Analysis.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
A Comprehensive Study on Post-Training Quantization for Large Language Models.
CoRR, 2023

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.
CoRR, 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction.
CoRR, 2023

SPFresh: Incremental In-Place Update for Billion-Scale Vector Search.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

MUSE: A Programmable Metadata Load Estimation Interface for Ceph File System.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases.
Proceedings of the International Conference on Machine Learning, 2023

DySR: Adaptive Super-Resolution via Algorithm and System Co-design.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Revitalizing the Forgotten On-Chip DMA to Expedite Data Movement in NVM-based Storage Systems.
Proceedings of the 21st USENIX Conference on File and Storage Technologies, 2023

CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections.
Proceedings of the Eighteenth European Conference on Computer Systems, 2023

2022
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training.
IEEE Trans. Parallel Distributed Syst., 2022

SelectiveEC: Towards Balanced Recovery Load on Erasure-Coded Storage Systems.
IEEE Trans. Parallel Distributed Syst., 2022

A Data Layout and Fast Failure Recovery Scheme for Distributed Storage Systems With Mixed Erasure Codes.
IEEE Trans. Computers, 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022

BiFeat: Supercharge GNN Training via Graph Feature Quantization.
CoRR, 2022

DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs.
IEEE Trans. Parallel Distributed Syst., 2021

Leveraging NVMe SSDs for Building a Fast, Cost-effective, LSM-tree-based KV Store.
ACM Trans. Storage, 2021

MTFC: A Multi-GPU Training Framework for Cube-CNN-Based Hyperspectral Image Classification.
IEEE Trans. Emerg. Top. Comput., 2021

AutoGR: Automated Geo-Replication with Fast System Performance and Preserved Application Semantics.
Proc. VLDB Endow., 2021

ECR: Eviction-cost-aware cache management policy for page-level flash-based SSDs.
Concurr. Comput. Pract. Exp., 2021

Gradient Compression Supercharged High-Performance Data Parallel DNN Training.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

Lunule: an agile and judicious metadata load balancer for CephFS.
Proceedings of the International Conference for High Performance Computing, 2021

SpanDB: A Fast, Cost-Effective LSM-tree Based KV Store on Hybrid Storage.
Proceedings of the 19th USENIX Conference on File and Storage Technologies, 2021

Achieving low tail-latency and high scalability for serializable transactions in edge computing.
Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021

Lessons learned from migrating complex stateful applications onto serverless platforms.
Proceedings of the APSys '21: 12th ACM SIGOPS Asia-Pacific Workshop on Systems, 2021

2020
Not All Explorations Are Equal: Harnessing Heterogeneous Profiling Cost for Efficient MLaaS Training.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

PDL: A Data Layout towards Fast Failure Recovery for Erasure-coded Distributed Storage Systems.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020

PaGraph: Scaling GNN training on large graphs via computation-aware caching.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019
BiloKey: A Scalable Bi-Index Locality-Aware In-Memory Key-Value Store.
IEEE Trans. Parallel Distributed Syst., 2019

Explicit Data Correlations-Directed Metadata Prefetching Method in Distributed File Systems.
IEEE Trans. Parallel Distributed Syst., 2019

ElasticBF: Elastic Bloom Filter with Hotness Awareness for Boosting Read Performance in Large Key-Value Stores.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

HCFTL: A Locality-Aware Page-Level Flash Translation Layer.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Bayesian Optimisation for Objective Functions with Varying Smoothness.
Proceedings of the AI 2019: Advances in Artificial Intelligence, 2019

2018
Fine-grained consistency for geo-replicated systems.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

LCR: Load-Aware Cache Replacement Algorithm for Flash-Based SSDs.
Proceedings of the 2018 IEEE International Conference on Networking, 2018

A Flexible Method for Time-of-Flight Camera Calibration Using Random Forest.
Proceedings of the Smart Multimedia - First International Conference, 2018

ElasticBF: Fine-grained and Elastic Bloom Filter Towards Efficient Read for LSM-tree-based KV Stores.
Proceedings of the 10th USENIX Workshop on Hot Topics in Storage and File Systems, 2018

2016
Building fast and consistent (geo-)replicated systems: from principles to practice.
PhD thesis, 2016

Geo-Replication: Fast If Possible, Consistent If Necessary.
IEEE Data Eng. Bull., 2016

2015
Visigoth fault tolerance.
Proceedings of the Tenth European Conference on Computer Systems, 2015

Minimizing coordination in replicated systems.
Proceedings of the First Workshop on Principles and Practice of Consistency for Distributed Data, 2015

2014
Automating the Choice of Consistency Levels in Replicated Systems.
Proceedings of the 2014 USENIX Annual Technical Conference, 2014

2012
Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary.
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012

2011
Finding complex concurrency bugs in large multi-threaded applications.
Proceedings of the European Conference on Computer Systems, 2011

2010
A study of the internal and external effects of concurrency bugs.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

