Ningxin Zheng

Orcid: 0009-0009-6449-8972

According to our database, Ningxin Zheng authored at least 36 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of four.

Bibliography

2026

DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism.

Co-authors include Cesar A. Stuardo and Mohamed S. Abdelfattah.

CoRR, May, 2026

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs.

CoRR, May, 2026

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production.

Proceedings of the 21st European Conference on Computer Systems, 2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

Proceedings of the 21st European Conference on Computer Systems, 2026

2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution.

CoRR, September, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

CoRR, May, 2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.

CoRR, April, 2025

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts.

Co-authors include Chengquan Jiang.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024

Online Streaming Video Super-Resolution With Convolutional Look-Up Table.

IEEE Trans. Image Process., 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.

Co-authors include Chengquan Jiang.

CoRR, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

2023

Online Video Super-Resolution With Convolutional Kernel Bypass Grafts.

IEEE Trans. Multim., 2023

Online Video Streaming Super-Resolution with Adaptive Look-Up Table Fusion.

CoRR, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation.

CoRR, 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.

Co-authors include Chengruidong Zhang.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Optimizing Dynamic Neural Networks with Brainstorm.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs.

IEEE Trans. Computers, 2022

Online Video Super-Resolution with Convolutional Kernel Bypass Graft.

CoRR, 2022

QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services.

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

nn-METER: Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices.

GetMobile Mob. Comput. Commun., 2021

Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision.

CoRR, 2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.

Proceedings of the International Conference for High Performance Computing, 2021

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices.

Proceedings of the MobiSys '21: The 19th Annual International Conference on Mobile Systems, Applications, and Services, Virtual Event, Wisconsin, USA, 24 June, 2021

CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters.

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020

Towards QoS-Aware and Resource-Efficient GPU Microservices Based on Spatial Multitasking GPUs In Datacenters.

CoRR, 2020

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019

URSA: Precise Capacity Planning and Contention-aware Scheduling for Public Clouds.

CoRR, 2019

POSTER: Precise Capacity Planning for Database Public Clouds.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

CLIBE: Precise Cluster-Level I/O Bandwidth Enforcement in Distributed File System.

Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018
