Wei Niu

Orcid: 0000-0002-2697-7042

Affiliations:

University of Georgia, Athens, GA, USA
College of William & Mary, Williamsburg, VA, USA (PhD)

According to our database¹, Wei Niu authored at least 83 papers between 2017 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation.

CoRR, October, 2025

Mobile-3DCNN: An Acceleration Framework for Ultra-Real-Time Execution of Large 3D CNNs on Mobile Devices.

ACM Trans. Archit. Code Optim., September, 2025

End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost.

CoRR, September, 2025

RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory.

CoRR, August, 2025

Structured Agent Distillation for Large Language Model.

CoRR, May, 2025

TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2025

Towards Recognizing Food Types for Unseen Subjects.

ACM Trans. Comput. Heal., January, 2025

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation.

CoRR, January, 2025

NeurLZ: An Online Neural Learning-based Method to Enhance Scientific Lossy Compression.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

Sparse Learning for State Space Models on Mobile.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

GaussianSpa: An "Optimizing-Sparsifying" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Open-Source Acceleration of Stable-Diffusion.cpp.

CoRR, 2024

AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction.

CoRR, 2024

NeurLZ: On Enhancing Lossy Compression Performance based on Error-Controlled Neural Learning for Scientific Data.

CoRR, 2024

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion.

CoRR, 2024

SoD2: Statically Optimizing Dynamic Deep Neural Network.

Wei Niu

Gagan Agrawal

Bin Ren

CoRR, 2024

EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge.

CoRR, 2024

Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Exploring Token Pruning in Vision State Space Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data.

Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 2024

Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design.

Proceedings of the Computer Vision - ECCV 2024, 2024

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile.

Wei Niu

Md. Musfiqur Rahman Sanim

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

SoD2: Statically Optimizing Dynamic Deep Neural Network Execution.

Wei Niu

Gagan Agrawal

Bin Ren

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Survey: Exploiting Data Redundancy for Optimization of Deep Learning.

ACM Comput. Surv., 2023

Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges.

CoRR, 2023

Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices.

Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving.

Proceedings of the 2023 Workshop on Advanced Multimedia Computing for Smart Manufacturing and Engineering, 2023

Towards Real-Time Segmentation on the Edge.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.

ACM Trans. Embed. Comput. Syst., September, 2022

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration.

ACM Trans. Design Autom. Electr. Syst., 2022

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.

Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022

SparCL: Sparse Continual Learning on the Edge.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

BLCR: Towards Real-time DNN Execution with Block-based Reweighted Pruning.

Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Real-Time Portrait Stylization on the Edge.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution.

Proceedings of the Computer Vision - ECCV 2022, 2022

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning.

CoRR, 2021

Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.

CoRR, 2021

Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search.

CoRR, 2021

CoCoPIE: enabling real-time AI on off-the-shelf mobile devices via compression-compilation co-design.

Commun. ACM, 2021

Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search.

Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.

Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

DNNFusion: accelerating deep neural networks execution with advanced operator fusion.

Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

A Compression-Compilation Framework for On-mobile Real-time BERT Applications.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

HEALS: A Parallel eALS Recommendation System on CPU/GPU Heterogeneous Platforms.

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Neural Pruning Search for Real-Time Object Detection of Autonomous Vehicles.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Real-Time Mobile Acceleration of DNNs: From Computer Vision to Medical Applications.

Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device.

CoRR, 2020

6.7ms on Mobile with over 78% ImageNet Accuracy: Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration.

CoRR, 2020

An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.

CoRR, 2020

Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization.

CoRR, 2020

Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.

CoRR, 2020

A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework.

CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.

CoRR, 2020

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method.

CoRR, 2020

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices.

CoRR, 2020

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework.

Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Image Enhancing Pattern-Based Sparsity for Real-Time Inference on Mobile Devices.

Proceedings of the Computer Vision - ECCV 2020, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone.

CoRR, 2019

2017

User-aware partitioning algorithm for mobile cloud computing based on maximum graph cuts.

Comput. Networks, 2017
