Wei Niu

ORCID: 0000-0002-2697-7042

Affiliations:
  • University of Georgia, Athens, GA, USA
  • College of William & Mary, Williamsburg, VA, USA (PhD)


According to our database, Wei Niu authored at least 79 papers between 2017 and 2025.

Bibliography

2025
RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory.
CoRR, August, 2025

Structured Agent Distillation for Large Language Model.
CoRR, May, 2025

TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2025

Towards Recognizing Food Types for Unseen Subjects.
ACM Trans. Comput. Heal., January, 2025

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation.
CoRR, January, 2025

Sparse Learning for State Space Models on Mobile.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

GaussianSpa: An "Optimizing-Sparsifying" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Open-Source Acceleration of Stable-Diffusion.cpp.
CoRR, 2024

AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction.
CoRR, 2024

NeurLZ: On Enhancing Lossy Compression Performance based on Error-Controlled Neural Learning for Scientific Data.
CoRR, 2024

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion.
CoRR, 2024

SoD^2: Statically Optimizing Dynamic Deep Neural Network.
CoRR, 2024

EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge.
CoRR, 2024

Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Exploring Token Pruning in Vision State Space Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data.
Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 2024

Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design.
Proceedings of the Computer Vision - ECCV 2024, 2024

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

SoD^2: Statically Optimizing Dynamic Deep Neural Network Execution.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Survey: Exploiting Data Redundancy for Optimization of Deep Learning.
ACM Comput. Surv., 2023

Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges.
CoRR, 2023

Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving.
Proceedings of the 2023 Workshop on Advanced Multimedia Computing for Smart Manufacturing and Engineering, 2023

Towards Real-Time Segmentation on the Edge.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
ACM Trans. Embed. Comput. Syst., September, 2022

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration.
ACM Trans. Design Autom. Electr. Syst., 2022

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.
Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022

SparCL: Sparse Continual Learning on the Edge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GCD^2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

BLCR: Towards Real-time DNN Execution with Block-based Reweighted Pruning.
Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Real-Time Portrait Stylization on the Edge.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution.
Proceedings of the Computer Vision - ECCV 2022, 2022

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning.
CoRR, 2021

Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.
CoRR, 2021

Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search.
CoRR, 2021

CoCoPIE: enabling real-time AI on off-the-shelf mobile devices via compression-compilation co-design.
Commun. ACM, 2021

Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search.
Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

DNNFusion: accelerating deep neural networks execution with advanced operator fusion.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

A Compression-Compilation Framework for On-mobile Real-time BERT Applications.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

HEALS: A Parallel eALS Recommendation System on CPU/GPU Heterogeneous Platforms.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Neural Pruning Search for Real-Time Object Detection of Autonomous Vehicles.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Real-Time Mobile Acceleration of DNNs: From Computer Vision to Medical Applications.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device.
CoRR, 2020

6.7ms on Mobile with over 78% ImageNet Accuracy: Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration.
CoRR, 2020

An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.
CoRR, 2020

Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization.
CoRR, 2020

Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
CoRR, 2020

A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework.
CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
CoRR, 2020

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method.
CoRR, 2020

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices.
CoRR, 2020

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Image Enhancing Pattern-Based Sparsity for Real-Time Inference on Mobile Devices.
Proceedings of the Computer Vision - ECCV 2020, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone.
CoRR, 2019

2017
User-aware partitioning algorithm for mobile cloud computing based on maximum graph cuts.
Comput. Networks, 2017

