Bin Ren

Orcid: 0000-0002-4116-5237

Affiliations:
  • William & Mary, Williamsburg, VA, USA


According to our database, Bin Ren authored at least 99 papers between 2011 and 2025.

Bibliography

2025
Mobile-3DCNN: An Acceleration Framework for Ultra-Real-Time Execution of Large 3D CNNs on Mobile Devices.
ACM Trans. Archit. Code Optim., September, 2025

HiSin: Efficient High-Resolution Sinogram Inpainting via Resolution-Guided Progressive Inference.
CoRR, June, 2025

Towards Recognizing Food Types for Unseen Subjects.
ACM Trans. Comput. Heal., January, 2025

TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations.
Proceedings of the 39th ACM International Conference on Supercomputing, 2025

Generalizing Reuse Patterns for Efficient DNN on Microcontrollers.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024
On Item-Sampling Evaluation for Recommender System.
Trans. Recomm. Syst., March, 2024

FCDM: Sparse-view Sinogram Inpainting with Frequency Domain Convolution Enhanced Diffusion Models.
CoRR, 2024

SoD²: Statically Optimizing Dynamic Deep Neural Network.
CoRR, 2024

DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DACO: Pursuing Ultra-low Power Consumption via DNN-Adaptive CPU-GPU CO-optimization on Mobile Devices.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

SoD²: Statically Optimizing Dynamic Deep Neural Network Execution.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Survey: Exploiting Data Redundancy for Optimization of Deep Learning.
ACM Comput. Surv., 2023

Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Real-Time Segmentation on the Edge.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Towards Reliable Item Sampling for Recommendation Evaluation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
ACM Trans. Embed. Comput. Syst., September, 2022

MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging.
IEEE Trans. Parallel Distributed Syst., 2022

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration.
ACM Trans. Design Autom. Electr. Syst., 2022

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework.
CoRR, 2022

Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.
Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022

SparCL: Sparse Continual Learning on the Edge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Socially Acceptable Food Type Recognition.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

BLCR: Towards Real-time DNN Execution with Block-based Reweighted Pruning.
Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Real-Time Portrait Stylization on the Edge.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution.
Proceedings of the Computer Vision - ECCV 2022, 2022

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning.
CoRR, 2021

Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.
CoRR, 2021

Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search.
CoRR, 2021

A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR.
CoRR, 2021

CoCoPIE: enabling real-time AI on off-the-shelf mobile devices via compression-compilation co-design.
Commun. ACM, 2021

Toward efficient interactions between Python and native libraries.
Proceedings of the ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021

Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search.
Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

DNNFusion: accelerating deep neural networks execution with advanced operator fusion.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A High Performance Sparse Tensor Algebra Compiler in MLIR.
Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

A Compression-Compilation Framework for On-mobile Real-time BERT Applications.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

HEALS: A Parallel eALS Recommendation System on CPU/GPU Heterogeneous Platforms.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Neural Pruning Search for Real-Time Object Detection of Autonomous Vehicles.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Real-Time Mobile Acceleration of DNNs: From Computer Vision to Medical Applications.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device.
CoRR, 2020

6.7ms on Mobile with over 78% ImageNet Accuracy: Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration.
CoRR, 2020

An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.
CoRR, 2020

Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization.
CoRR, 2020

Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
CoRR, 2020

CoCoPIE: Making Mobile AI Sweet As PIE -Compression-Compilation Co-Design Goes a Long Way.
CoRR, 2020

A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework.
CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
CoRR, 2020

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method.
CoRR, 2020

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices.
CoRR, 2020

Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes.
Proceedings of the International Conference for High Performance Computing, 2020

COMET: A Domain-Specific Compilation of High-Performance Computational Chemistry.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

On Efficient Constructions of Checkpoints.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Image Enhancing Pattern-Based Sparsity for Real-Time Inference on Mobile Devices.
Proceedings of the Computer Vision - ECCV 2020, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

ATMem: adaptive data placement in graph applications on heterogeneous memories.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Extracting SIMD Parallelism from Recursive Task-Parallel Programs.
ACM Trans. Parallel Comput., 2019

Pruned Landmark Labeling Meets Vertex Centric Computation: A Surprisingly Happy Marriage!
CoRR, 2019

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone.
CoRR, 2019

MemXCT: memory-centric X-ray CT reconstruction with massive parallelization.
Proceedings of the International Conference for High Performance Computing, 2019

Transforming Query Sequences for High-Throughput B+ Tree Processing on Many-Core Processors.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
Graphphi: efficient parallel graph processing on emerging throughput-oriented architectures.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Real-Time Data Analysis and Autonomous Steering of Synchrotron Light Source Experiments.
Proceedings of the 13th IEEE International Conference on e-Science, 2017

2016
User-Assisted Store Recycling for Dynamic Task Graph Schedulers.
ACM Trans. Archit. Code Optim., 2016

User-assisted storage reuse determination for dynamic task graphs.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

On the Impact of Widening Vector Registers on Sequence Alignment.
Proceedings of the 45th International Conference on Parallel Processing, 2016

MicroSpec: Speculation-Centric Fine-Grained Parallelization for FSM Computations.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Efficient execution of recursive programs on commodity vector hardware.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Automatic and Efficient Data Host-Device Communication for Many-Core Coprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 2015

Low-Overhead Fault-Tolerance Support Using DISC Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 2015

Efficient and Simplified Parallel Graph Processing over CPU and MIC.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
A Portable Optimization Engine for Accelerating Irregular Data-Traversal Applications on SIMD Architectures.
ACM Trans. Archit. Code Optim., 2014

Automating and optimizing data transfers for many-core coprocessors.
Proceedings of the 2014 International Conference on Supercomputing, 2014

A programming system for xeon phis with runtime SIMD parallelization.
Proceedings of the 2014 International Conference on Supercomputing, 2014

2013
SIMD parallelization of applications that traverse irregular data structures.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012
Fine-grained parallel traversals of irregular data structures.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Translating Chapel to Use FREERIDE: A Case Study in Using an HPC Language for Data-Intensive Computing.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

