Yang Sui

ORCID: 0000-0003-3020-0612

Affiliations:
  • Rutgers University, Department of Electrical and Computer Engineering, Piscataway, NJ, USA (PhD 2024)


According to our database, Yang Sui authored at least 46 papers between 2021 and 2026.


Bibliography

2026
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload.
CoRR, May, 2026

ATA: Bridging Implicit Reasoning with Attention-Guided and Action-Guided Inference for Vision-Language Action Models.
CoRR, March, 2026

A Survey of Token Compression for Efficient Multimodal Large Language Models.
Trans. Mach. Learn. Res., 2026

2025
Pruning 3D Convolutional Neural Networks via Channel Independence.
J. Signal Process. Syst., December, 2025

EcoSpa: Efficient Transformer Training with Coupled Sparsity.
CoRR, November, 2025

LowDiff: Efficient Diffusion Sampling with Low-Resolution Condition.
CoRR, September, 2025

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios.
CoRR, July, 2025

AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models.
CoRR, May, 2025

Co-Exploring Structured Sparsification and Low-Rank Tensor Decomposition for Compact DNNs.
IEEE Trans. Neural Networks Learn. Syst., April, 2025

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float.
CoRR, April, 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models.
CoRR, March, 2025

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models.
CoRR, March, 2025

Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization.
CoRR, February, 2025

DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models.
Trans. Mach. Learn. Res., 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models.
Trans. Mach. Learn. Res., 2025

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11).
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

HoliTom: Holistic Token Merging for Fast Video Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

iTAP: An Incremental Task Graph Partitioner for Task-parallel Static Timing Analysis.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
Corner-to-Center long-range context model for efficient learned image compression.
J. Vis. Commun. Image Represent., 2024

Understanding Artificial Neural Network's Behavior from Neuron Activation Perspective.
CoRR, 2024

MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition.
CoRR, 2024

ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks.
CoRR, 2024

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MOPED: Efficient Motion Planning Engine with Flexible Dimension Support.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Clean and Compact: Efficient Data-Free Backdoor Defense with Model Compactness.
Proceedings of the Computer Vision - ECCV 2024, 2024

Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations.
Proceedings of the Data Compression Conference, 2024

Invited: Algorithm and Hardware Co-Design for Energy-Efficient Neural SLAM.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Transferable Learned Image Compression-Resistant Adversarial Perturbations.
Proceedings of the 35th British Machine Vision Conference, 2024

2023
In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar.
CoRR, 2023

Learning-based Homography Matrix Optimization for Dual-fisheye Video Stitching.
Proceedings of the 2023 Workshop on Emerging Multimedia Systems, 2023

ETTE: Efficient Tensor-Train-based Computing Engine for Deep Neural Networks.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

DynGMP: Graph Neural Network-Based Motion Planning in Unpredictable Dynamic Environments.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

Invited Paper: In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

DSPIMM: A Fully Digital SParse In-Memory Matrix Vector Multiplier for Communication Applications.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

CSTAR: Towards Compact and Structured Deep Neural Networks with Adversarial Robustness.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition.
CoRR, 2022

HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
CHIP: CHannel Independence-based Pruning for Compact Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Algorithm and Hardware Co-design for Deep Learning-powered Channel Decoder: A Case Study.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Towards Efficient Tensor Decomposition-Based DNN Model Compression With Optimization Framework.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

