Zhengzhong Tu

ORCID: 0000-0002-7594-2292

According to our database, Zhengzhong Tu authored at least 96 papers between 2018 and 2025.

Bibliography

2025
AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition.
CoRR, August, 2025

Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos.
CoRR, August, 2025

KANMixer: Can KAN Serve as a New Modeling Core for Long-term Time Series Forecasting?
CoRR, August, 2025

Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance.
CoRR, August, 2025

MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding.
CoRR, July, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality.
CoRR, July, 2025

4KAgent: Agentic Any Image to 4K Super-Resolution.
CoRR, July, 2025

Automated Vehicles Should be Connected with Natural Language.
CoRR, July, 2025

AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration.
CoRR, June, 2025

DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving.
CoRR, June, 2025

Demystifying the Visual Quality Paradox in Multimodal Large Language Models.
CoRR, June, 2025

SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems.
CoRR, June, 2025

V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving.
CoRR, June, 2025

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning.
CoRR, May, 2025

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation.
CoRR, May, 2025

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models.
CoRR, May, 2025

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation.
CoRR, May, 2025

Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen.
CoRR, May, 2025

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction.
CoRR, May, 2025

SounDiT: Geo-Contextual Soundscape-to-Landscape Generation.
CoRR, May, 2025

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization.
CoRR, May, 2025

Generative AI for Autonomous Driving: Frontiers and Opportunities.
CoRR, May, 2025

VISTA: Generative Visual Imagination for Vision-and-Language Navigation.
CoRR, May, 2025

NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results.
CoRR, May, 2025

GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution.
CoRR, May, 2025

The Role of Open-Source LLMs in Shaping the Future of GeoAI.
CoRR, April, 2025

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results.
CoRR, April, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report.
CoRR, April, 2025

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving.
CoRR, April, 2025

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving.
CoRR, March, 2025

Can Large Vision Language Models Read Maps Like a Human?
CoRR, March, 2025

PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing.
CoRR, March, 2025

CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception.
CoRR, March, 2025

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning.
CoRR, March, 2025

Generative AI in Transportation Planning: A Survey.
CoRR, March, 2025

Secure On-Device Video OOD Detection Without Backpropagation.
CoRR, March, 2025

V2X-LLM: Enhancing V2X Integration and Understanding in Connected Vehicle Corridors.
CoRR, March, 2025

Complex LLM Planning via Automated Heuristics Discovery.
CoRR, February, 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective.
CoRR, February, 2025

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization.
CoRR, February, 2025

V2X-ViTv2: Improved Vision Transformers for Vehicle-to-Everything Cooperative Perception.
IEEE Trans. Pattern Anal. Mach. Intell., January, 2025

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models.
Trans. Mach. Learn. Res., 2025

Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos.
IEEE Trans. Image Process., 2025

Understanding, detecting, and removing perceptual banding artifacts in compressed videos.
Signal Process. Image Commun., 2025

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

4K4DGen: Panoramic 4D Generation at 4K Resolution.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STAMP: Scalable Task- And Model-agnostic Collaborative Perception.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025


NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LangCoop: Collaborative Driving with Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

2024
MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers.
IEEE Trans. Image Process., 2024

FAVER: Blind quality prediction of variable frame rate videos.
Signal Process. Image Commun., 2024

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving.
CoRR, 2024

Political-LLM: Large Language Models in Political Science.
CoRR, 2024

Video Quality Assessment: A Comprehensive Survey.
CoRR, 2024

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models.
CoRR, 2024

CoMamba: Real-time Cooperative Perception Unlocked with State Space Models.
CoRR, 2024

AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results.
CoRR, 2024

V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions.
CoRR, 2024


SPIRE: Semantic Prompt-Driven Image Restoration.
Proceedings of the Computer Vision - ECCV 2024, 2024

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

COVER: A Comprehensive Video Quality Evaluator.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024


2023
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions.
CoRR, 2023

Conditional Diffusion Distillation.
CoRR, 2023

Pik-Fix: Restoring and Colorizing Old Photos.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

MULLER: Multilayer Laplacian Resizer for Vision.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Completely Blind Video Quality Evaluator.
IEEE Signal Process. Lett., 2022

Perceptual Quality Assessment of UGC Gaming Videos.
CoRR, 2022

Subjective Quality Assessment of User-Generated Content Gaming Videos.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2022

Blind Video Quality Assessment via Space-Time Slice Statistics.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

No-Reference Quality Assessment of Variable Frame-Rate Videos Using Temporal Bandpass Statistics.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

MaxViT: Multi-axis Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

MAXIM: Multi-Axis MLP for Image Processing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers.
Proceedings of the Conference on Robot Learning, 2022

2021
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content.
IEEE Trans. Image Process., 2021

Predicting Eye Fixations Under Distortion Using Bayesian Observers.
CoRR, 2021

RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content.
CoRR, 2021

Efficient User-Generated Video Quality Prediction.
Proceedings of the Picture Coding Symposium, 2021

A Temporal Statistics Model For UGC Video Quality Prediction.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Video Quality Assessment of User Generated Content: A Benchmark Study and a New Model.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Regression or classification? New methods to evaluate no-reference picture and video quality models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
Adaptive Debanding Filter.
IEEE Signal Process. Lett., 2020

A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment.
Proceedings of the IEEE International Conference on Image Processing, 2020

BBAND Index: A No-Reference Banding Artifact Predictor.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Fitness Done Right: a Real-time Intelligent Personal Trainer for Exercise Correction.
CoRR, 2019

2018
Panoramic video delivery based on Laplace compensation and Sphere-Markov probability model.
Proceedings of the IEEE International Conference on Consumer Electronics, 2018

Content adaptive tiling method based on user access preference for streaming panoramic video.
Proceedings of the IEEE International Conference on Consumer Electronics, 2018

