Zhengzhong Tu

ORCID: 0000-0002-7594-2292

According to our database, Zhengzhong Tu authored at least 144 papers between 2018 and 2026.

Bibliography

2026
Towards Agentic Urban Digital Twins (AUDiTs): advancing new urban science through Human-AI co-learning agents.
Urban Inform., December, 2026

A Survey on LLM-based Conversational User Simulation.
CoRR, April, 2026

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects.
CoRR, April, 2026

Knowledge Is Not Static: Order-Aware Hypergraph RAG for Language Models.
CoRR, April, 2026

How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles.
CoRR, April, 2026

Physics-Aware Video Instance Removal Benchmark.
CoRR, April, 2026

Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking.
CoRR, April, 2026

AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference.
CoRR, April, 2026

Training a Student Expert via Semi-Supervised Foundation Model Distillation.
CoRR, April, 2026

Learn2Fold: Structured Origami Generation with World Model Planning.
CoRR, March, 2026

Let the Abyss Stare Back: Adaptive Falsification for Autonomous Scientific Discovery.
CoRR, March, 2026

NavTrust: Benchmarking Trustworthiness for Embodied Navigation.
CoRR, March, 2026

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation.
CoRR, March, 2026

The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics.
CoRR, March, 2026

Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents.
CoRR, March, 2026

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions.
CoRR, March, 2026

Human-Aligned MLLM Judges for Fine-Grained Image Editing Evaluation: A Benchmark, Framework, and Analysis.
CoRR, February, 2026

ConsID-Gen: View-Consistent and Identity-Preserving Image-to-Video Generation.
CoRR, February, 2026

Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling.
CoRR, February, 2026

PISCO: Precise Video Instance Insertion with Sparse Control.
CoRR, February, 2026

Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World.
CoRR, February, 2026

FASA: Frequency-aware Sparse Attention.
CoRR, February, 2026

Position: Human-Centric AI Requires a Minimum Viable Level of Human Understanding.
CoRR, February, 2026

BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature.
CoRR, January, 2026

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models.
CoRR, January, 2026


CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
FlowSteer: Conditioning Flow Field for Consistent Image Restoration.
CoRR, December, 2025

Knowing the Answer Isn't Enough: Fixing Reasoning Path Failures in LVLMs.
CoRR, December, 2025

NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks.
CoRR, December, 2025

Charts Are Not Images: On the Challenges of Scientific Chart Editing.
CoRR, December, 2025

VISTAv2: World Imagination for Indoor Vision-and-Language Navigation.
CoRR, December, 2025

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs.
CoRR, November, 2025

TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting.
CoRR, November, 2025

T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs.
CoRR, November, 2025

Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries.
CoRR, November, 2025

FORGE-Tree: Diffusion-Forcing Tree Search for Long-Horizon Robot Manipulation.
CoRR, October, 2025

Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception.
CoRR, October, 2025

SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving.
CoRR, October, 2025

LLMs Can Get "Brain Rot"!
CoRR, October, 2025

HeadsUp! High-Fidelity Portrait Image Super-Resolution.
CoRR, October, 2025

Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization.
CoRR, October, 2025

Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning.
CoRR, October, 2025

SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling.
CoRR, August, 2025

AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition.
CoRR, August, 2025

KANMixer: Can KAN Serve as a New Modeling Core for Long-term Time Series Forecasting?
CoRR, August, 2025

Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance.
CoRR, August, 2025

MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding.
CoRR, July, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality.
CoRR, July, 2025

4KAgent: Agentic Any Image to 4K Super-Resolution.
CoRR, July, 2025

Automated Vehicles Should be Connected with Natural Language.
CoRR, July, 2025

AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration.
CoRR, June, 2025

Demystifying the Visual Quality Paradox in Multimodal Large Language Models.
CoRR, June, 2025

SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems.
CoRR, June, 2025

V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving.
CoRR, June, 2025

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning.
CoRR, May, 2025

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation.
CoRR, May, 2025

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models.
CoRR, May, 2025

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation.
CoRR, May, 2025

Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen.
CoRR, May, 2025

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction.
CoRR, May, 2025

SounDiT: Geo-Contextual Soundscape-to-Landscape Generation.
CoRR, May, 2025

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization.
CoRR, May, 2025

Generative AI for Autonomous Driving: Frontiers and Opportunities.
CoRR, May, 2025

VISTA: Generative Visual Imagination for Vision-and-Language Navigation.
CoRR, May, 2025

NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results.
CoRR, May, 2025

The Role of Open-Source LLMs in Shaping the Future of GeoAI.
CoRR, April, 2025

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results.
CoRR, April, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report.
CoRR, April, 2025

Can Large Vision Language Models Read Maps Like a Human?
CoRR, March, 2025

PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing.
CoRR, March, 2025

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning.
CoRR, March, 2025

Generative AI in Transportation Planning: A Survey.
CoRR, March, 2025

V2X-LLM: Enhancing V2X Integration and Understanding in Connected Vehicle Corridors.
CoRR, March, 2025

Complex LLM Planning via Automated Heuristics Discovery.
CoRR, February, 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective.
CoRR, February, 2025

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization.
CoRR, February, 2025

V2X-ViTv2: Improved Vision Transformers for Vehicle-to-Everything Cooperative Perception.
IEEE Trans. Pattern Anal. Mach. Intell., January, 2025

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving.
Trans. Mach. Learn. Res., 2025

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models.
Trans. Mach. Learn. Res., 2025

Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos.
IEEE Trans. Image Process., 2025

Understanding, detecting, and removing perceptual banding artifacts in compressed videos.
Signal Process. Image Commun., 2025

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

CoMamba: Real-time Cooperative Perception Unlocked with State-Space Models.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

V2X-DGW: Domain Generalization for Multi-Agent Perception Under Adverse Weather Conditions.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

4K4DGen: Panoramic 4D Generation at 4K Resolution.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STAMP: Scalable Task- And Model-agnostic Collaborative Perception.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Secure On-Device Video OOD Detection without Backpropagation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Drama-X: A Fine-Grained Intent Prediction and Risk Reasoning Benchmark for Driving.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025


GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025


NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LangCoop: Collaborative Driving with Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

2024
MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers.
IEEE Trans. Image Process., 2024

FAVER: Blind quality prediction of variable frame rate videos.
Signal Process. Image Commun., 2024

Political-LLM: Large Language Models in Political Science.
CoRR, 2024

Video Quality Assessment: A Comprehensive Survey.
CoRR, 2024

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models.
CoRR, 2024

AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results.
CoRR, 2024

V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions.
CoRR, 2024


SPIRE: Semantic Prompt-Driven Image Restoration.
Proceedings of the Computer Vision - ECCV 2024, 2024

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

COVER: A Comprehensive Video Quality Evaluator.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024


2023
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions.
CoRR, 2023

Conditional Diffusion Distillation.
CoRR, 2023

Pik-Fix: Restoring and Colorizing Old Photos.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

MULLER: Multilayer Laplacian Resizer for Vision.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Completely Blind Video Quality Evaluator.
IEEE Signal Process. Lett., 2022

Perceptual Quality Assessment of UGC Gaming Videos.
CoRR, 2022

Subjective Quality Assessment of User-Generated Content Gaming Videos.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2022

Blind Video Quality Assessment via Space-Time Slice Statistics.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

No-Reference Quality Assessment of Variable Frame-Rate Videos Using Temporal Bandpass Statistics.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

MaxViT: Multi-axis Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

MAXIM: Multi-Axis MLP for Image Processing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers.
Proceedings of the Conference on Robot Learning, 2022

2021
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content.
IEEE Trans. Image Process., 2021

Predicting Eye Fixations Under Distortion Using Bayesian Observers.
CoRR, 2021

RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content.
CoRR, 2021

Efficient User-Generated Video Quality Prediction.
Proceedings of the Picture Coding Symposium, 2021

A Temporal Statistics Model For UGC Video Quality Prediction.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Video Quality Assessment of User Generated Content: A Benchmark Study and a New Model.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Regression or classification? New methods to evaluate no-reference picture and video quality models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
Adaptive Debanding Filter.
IEEE Signal Process. Lett., 2020

A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment.
Proceedings of the IEEE International Conference on Image Processing, 2020

BBAND Index: A No-Reference Banding Artifact Predictor.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Fitness Done Right: a Real-time Intelligent Personal Trainer for Exercise Correction.
CoRR, 2019

2018
Panoramic video delivery based on Laplace compensation and Sphere-Markov probability model.
Proceedings of the IEEE International Conference on Consumer Electronics, 2018

Content adaptive tiling method based on user access preference for streaming panoramic video.
Proceedings of the IEEE International Conference on Consumer Electronics, 2018

