Zuxuan Wu

Orcid: 0000-0002-8689-5807

According to our database, Zuxuan Wu authored at least 208 papers between 2014 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

PreferThinker: Reasoning-based Personalized Image Preference Assessment.

CoRR, November, 2025

ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring.

CoRR, October, 2025

RoboOmni: Proactive Robot Manipulation in Omni-modal Context.

CoRR, October, 2025

COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability.

CoRR, October, 2025

FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models.

CoRR, September, 2025

Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue.

CoRR, September, 2025

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning.

CoRR, September, 2025

DiffusionAD: Norm-Guided One-Step Denoising Diffusion for Anomaly Detection.

IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives.

CoRR, August, 2025

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation.

CoRR, August, 2025

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models.

CoRR, August, 2025

Multimodal Referring Segmentation: A Survey.

CoRR, August, 2025

Multi-Prompt Progressive Alignment for Multi-Source Unsupervised Domain Adaptation.

CoRR, July, 2025

StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation.

CoRR, July, 2025

FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization.

CoRR, July, 2025

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis.

CoRR, July, 2025

Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning.

Zuyao You

Zuxuan Wu

CoRR, June, 2025

Generalized Trajectory Scoring for End-to-end Multimodal Planning.

CoRR, June, 2025

DriveSuprim: Towards Precise Trajectory Selection for End-to-End Planning.

CoRR, June, 2025

Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control.

CoRR, June, 2025

CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design.

CoRR, May, 2025

Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities.

Ziwei Zhou

Rui Wang

Zuxuan Wu

CoRR, May, 2025

ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning.

CoRR, May, 2025

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation.

CoRR, May, 2025

OmniTracker: Unifying Visual Object Tracking by Tracking-With-Detection.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks.

CoRR, April, 2025

SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL.

CoRR, April, 2025

Aligning Anime Video Generation with Human Feedback.

CoRR, April, 2025

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.

Proc. IEEE, March, 2025

DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation.

CoRR, March, 2025

CoMP: Continual Multimodal Pre-training for Vision Foundation Models.

CoRR, March, 2025

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance.

CoRR, March, 2025

Hydra-MDP++: Advancing End-to-End Driving via Expert-Guided Hydra-Distillation.

CoRR, March, 2025

Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training.

CoRR, March, 2025

Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning.

CoRR, March, 2025

A Survey on Video Diffusion Models.

ACM Comput. Surv., February, 2025

Human2Robot: Learning Robot Actions from Paired Human-Robot Videos.

CoRR, February, 2025

Safety at Scale: A Comprehensive Survey of Large Model Safety.

CoRR, February, 2025

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning.

CoRR, January, 2025

FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-from-gradients.

CoRR, January, 2025

The Role of ViT Design and Training in Robustness to Common Corruptions.

IEEE Trans. Multim., 2025

BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning.

IEEE Trans. Multim., 2025

Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety.

Found. Trends Priv. Secur., 2025

Adaptive Retention & Correction: Test-Time Training for Continual Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Advancing Dark Action Recognition via Modality Fusion and Dark-to-Light Diffusion Model.

Yuxuan Wang

Zhen Xing

Zuxuan Wu

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StableAnimator: High-Quality Identity-Preserving Human Image Animation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AdaDiff: Adaptive Step Selection for Fast Diffusion Models.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

FOCUS: Towards Universal Foreground Segmentation.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-from-gradients.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Adaptive Cross-Modal Transferable Adversarial Attacks From Images to Videos.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition.

ACM Trans. Multim. Comput. Commun. Appl., February, 2024

Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data.

IEEE Trans. Pattern Anal. Mach. Intell., 2024

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks.

CoRR, 2024

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation.

CoRR, 2024

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning.

CoRR, 2024

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection.

CoRR, 2024

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision.

CoRR, 2024

REDUCIO! Generating 1024⨉1024 Video within 16 Seconds using Extremely Compressed Motion Latents.

CoRR, 2024

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.

CoRR, 2024

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers.

CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.

CoRR, 2024

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding.

CoRR, 2024

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation.

CoRR, 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction.

CoRR, 2024

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments.

CoRR, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.

CoRR, 2024

Adaptive Retention & Correction for Continual Learning.

CoRR, 2024

PoseAnimate: Zero-shot high fidelity pose controllable character animation.

CoRR, 2024

FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model.

CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.

CoRR, 2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling.

CoRR, 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GenRec: Unifying Video Generation and Recognition with Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ModelLock: Locking Your Model With a Spell.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Zero-shot High-fidelity and Pose-controllable Character Animation.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MagDiff: Multi-alignment Diffusion for High-Fidelity Video Generation and Editing.

Proceedings of the Computer Vision - ECCV 2024, 2024

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

SegIC: Unleashing the Emergent Correspondence for In-Context Segmentation.

Proceedings of the Computer Vision - ECCV 2024, 2024

PromptFusion: Decoupling Stability and Plasticity for Continual Learning.

Proceedings of the Computer Vision - ECCV 2024, 2024

SimDA: Simple Diffusion Adapter for Efficient Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniViD: A Generative Framework for Universal Video Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning to Rank Patches for Unbiased Image Redundancy Reduction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation.

IEEE Trans. Multim., 2023

FT-TDR: Frequency-Guided Transformer and Top-Down Refinement Network for Blind Face Inpainting.

IEEE Trans. Multim., 2023

Self-Supervised Learning for Semi-Supervised Temporal Language Grounding.

IEEE Trans. Multim., 2023

Towards Transferable Adversarial Attacks on Image and Video Transformers.

IEEE Trans. Image Process., 2023

Multimodal Pre-training Method for Vision-language Understanding and Generation.

Int. J. Softw. Informatics, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models.

CoRR, 2023

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model.

CoRR, 2023

AdaDiff: Adaptive Step Selection for Fast Diffusion.

CoRR, 2023

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning.

CoRR, 2023

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models.

CoRR, 2023

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation.

CoRR, 2023

Prompting Large Language Models to Reformulate Queries for Moment Localization.

CoRR, 2023

BMB: Balanced Memory Bank for Imbalanced Semi-supervised Learning.

CoRR, 2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System.

CoRR, 2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.

CoRR, 2023

DiffusionAD: Denoising Diffusion for Anomaly Detection.

CoRR, 2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.

CoRR, 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Importance of Spatial Relations for Few-shot Action Recognition.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.

Proceedings of the International Conference on Machine Learning, 2023

Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enhancing the Self-Universality for Transferable Targeted Attacks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Vision Transformers are Good Mask Auto-Labelers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prototypical Residual Networks for Anomaly Detection and Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Scalable Neural Representation for Diverse Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition.

Xing Zhang

Zuxuan Wu

Yu-Gang Jiang

IEEE Trans. Multim., 2022

Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval.

IEEE Trans. Multim., 2022

A Dynamic Frame Selection Framework for Fast Video Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation.

Haoran Chen

Zuxuan Wu

Yu-Gang Jiang

CoRR, 2022

Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples.

CoRR, 2022

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling.

CoRR, 2022

Deeper Insights into ViTs Robustness towards Common Corruptions.

CoRR, 2022

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.

Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Semi-supervised Single-View 3D Reconstruction via Prototype Shape Priors.

Proceedings of the Computer Vision - ECCV 2022, 2022

Semi-supervised Vision Transformers.

Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Video Transformers with Spatial-Temporal Token Selection.

Proceedings of the Computer Vision - ECCV 2022, 2022

Cross-Modal Transferable Adversarial Attacks from Images to Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ObjectFormer for Image Manipulation Detection and Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BEVT: BERT Pretraining of Video Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Robust Optimization as Data Augmentation for Large-scale Graphs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Boosting the Transferability of Video Adversarial Examples via Temporal Translation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Towards Transferable Adversarial Attacks on Vision Transformers.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Rethinking Pseudo Labels for Semi-supervised Object Detection.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Attacking Video Recognition Models with Bullet-Screen Comments.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

A Coarse-to-Fine Framework for Resource Efficient Video Recognition.

Int. J. Comput. Vis., 2021

Rethinking Nearest Neighbors for Visual Classification.

CoRR, 2021

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation.

CoRR, 2021

Efficient Video Transformers with Spatial-Temporal Token Selection.

CoRR, 2021

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.

CoRR, 2021

HMS: Hierarchical Modality Selection for Efficient Video Recognition.

CoRR, 2021

THAT: Two Head Adversarial Training for Improving Robustness at Scale.

CoRR, 2021

Encoding Robustness to Image Style via Adversarial Feature Perturbations.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Multimodal Framework for Video Ads Understanding.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

VideoLT: Large-scale Long-tailed Video Recognition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploring Visual Engagement Signals for Representation Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Intentonomy: A Dataset and Study Towards Human Intent Understanding.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Efficient Object Embedding for Spliced Image Retrieval.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

GTA: Global Temporal Attention for Video Action Understanding.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Deep Video Inpainting Detection.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Image and Video Understanding with Constrained Resources.

Zuxuan Wu

PhD thesis, 2020

FLAG: Adversarial Data Augmentation for Graph Neural Networks.

CoRR, 2020

Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization.

CoRR, 2020

Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors.

Proceedings of the Computer Vision - ECCV 2020, 2020

Learning From Noisy Anchors for One-Stage Object Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

M2KD: Incremental Learning via Multi-model and Multi-level Knowledge Distillation.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

Recognizing Instagram Filtered Images with Feature De-Stylization.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning.

ACM Trans. Multim. Comput. Commun. Appl., 2019

An Analysis of Pre-Training on Object Detection.

CoRR, 2019

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning.

CoRR, 2019

Compatible and Diverse Fashion Image Inpainting.

CoRR, 2019

Weakly-Supervised Spatial Context Networks.

Zuxuan Wu

Larry Davis

Leonid Sigal

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation.

Proceedings of the 7th International Conference on Learning Representations, 2019

ACE: Adapting to Changing Environments for Semantic Segmentation.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

FiNet: Compatible and Diverse Fashion Image Inpainting.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AdaFrame: Adaptive Frame Selection for Fast Video Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification.

IEEE Trans. Multim., 2018

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks.

IEEE Trans. Pattern Anal. Mach. Intell., 2018

DCAN: Dual Channel-Wise Alignment Networks for Unsupervised Scene Adaptation.

Zuxuan Wu

Xintong Han

Yen-Liang Lin

Mustafa Gökhan Uzunbas

Tom Goldstein

Ser-Nam Lim

Larry S. Davis

Proceedings of the Computer Vision - ECCV 2018, 2018

BlockDrop: Dynamic Inference Paths in Residual Networks.

Rogério Schmidt Feris

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

VITON: An Image-Based Virtual Try-On Network.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Deep learning for video classification and captioning.

Proceedings of the Frontiers of Multimedia Research, 2018

2017

Aggregating Frame-level Features for Large-Scale Video Classification.

CoRR, 2017

Learning Semantic Feature Map for Visual Content Recognition.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

LSVC2017: Large-Scale Video Classification Challenge.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning Fashion Compatibility with Bidirectional LSTMs.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Automatic Spatially-Aware Fashion Concept Discovery.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Deep Learning for Video Classification and Captioning.

CoRR, 2016

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Exploiting Objects with LSTMs for Video Categorization.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Emotion in Context: Deep Semantic Feature Fusion for Video Emotion Recognition.

Chen Chen

Zuxuan Wu

Yu-Gang Jiang

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Harnessing Object and Scene Semantics for Large-Scale Video Understanding.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Fusing Multi-Stream Deep Networks for Video Classification.

CoRR, 2015

Fudan at TRECVID 2015: Adaptive Feature Fusion for Multimedia Event Detection in Videos.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

NTT-Fudan Team @ TRECVID 2015: Multimedia Event Detection.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification.

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Evaluating Two-Stream CNN for Video Classification.

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.

Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

2014

Fudan Team at TRECVID 2014: Multimedia Event Detection.

Zuxuan Wu

Rui-Wei Zhao

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.

Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation.

Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2014
