Zuxuan Wu

Orcid: 0000-0002-8689-5807

According to our database, Zuxuan Wu authored at least 132 papers between 2014 and 2024.

Bibliography

2024
Adaptive Cross-Modal Transferable Adversarial Attacks From Images to Videos.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition.
ACM Trans. Multim. Comput. Commun. Appl., February, 2024

OmniVid: A Generative Framework for Universal Video Understanding.
CoRR, 2024

FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model.
CoRR, 2024

MouSi: Poly-Visual-Expert Vision-Language Models.
CoRR, 2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling.
CoRR, 2024

2023
Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation.
IEEE Trans. Multim., 2023

FT-TDR: Frequency-Guided Transformer and Top-Down Refinement Network for Blind Face Inpainting.
IEEE Trans. Multim., 2023

Self-Supervised Learning for Semi-Supervised Temporal Language Grounding.
IEEE Trans. Multim., 2023

Towards Transferable Adversarial Attacks on Image and Video Transformers.
IEEE Trans. Image Process., 2023

Multimodal Pre-training Method for Vision-language Understanding and Generation.
Int. J. Softw. Informatics, 2023

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection.
CoRR, 2023

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding.
CoRR, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models.
CoRR, 2023

MotionEditor: Editing Video Motion via Content-Aware Diffusion.
CoRR, 2023

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model.
CoRR, 2023

AdaDiff: Adaptive Step Selection for Fast Diffusion.
CoRR, 2023

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation.
CoRR, 2023

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning.
CoRR, 2023

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models.
CoRR, 2023

A Survey on Video Diffusion Models.
CoRR, 2023

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data.
CoRR, 2023

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation.
CoRR, 2023

SimDA: Simple Diffusion Adapter for Efficient Video Generation.
CoRR, 2023

Prompting Large Language Models to Reformulate Queries for Moment Localization.
CoRR, 2023

BMB: Balanced Memory Bank for Imbalanced Semi-supervised Learning.
CoRR, 2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System.
CoRR, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
CoRR, 2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.
CoRR, 2023

DiffusionAD: Denoising Diffusion for Anomaly Detection.
CoRR, 2023

PromptFusion: Decoupling Stability and Plasticity for Continual Learning.
CoRR, 2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
CoRR, 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Importance of Spatial Relations for Few-shot Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
Proceedings of the International Conference on Machine Learning, 2023

Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enhancing the Self-Universality for Transferable Targeted Attacks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Vision Transformers are Good Mask Auto-Labelers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prototypical Residual Networks for Anomaly Detection and Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Scalable Neural Representation for Diverse Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition.
IEEE Trans. Multim., 2022

Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval.
IEEE Trans. Multim., 2022

A Dynamic Frame Selection Framework for Fast Video Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.
CoRR, 2022

Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation.
CoRR, 2022

Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples.
CoRR, 2022

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling.
CoRR, 2022

Deeper Insights into ViTs Robustness towards Common Corruptions.
CoRR, 2022

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Semi-supervised Single-View 3D Reconstruction via Prototype Shape Priors.
Proceedings of the Computer Vision - ECCV 2022, 2022

Semi-supervised Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Video Transformers with Spatial-Temporal Token Selection.
Proceedings of the Computer Vision - ECCV 2022, 2022

Cross-Modal Transferable Adversarial Attacks from Images to Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ObjectFormer for Image Manipulation Detection and Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BEVT: BERT Pretraining of Video Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Robust Optimization as Data Augmentation for Large-scale Graphs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Boosting the Transferability of Video Adversarial Examples via Temporal Translation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Towards Transferable Adversarial Attacks on Vision Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Rethinking Pseudo Labels for Semi-supervised Object Detection.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Attacking Video Recognition Models with Bullet-Screen Comments.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Coarse-to-Fine Framework for Resource Efficient Video Recognition.
Int. J. Comput. Vis., 2021

Rethinking Nearest Neighbors for Visual Classification.
CoRR, 2021

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation.
CoRR, 2021

Efficient Video Transformers with Spatial-Temporal Token Selection.
CoRR, 2021

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.
CoRR, 2021

HMS: Hierarchical Modality Selection for Efficient Video Recognition.
CoRR, 2021

THAT: Two Head Adversarial Training for Improving Robustness at Scale.
CoRR, 2021

Encoding Robustness to Image Style via Adversarial Feature Perturbations.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Multimodal Framework for Video Ads Understanding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

VideoLT: Large-scale Long-tailed Video Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploring Visual Engagement Signals for Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Intentonomy: A Dataset and Study Towards Human Intent Understanding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Efficient Object Embedding for Spliced Image Retrieval.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

GTA: Global Temporal Attention for Video Action Understanding.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Deep Video Inpainting Detection.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Image and Video Understanding with Constrained Resources.
PhD thesis, 2020

FLAG: Adversarial Data Augmentation for Graph Neural Networks.
CoRR, 2020

Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization.
CoRR, 2020

Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning From Noisy Anchors for One-Stage Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

M2KD: Incremental Learning via Multi-model and Multi-level Knowledge Distillation.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Recognizing Instagram Filtered Images with Feature De-Stylization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2019

An Analysis of Pre-Training on Object Detection.
CoRR, 2019

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning.
CoRR, 2019

Compatible and Diverse Fashion Image Inpainting.
CoRR, 2019

Weakly-Supervised Spatial Context Networks.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation.
Proceedings of the 7th International Conference on Learning Representations, 2019

ACE: Adapting to Changing Environments for Semantic Segmentation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

FiNet: Compatible and Diverse Fashion Image Inpainting.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AdaFrame: Adaptive Frame Selection for Fast Video Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification.
IEEE Trans. Multim., 2018

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

DCAN: Dual Channel-Wise Alignment Networks for Unsupervised Scene Adaptation.
Proceedings of the Computer Vision - ECCV 2018, 2018

BlockDrop: Dynamic Inference Paths in Residual Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

VITON: An Image-Based Virtual Try-On Network.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Deep Learning for Video Classification and Captioning.
Proceedings of the Frontiers of Multimedia Research, 2018

2017
Aggregating Frame-level Features for Large-Scale Video Classification.
CoRR, 2017

Learning Semantic Feature Map for Visual Content Recognition.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

LSVC2017: Large-Scale Video Classification Challenge.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Learning Fashion Compatibility with Bidirectional LSTMs.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Automatic Spatially-Aware Fashion Concept Discovery.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Deep Learning for Video Classification and Captioning.
CoRR, 2016

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Exploiting Objects with LSTMs for Video Categorization.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Emotion in Context: Deep Semantic Feature Fusion for Video Emotion Recognition.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Harnessing Object and Scene Semantics for Large-Scale Video Understanding.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Fusing Multi-Stream Deep Networks for Video Classification.
CoRR, 2015

Fudan at TRECVID 2015: Adaptive Feature Fusion for Multimedia Event Detection in Videos.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

NTT-Fudan Team @ TRECVID 2015: Multimedia Event Detection.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Evaluating Two-Stream CNN for Video Classification.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

2014
Fudan Team at TRECVID 2014: Multimedia Event Detection.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation.
Proceedings of the 2014 IEEE International Conference on Multimedia and Expo Workshops, 2014

