Zuxuan Wu
Orcid: 0000-0002-8689-5807
According to our database, Zuxuan Wu authored at least 200 papers between 2014 and 2025.
Bibliography
2025
IEEE Trans. Pattern Anal. Mach. Intell., August, 2025
CoRR, August, 2025
CoRR, August, 2025
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models.
CoRR, August, 2025
CoRR, July, 2025
StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation.
CoRR, July, 2025
FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization.
CoRR, July, 2025
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis.
CoRR, July, 2025
CoRR, June, 2025
CoRR, June, 2025
Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control.
CoRR, June, 2025
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design.
CoRR, May, 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities.
CoRR, May, 2025
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning.
CoRR, May, 2025
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation.
CoRR, May, 2025
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025
CoRR, April, 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL.
CoRR, April, 2025
Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.
Proc. IEEE, March, 2025
DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation.
CoRR, March, 2025
CoRR, March, 2025
CoRR, March, 2025
CoRR, March, 2025
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning.
CoRR, March, 2025
CoRR, February, 2025
CoRR, January, 2025
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-from-gradients.
CoRR, January, 2025
IEEE Trans. Multim., 2025
IEEE Trans. Multim., 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Advancing Dark Action Recognition via Modality Fusion and Dark-to-Light Diffusion Model.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-from-gradients.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition.
ACM Trans. Multim. Comput. Commun. Appl., February, 2024
Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data.
IEEE Trans. Pattern Anal. Mach. Intell., 2024
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks.
CoRR, 2024
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation.
CoRR, 2024
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning.
CoRR, 2024
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection.
CoRR, 2024
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision.
CoRR, 2024
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents.
CoRR, 2024
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.
CoRR, 2024
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers.
CoRR, 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model.
CoRR, 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
IEEE Trans. Multim., 2023
FT-TDR: Frequency-Guided Transformer and Top-Down Refinement Network for Blind Face Inpainting.
IEEE Trans. Multim., 2023
IEEE Trans. Multim., 2023
IEEE Trans. Image Process., 2023
Int. J. Softw. Informatics, 2023
CoRR, 2023
VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition.
IEEE Trans. Multim., 2022
IEEE Trans. Multim., 2022
IEEE Trans. Pattern Anal. Mach. Intell., 2022
Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples.
CoRR, 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling.
CoRR, 2022
M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Int. J. Comput. Vis., 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation.
CoRR, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the 32nd British Machine Vision Conference 2021, 2021
Proceedings of the 32nd British Machine Vision Conference 2021, 2021
2020
Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization.
CoRR, 2020
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 31st British Machine Vision Conference 2020, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2019
CoRR, 2019
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification.
IEEE Trans. Multim., 2018
Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2018
Proceedings of the Computer Vision - ECCV 2018, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
Proceedings of the Frontiers of Multimedia Research, 2018
2017
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Proceedings of the IEEE International Conference on Computer Vision, 2017
2016
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
2015
Fudan at TRECVID 2015: Adaptive Feature Fusion for Multimedia Event Detection in Videos.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015
Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015
Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015
2014
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014
Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014
Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2014