Yingwei Pan

Orcid: 0000-0002-4344-8898

According to our database, Yingwei Pan authored at least 121 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer.

CoRR, May, 2026

DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation.

CoRR, January, 2026

DreamJourney: Perpetual View Generation With Video Diffusion Models.

IEEE Trans. Multim., 2026

FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Creatively Upscaling Images with Global-Regional Priors.

Int. J. Comput. Vis., August, 2025

Visual Autoregressive Modeling for Instruction-Guided Image Editing.

CoRR, August, 2025

Kernel Masked Image Modeling Through the Lens of Theoretical Understanding.

IEEE Trans. Neural Networks Learn. Syst., July, 2025

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer.

CoRR, May, 2025

Exploring Vision-Language Foundation Model for Novel Object Captioning.

IEEE Trans. Circuits Syst. Video Technol., January, 2025

Stream-ViT: Learning Streamlined Convolutions in Vision Transformer.

IEEE Trans. Multim., 2025

VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Identity-Preserving Video Generation Challenge.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Talk, Imagine, Evolve: A Unified Multimodal Agent for Seamless Visual Generation and Editing.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Edit-by-Example: Adaptive Exemplar-Based Image Editing.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

HiDream-I1: An Open-Source High-Efficient Image Generative Foundation Model.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Denoising Token Prediction in Masked Autoregressive Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MotionPro: A Precise Motion Controller for Image-to-Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

End-to-End Video Scene Graph Generation With Temporal Propagation Transformer.

IEEE Trans. Multim., 2024

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer.

CoRR, 2024

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Improving Virtual Try-On with Garment-Focused Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning.

Proceedings of the Computer Vision - ECCV 2024, 2024

Improving Text-Guided Object Inpainting with Semantic Pre-inpainting.

Proceedings of the Computer Vision - ECCV 2024, 2024

SD-DiT: Unleashing the Power of Self-Supervised Discrimination in Diffusion Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Prompt Refinement with Image Pivot for Text-to-Image Generation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Dual Vision Transformer.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2023

A Low Rank Promoting Prior for Unsupervised Contrastive Learning.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning.

ACM Trans. Multim. Comput. Commun. Appl., February, 2023

Boosting Scene Graph Generation with Visual Relation Saliency.

ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing.

ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Bottom-up and Top-down Object Inference Networks for Image Captioning.

ACM Trans. Multim. Comput. Commun. Appl., 2023

Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning.

ACM Trans. Multim. Comput. Commun. Appl., 2023

Contextual Transformer Networks for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Control3D: Towards Controllable Text-to-3D Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

3D Creation at Your Fingertips: From Text or Image to 3D Assets.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Neural Implicit Surfaces with Object-Aware Radiance Fields.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

HGNet: Learning Hierarchical Geometry from Points, Edges, and Surfaces.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Modality-Agnostic Debiasing for Single Domain Generalization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantic-Conditional Diffusion Networks for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning to Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training.

ACM Trans. Multim. Comput. Commun. Appl., 2022

Unpaired Image Captioning With Semantic-Constrained Self-Learning.

IEEE Trans. Multim., 2022

3D Cascade RCNN: High Quality Object Detection in Point Clouds.

IEEE Trans. Image Process., 2022

Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization.

CoRR, 2022

Dual Vision Transformer.

CoRR, 2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation.

CoRR, 2022

Contextual and selective attention networks for image captioning.

Sci. China Inf. Sci., 2022

Out-of-Distribution Detection via Conditional Kernel Independence Model.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement.

Proceedings of the Computer Vision - ECCV 2022, 2022

Dynamic Temporal Filtering in Video Models.

Proceedings of the Computer Vision - ECCV 2022, 2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Stand-Alone Inter-Frame Attention in Video Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Comprehending and Ordering Semantics for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

3D-Producer: A Hybrid and User-Friendly 3D Reconstruction System.

Proceedings of the Artificial Intelligence - Second CAAI International Conference, 2022

2021

Smart Director: An Event-Driven Directing System for Live Broadcasting.

ACM Trans. Multim. Comput. Commun. Appl., 2021

Single Shot Video Object Detector.

IEEE Trans. Multim., 2021

MINet: Meta-Learning Instance Identifiers for Video Object Detection.

IEEE Trans. Image Process., 2021

A Style and Semantic Memory Mechanism for Domain Generalization.

CoRR, 2021

A Low Rank Promoting Prior for Unsupervised Contrastive Learning.

CoRR, 2021

Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Transferrable Contrastive Learning for Visual Domain Adaptation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

A Style and Semantic Memory Mechanism for Domain Generalization.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Representing Videos As Discriminative Sub-Graphs for Action Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Deep Metric Learning With Density Adaptivity.

IEEE Trans. Multim., 2020

Pre-training for Video Captioning Challenge 2020 Summary.

CoRR, 2020

Joint Contrastive Learning with Infinite Possibilities.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

iDirector: An Intelligent Directing System for Live Broadcast.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Exploring Depth Information for Spatial Relation Recognition.

Proceedings of the 3rd IEEE Conference on Multimedia Information Processing and Retrieval, 2020

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

X-Linear Attention Networks for Image Captioning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Unified Sample Weighting Network for Object Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention.

ACM Trans. Multim. Comput. Commun. Appl., 2019

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019.

CoRR, 2019

vireoJD-MM at Activity Detection in Extended Videos.

CoRR, 2019

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019.

CoRR, 2019

VireoJD-MM @ TRECVid 2019: Activities in Extended Video (ActEV).

Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Animating Your Life: Real-Time Video-to-Animation Translation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Mocycle-GAN: Unpaired Video-to-Video Translation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Hierarchy Parsing for Image Captioning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Relation Distillation Networks for Video Object Detection.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Transferrable Prototypical Networks for Unsupervised Domain Adaptation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Pointing Novel Objects in Image Captioning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Exploring Object Relation in Mean Teacher for Cross-Domain Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Exploring Visual Relationship for Image Captioning.

Proceedings of the Computer Vision - ECCV 2018, 2018

Jointly Localizing and Describing Events for Dense Video Captioning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Memory Matching Networks for One-Shot Image Recognition.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Deep Semantic Hashing with Generative Adversarial Networks.

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Seeing Bot.

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

To Create What You Tell: Generating Videos from Captions.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Boosting Image Captioning with Attributes.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Video Captioning with Transferred Semantic Attributes.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Jointly Modeling Embedding and Translation to Bridge Video and Language.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search.

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

Semi-supervised Domain Adaptation with Subspace Learning for visual recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Click-through-based cross-view learning for image search.

Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Click-through-based Subspace Learning for Image Search.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

2013

Image search by graph-based label propagation with image representation from DNN.

Proceedings of the ACM Multimedia Conference, 2013
