Zongxin Yang

ACM Trans. Multim. Comput. Commun. Appl., February, 2026

Toward General-Purpose Video Reconstruction Through Synergy of Grid-Splicing Diffusion and Large Language Models.

IEEE Trans. Circuits Syst. Video Technol., February, 2026

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models.

CoRR, February, 2026

Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction.

CoRR, February, 2026

A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies.

npj Digit. Medicine, 2026

2025

TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing.

CoRR, December, 2025

Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models.

CoRR, November, 2025

Are Image-to-Video Models Good Zero-Shot Image Editors?

CoRR, November, 2025

BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment.

CoRR, November, 2025

GD-NeRF: Generative Detail Compensation for One-shot Generalizable Neural Radiance Fields.

ACM Trans. Multim. Comput. Commun. Appl., October, 2025

A Weakly Supervised Transformer to Support Rare Disease Diagnosis from Electronic Health Records: Methods and Applications in Rare Pulmonary Disease.

CoRR, July, 2025

Test-Time Adaptation for Real-World Video Adverse Weather Restoration With Meta Batch Normalization.

Jinliang Liu

IEEE Trans. Circuits Syst. Video Technol., June, 2025

SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis.

CoRR, June, 2025

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer.

CoRR, April, 2025

MedSAM2: Segment Anything in 3D Medical Images and Videos.

CoRR, April, 2025

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction.

CoRR, March, 2025

3D Object Manipulation in a Single Image using Generative Models.

CoRR, January, 2025

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering.

CoRR, January, 2025

Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models.

CoRR, January, 2025

Exploiting EfficientSAM and Temporal Coherence for Audio-Visual Segmentation.

Yue Zhu

Kun Li

IEEE Trans. Multim., 2025

Prompt-based multimodal representation learning for drug repurposing.

Briefings Bioinform., 2025

Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

X-Field: A Physically Informed Representation for 3D X-ray Reconstruction.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Origin Identification for Text-Guided Image-to-Image Diffusion Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

IDPro: Flexible Interactive Video Object Segmentation by ID-Queried Concurrent Propagation.

IEEE Trans. Circuits Syst. Video Technol., December, 2024

Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data.

ACM Trans. Multim. Comput. Commun. Appl., October, 2024

Scalable Video Object Segmentation With Identification Mechanism.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

High Fidelity Makeup via 2D and 3D Identity Preservation Net.

ACM Trans. Multim. Comput. Commun. Appl., August, 2024

MuscleParseNet: A Novel Framework for Parsing Muscles of Drosophila Larva in Light-Sheet Fluorescence Microscopy Images.

IEEE Trans. Circuits Syst. Video Technol., June, 2024

Show Me a Video: A Large-Scale Narrated Video Dataset for Coherent Story Illustration.

IEEE Trans. Multim., 2024

Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation.

CoRR, 2024

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation.

CoRR, 2024

Efficient Training of Large Vision Models via Advanced Automated Progressive Learning.

CoRR, 2024

Replication in Visual Diffusion Models: A Survey and Outlook.

CoRR, 2024

Explore Synergistic Interaction Across Frames for Interactive Video Object Segmentation.

CoRR, 2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models.

CoRR, 2024

GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields.

CoRR, 2024

DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent).

Proceedings of the Forty-first International Conference on Machine Learning, 2024

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting.

Proceedings of the Computer Vision - ECCV 2024, 2024

Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction.

Zechuan Zhang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Controllable 3D Face Generation with Conditional Style Code Diffusion.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition.

IEEE Trans. Image Process., 2023

Collaborative Content-Dependent Modeling: A Return to the Roots of Salient Object Detection.

IEEE Trans. Image Process., 2023

Human101: Training 100+FPS Human Gaussians in 100s from 1 View.

CoRR, 2023

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance.

CoRR, 2023

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction.

CoRR, 2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking.

CoRR, 2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation.

CoRR, 2023

Segment and Track Anything.

CoRR, 2023

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Pyramid Diffusion Models for Low-light Image Enhancement.

Dewei Zhou

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Video Object Segmentation in Panoptic Wild Scenes.

Norbert Scherer-Negenborn

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Decompose to Generalize: Species-Generalized Animal Pose Estimation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.

Kannappan Palaniappan

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Shuffled Autoregression for Motion Interpolation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration.

Yunchao Wei

IEEE Trans. Pattern Anal. Mach. Intell., 2022

V^2L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval.

CoRR, 2022

Associating Objects with Scalable Transformers for Video Object Segmentation.

CoRR, 2022

Decoupling Features in Hierarchical Propagation for Video Object Segmentation.

Rama Krishna Sai S. Gorthi

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation.

Proceedings of the 30th ACM International Conference on Multimedia, 2022

Instance as Identity: A Generic Online Paradigm for Video Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

The Tenth Visual Object Tracking VOT2022 Challenge Results.

Joni-Kristian Kämäräinen

Alireza Memarmoghadam

Christian Micheloni

Payman Moallem

Le Thanh Nguyen-Meidine

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Sequence Modelling with Deep Learning for Visual Content Generation and Understanding

PhD thesis, 2021

Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation.

CoRR, 2021

Associating Objects with Transformers for Video Object Segmentation.

Yunchao Wei

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-Scale Consistency.

Xin Yu

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Collaborative Video Object Segmentation by Foreground-Background Integration.

Yunchao Wei