Yuankai Qi

Orcid: 0000-0003-4312-5682

According to our database¹, Yuankai Qi authored at least 124 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization.

Int. J. Comput. Vis., May, 2026

Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-Based Video Incremental Learning.

Int. J. Comput. Vis., April, 2026

CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing.

CoRR, April, 2026

DGRNet: Disagreement-Guided Refinement for Uncertainty-Aware Brain Tumor Segmentation.

CoRR, March, 2026

Hierarchical Text-Guided Brain Tumor Segmentation via Sub-Region-Aware Prompts.

Bahram Mohammadi

Ta Duc Huy

Afrouz Sheikholeslami

CoRR, March, 2026

Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding.

CoRR, March, 2026

Indoor Scene Recognition in Vision-and-Language Navigation.

IEEE Trans. Consumer Electron., February, 2026

Exploring the Temporal Consistency for Point-Level Weakly-Supervised Temporal Action Localization.

CoRR, February, 2026

Unlocking Prototype Potential: An Efficient Tuning Framework for Few-Shot Class-Incremental Learning.

CoRR, February, 2026

Boosting Point-supervised Temporal Action Localization via Text Refinement and Alignment.

CoRR, February, 2026

Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification.

CoRR, January, 2026

Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments.

CoRR, January, 2026

Ghost-Free HDR Imaging via Latent Low-Frequency Priors and Deformable Attention Alignment.

IEEE Trans. Image Process., 2026

Link prediction on multi-relational graphs from an influence propagation perspective.

Pattern Recognit., 2026

Parameter-efficient action planning with large language models for vision-and-language navigation.

Pattern Recognit., 2026

RETTA: Retrieval-enhanced test-time adaptation for zero-shot video captioning.

Pattern Recognit., 2026

Dynamic example network for class-agnostic object counting.

Pattern Recognit., 2026

ImVision: Adapting Pretrained Vision Models for Time-Series Imputation.

Proceedings of the Companion Proceedings of the ACM Web Conference 2026, 2026

The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

InstructDubber: Instruction-based Alignment for Zero-shot Movie Dubbing.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

A Survey on Improving Human Robot Collaboration through Vision-and-Language Navigation.

Virendra Singh Shekhawat

CoRR, December, 2025

Dubbing Movies via Hierarchical Phoneme Modeling and Acoustic Diffusion Denoising.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation.

Yihao Zhang

Yuankai Qi

Xi Zheng

CoRR, November, 2025

Dynamic Erasing Network With Adaptive Temporal Modeling for Weakly Supervised Video Anomaly Detection.

IEEE Trans. Neural Networks Learn. Syst., September, 2025

XTransfer: Cross-Modality Model Transfer for Human Sensing with Few Data at the Edge.

CoRR, June, 2025

Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models.

CoRR, May, 2025

ProgRoCC: A Progressive Approach to Rough Crowd Counting.

CoRR, April, 2025

The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning.

CoRR, March, 2025

Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization.

CoRR, March, 2025

Dual Prototype Contrastive Network for Generalized Zero-Shot Learning.

IEEE Trans. Circuits Syst. Video Technol., February, 2025

Spatial-Temporal Interleaved Network for Efficient Action Recognition.

IEEE Trans. Ind. Informatics, January, 2025

Presentation Attack Detection: A Systematic Literature Review.

ACM Comput. Surv., January, 2025

Self-Reflection Neural Network for Class-Incremental Object Counting.

IEEE Trans. Multim., 2025

Input-Regulated Remote Sensing Counting With Region Understanding.

IEEE Trans. Geosci. Remote. Sens., 2025

Boosting UAV Detection via Memory-Enhanced Attention and Contrastive Learning.

IEEE Signal Process. Lett., 2025

Trffc: Efficient Traffic Forecasting through Adaptive Spatio-Temporal Graph Reduction.

Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, 2025

SDVPT: Semantic-Driven Visual Prompt Tuning for Open-world Object Counting.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

CausalMVC: Causal Content-Style Representation Learning for Deep Multi-View Clustering.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Language-Conditioned Waypoint Predictor for Continuous Vision-and-Language Navigation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Hierarchical Prompt-Guided Alignment for Multi-view Clustering.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

Learning from Uncertainty: A Cloud-Based Active Learning for Detecting Bone Union in Mandibular Reconstruction.

MohammadHossein Ahmadi

Amin Beheshti

Yuankai Qi

Sam Mokhtari

Maryam Khanian Najafabadi

Masako Dunn

Jonathan R. Clark

Proceedings of the IEEE International Conference on Data Mining, 2025

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Exploring Primitive Visual Measurement Understanding and the Role of Output Format in Learning in Vision-Language Models.

Ankit Yadav

Lingqiao Liu

Yuankai Qi

Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, 2025

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

STGS: Spatio-temporal Graph Sparsification Using Reinforcement Learning.

Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

Generating Synthetic Data for Unsupervised Federated Learning of Cross-Modal Retrieval.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Incomplete Multi-View Multi-Label Classification via Diffusion-Guided Redundancy Removal.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Progressive Multi-Resolution Loss for Crowd Counting.

IEEE Trans. Circuits Syst. Video Technol., May, 2024

A Unified Object Counting Network With Object Occupation Prior.

IEEE Trans. Circuits Syst. Video Technol., February, 2024

Learning Hierarchical Modular Networks for Video Captioning.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Rethinking Attentive Object Detection via Neural Attention Learning.

IEEE Trans. Image Process., 2024

Rethink video retrieval representation for video captioning.

Pattern Recognit., 2024

Style-aware two-stage learning framework for video captioning.

Knowl. Based Syst., 2024

Adapter-Enhanced Semantic Prompting for Continual Learning.

CoRR, 2024

Retrieval Enhanced Zero-Shot Video Captioning.

CoRR, 2024

From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

Generating High-Quality Symbolic Music Using Fine-Grained Discriminators.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-Training Framework.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Weakly Supervised Video Individual Counting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generating Content for HDR Deghosting from Frequency View.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Augmented Commonsense Knowledge for Remote Object Grounding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Subject-Oriented Video Captioning.

CoRR, 2023

Weakly Supervised Video Individual Counting.

CoRR, 2023

Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection.

CoRR, 2023

AerialVLN: Vision-and-Language Navigation for UAVs.

CoRR, 2023

Teacher Agent: A Non-Knowledge Distillation Method for Rehearsal-based Video Incremental Learning.

CoRR, 2023

CALM: An Enhanced Encoding and Confidence Evaluating Framework for Trustworthy Multi-view Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes.

Chongyang Zhao

Yuankai Qi

Qi Wu

Proceedings of the 31st ACM International Conference on Multimedia, 2023

March in Chat: Interactive Prompting for Remote Embodied Referring Expression.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AerialVLN: Vision-and-Language Navigation for UAVs.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Generative Approach for Comprehensive Financial Event Extraction at the Document Level.

Proceedings of the 4th ACM International Conference on AI in Finance, 2023

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning to Dub Movies via Hierarchical Prosody Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

BEVBert: Topo-Metric Map Pre-training for Language-guided Navigation.

CoRR, 2022

Consistency-Aware Anchor Pyramid Network for Crowd Localization.

CoRR, 2022

HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation.

CoRR, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Multi-Attention Network for Compressed Video Referring Object Segmentation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Hierarchical Modular Network for Video Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

V2C: Visual Voice Cloning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

D3D: Dual 3-D Convolutional Network for Real-Time Action Recognition.

IEEE Trans. Ind. Informatics, 2021

Light fixed-time control for cluster synchronization of complex networks.

Neurocomputing, 2021

Image editing with varying intensities of processing.

Comput. Vis. Image Underst., 2021

Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.

CoRR, 2021

R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Neighbor-view Enhanced Model for Vision and Language Navigation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VLN BERT: A Recurrent Vision-and-Language BERT for Navigation.

Yicong Hong

Qi Wu

Yuankai Qi

Cristian Rodriguez Opazo

Stephen Gould

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Siamese Local and Global Networks for Robust Face Tracking.

IEEE Trans. Image Process., 2020

EventDTW: An Improved Dynamic Time Warping Algorithm for Aligning Biomedical Signals of Nonuniform Sampling Frequencies.

Sensors, 2020

A Recurrent Vision-and-Language BERT for Navigation.

Yicong Hong

Qi Wu

Yuankai Qi

Cristian Rodriguez Opazo

Stephen Gould

CoRR, 2020

Language and Visual Entity Relationship Graph for Agent Navigation.

Yicong Hong

Cristian Rodriguez Opazo

Yuankai Qi

Qi Wu

Stephen Gould

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Object-and-Action Aware Model for Visual Language Navigation.

Proceedings of the Computer Vision - ECCV 2020, 2020

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Overwater Image Dehazing via Cycle-Consistent Generative Adversarial Network.

Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Release the Power of Online-Training for Robust Visual Tracking.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Hedging Deep Features for Visual Tracking.

IEEE Trans. Pattern Anal. Mach. Intell., 2019

Robust visual tracking via scale-and-state-awareness.

Neurocomputing, 2019

RERERE: Remote Embodied Referring Expressions in Real indoor Environments.

CoRR, 2019

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Learning Attribute-Specific Representations for Visual Tracking.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Point-to-Set Distance Metric Learning on Deep Representations for Visual Tracking.

IEEE Trans. Intell. Transp. Syst., 2018

Structure-Aware Local Sparse Coding for Visual Tracking.

IEEE Trans. Image Process., 2018

BoMW: Bag of Manifold Words for One-Shot Learning Gesture Recognition From Kinect.

IEEE Trans. Circuits Syst. Video Technol., 2018

Plant identification based on very deep convolutional neural networks.

Multim. Tools Appl., 2018

VisDrone-SOT2018: The Vision Meets Drone Single-Object Tracking Challenge Results.

Konstantinos Avgerinakis

Kyuewang Lee

Lu Ding

Martin Lauer

Panagiotis Giannakeris

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking.

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

Robust Visual Tracking via Basis Matching.

IEEE Trans. Circuits Syst. Video Technol., 2017

Video Object Segmentation with Re-identification.

CoRR, 2017

2016

The Visual Object Tracking VOT2016 Challenge Results.

Alireza Memarmoghadam

Gorthi R. K. Sai Subrahmanyam

Guilherme Sousa Bastos

Kannappan Palaniappan

Mario Edoardo Maresca

Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Hedged Deep Tracking.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2014

Structure-aware multi-object discovery for weakly supervised tracking.

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

The Visual Object Tracking VOT2014 Challenge Results.

Mario Edoardo Maresca

Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014

2013

3D Segmentation of the Lung Based on the Neighbor Information and Curvature.

Proceedings of the Seventh International Conference on Image and Graphics, 2013
