Yuankai Qi

Orcid: 0000-0003-4312-5682

According to our database, Yuankai Qi authored at least 96 papers between 2013 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
RETTA: Retrieval-enhanced test-time adaptation for zero-shot video captioning.
Pattern Recognit., 2026

Dynamic example network for class-agnostic object counting.
Pattern Recognit., 2026

2025
Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos.
CoRR, August, 2025

XTransfer: Cross-Modality Model Transfer for Human Sensing with Few Data at the Edge.
CoRR, June, 2025

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding.
CoRR, May, 2025

Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models.
CoRR, May, 2025

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing.
CoRR, May, 2025

SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting.
CoRR, April, 2025

ProgRoCC: A Progressive Approach to Rough Crowd Counting.
CoRR, April, 2025

The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning.
CoRR, March, 2025

Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization.
CoRR, March, 2025

Dual Prototype Contrastive Network for Generalized Zero-Shot Learning.
IEEE Trans. Circuits Syst. Video Technol., February, 2025

Spatial-Temporal Interleaved Network for Efficient Action Recognition.
IEEE Trans. Ind. Informatics, January, 2025

Presentation Attack Detection: A Systematic Literature Review.
ACM Comput. Surv., January, 2025

Exploring Primitive Visual Measurement Understanding and the Role of Output Format in Learning in Vision-Language Models.
CoRR, January, 2025

Boosting UAV Detection via Memory-Enhanced Attention and Contrastive Learning.
IEEE Signal Process. Lett., 2025

Trffc: Efficient Traffic Forecasting through Adaptive Spatio-Temporal Graph Reduction.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, 2025

Hierarchical Prompt-Guided Alignment for Multi-view Clustering.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Generating Synthetic Data for Unsupervised Federated Learning of Cross-Modal Retrieval.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Incomplete Multi-View Multi-Label Classification via Diffusion-Guided Redundancy Removal.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Progressive Multi-Resolution Loss for Crowd Counting.
IEEE Trans. Circuits Syst. Video Technol., May, 2024

A Unified Object Counting Network With Object Occupation Prior.
IEEE Trans. Circuits Syst. Video Technol., February, 2024

Learning Hierarchical Modular Networks for Video Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Rethinking Attentive Object Detection via Neural Attention Learning.
IEEE Trans. Image Process., 2024

Rethink video retrieval representation for video captioning.
Pattern Recognit., 2024

Style-aware two-stage learning framework for video captioning.
Knowl. Based Syst., 2024

Adapter-Enhanced Semantic Prompting for Continual Learning.
CoRR, 2024

Retrieval Enhanced Zero-Shot Video Captioning.
CoRR, 2024

From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

Generating High-Quality Symbolic Music Using Fine-Grained Discriminators.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-Training Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Weakly Supervised Video Individual Counting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generating Content for HDR Deghosting from Frequency View.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Augmented Commonsense Knowledge for Remote Object Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Subject-Oriented Video Captioning.
CoRR, 2023

Weakly Supervised Video Individual Counting.
CoRR, 2023

Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection.
CoRR, 2023

AerialVLN: Vision-and-Language Navigation for UAVs.
CoRR, 2023

Teacher Agent: A Non-Knowledge Distillation Method for Rehearsal-based Video Incremental Learning.
CoRR, 2023

CALM: An Enhanced Encoding and Confidence Evaluating Framework for Trustworthy Multi-view Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

March in Chat: Interactive Prompting for Remote Embodied Referring Expression.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AerialVLN: Vision-and-Language Navigation for UAVs.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Generative Approach for Comprehensive Financial Event Extraction at the Document Level.
Proceedings of the 4th ACM International Conference on AI in Finance, 2023

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning to Dub Movies via Hierarchical Prosody Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
BEVBert: Topo-Metric Map Pre-training for Language-guided Navigation.
CoRR, 2022

Consistency-Aware Anchor Pyramid Network for Crowd Localization.
CoRR, 2022

HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation.
CoRR, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Multi-Attention Network for Compressed Video Referring Object Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Hierarchical Modular Network for Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

V2C: Visual Voice Cloning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
D3D: Dual 3-D Convolutional Network for Real-Time Action Recognition.
IEEE Trans. Ind. Informatics, 2021

Light fixed-time control for cluster synchronization of complex networks.
Neurocomputing, 2021

Image editing with varying intensities of processing.
Comput. Vis. Image Underst., 2021

Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
CoRR, 2021

R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Neighbor-view Enhanced Model for Vision and Language Navigation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VLN BERT: A Recurrent Vision-and-Language BERT for Navigation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Siamese Local and Global Networks for Robust Face Tracking.
IEEE Trans. Image Process., 2020

EventDTW: An Improved Dynamic Time Warping Algorithm for Aligning Biomedical Signals of Nonuniform Sampling Frequencies.
Sensors, 2020

A Recurrent Vision-and-Language BERT for Navigation.
CoRR, 2020

Language and Visual Entity Relationship Graph for Agent Navigation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Object-and-Action Aware Model for Visual Language Navigation.
Proceedings of the Computer Vision - ECCV 2020, 2020

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Overwater Image Dehazing via Cycle-Consistent Generative Adversarial Network.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Release the Power of Online-Training for Robust Visual Tracking.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Hedging Deep Features for Visual Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Robust visual tracking via scale-and-state-awareness.
Neurocomputing, 2019

RERERE: Remote Embodied Referring Expressions in Real indoor Environments.
CoRR, 2019

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Learning Attribute-Specific Representations for Visual Tracking.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Point-to-Set Distance Metric Learning on Deep Representations for Visual Tracking.
IEEE Trans. Intell. Transp. Syst., 2018

Structure-Aware Local Sparse Coding for Visual Tracking.
IEEE Trans. Image Process., 2018

BoMW: Bag of Manifold Words for One-Shot Learning Gesture Recognition From Kinect.
IEEE Trans. Circuits Syst. Video Technol., 2018

Plant identification based on very deep convolutional neural networks.
Multim. Tools Appl., 2018


The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking.
Proceedings of the Computer Vision - ECCV 2018, 2018

2017
Robust Visual Tracking via Basis Matching.
IEEE Trans. Circuits Syst. Video Technol., 2017

Video Object Segmentation with Re-identification.
CoRR, 2017

2016
The Visual Object Tracking VOT2016 Challenge Results.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Hedged Deep Tracking.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2014
Structure-aware multi-object discovery for weakly supervised tracking.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014


2013
3D Segmentation of the Lung Based on the Neighbor Information and Curvature.
Proceedings of the Seventh International Conference on Image and Graphics, 2013

