Yuqian Fu

ORCID: 0000-0002-0412-5500

According to our database, Yuqian Fu authored at least 89 papers between 2019 and 2026.

Bibliography

2026
SeGDP: Source-free Cross-domain Few-shot Learning via Semantic Guided Diversity Prompting.
ACM Trans. Multim. Comput. Commun. Appl., May, 2026

Are Full Rollouts Necessary for On-Policy Distillation?
CoRR, May, 2026

Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance.
CoRR, May, 2026

Accelerating Vision Foundation Models with Drop-in Depthwise Convolution.
CoRR, May, 2026

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation.
CoRR, May, 2026

Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model.
CoRR, May, 2026

Focusable Monocular Depth Estimation.
CoRR, May, 2026

Self-supervised pretraining for an iterative image size agnostic vision transformer.
CoRR, April, 2026

OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation.
CoRR, April, 2026

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results.
CoRR, April, 2026

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis.
CoRR, April, 2026

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes.
CoRR, March, 2026

OCRA: Object-Centric Learning with 3D and Tactile Priors for Human-to-Robot Action Transfer.
CoRR, March, 2026

InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing.
CoRR, March, 2026

VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning.
CoRR, March, 2026

CPIG: Leveraging Consistency Policy With Intention Guidance for Multiagent Exploration.
IEEE Trans. Cogn. Dev. Syst., February, 2026

EgoSound: Benchmarking Sound Understanding in Egocentric Videos.
CoRR, February, 2026

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
MinD-3D++: Advancing fMRI-Based 3D Reconstruction With High-Quality Textured Mesh Generation and a Comprehensive Dataset.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2025

StreamEQA: Towards Streaming Video Understanding for Embodied Scenarios.
CoRR, December, 2025

ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos.
CoRR, December, 2025

V²-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence.
CoRR, November, 2025

Exo2EgoSyn: Unlocking Foundation Video Generation Models for Exocentric-to-Egocentric Video Synthesis.
CoRR, November, 2025

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks.
CoRR, October, 2025

RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba.
CoRR, October, 2025

SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy.
CoRR, October, 2025

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods.
CoRR, October, 2025

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark.
CoRR, October, 2025

MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution.
IEEE Trans. Circuits Syst. Video Technol., September, 2025

Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization.
CoRR, September, 2025

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents.
CoRR, September, 2025

Meta Learning Task Representation in Multiagent Reinforcement Learning: From Global Inference to Local Inference.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

Vision encoders should be image size agnostic and task driven.
CoRR, August, 2025

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning.
CoRR, June, 2025

CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning.
CoRR, June, 2025

Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025.
CoRR, June, 2025

Manifold-aware Representation Learning for Degradation-agnostic Image Restoration.
CoRR, May, 2025

MLLMs are Deeply Affected by Modality Bias.
CoRR, May, 2025

EarthSynth: Generating Informative Earth Observation with Diffusion Models.
CoRR, May, 2025

CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes.
IEEE Robotics Autom. Lett., April, 2025

CamSAM2: Segment Anything Accurately in Camouflaged Videos.
CoRR, March, 2025

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation.
CoRR, March, 2025

AVA: Attentive VLM Agent for Mastering StarCraft II.
CoRR, March, 2025

MergeIT: From Selection to Merging for Efficient Instruction Tuning.
CoRR, March, 2025

LDR: Learning Discrete Representation to Improve Noise Robustness in Multiagent Tasks.
IEEE Trans. Syst. Man Cybern. Syst., January, 2025

Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

Sequential Multi-Object Grasping with One Dexterous Hand.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

Offline Goal-Conditioned Reinforcement Learning with Elastic-Subgoal Diffused Policy Learning.
Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

XTrack: Multimodal Training Boosts RGB-X Video Object Trackers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Understanding Museum Exhibits using Vision-Language Reasoning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

RLAE: Reinforcement Learning-Assisted Ensemble for LLMs.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Understanding the World's Museums through Vision-Language Reasoning.
CoRR, 2024

Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting.
CoRR, 2024

ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric Videos.
CoRR, 2024

CPEG: Leveraging Consistency Policy with Consensus Guidance for Multi-agent Exploration.
CoRR, 2024

Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes.
CoRR, 2024

fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction.
CoRR, 2024

Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community.
CoRR, 2024

Towards a Generalist and Blind RGB-X Tracker.
CoRR, 2024

MinD-3D: Reconstruct High-Quality 3D Objects in Human Brain.
Proceedings of the Computer Vision - ECCV 2024, 2024

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector.
Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Test-Time Linear Out-of-Distribution Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Aligning Credit for Multi-Agent Cooperation via Model-based Counterfactual Imagination.
Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

Open-Vocabulary Video Relation Extraction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
MinD-3D: Reconstruct High-quality 3D objects in Human Brain.
CoRR, 2023

Meta Style Adversarial Training for Cross-Domain Few-Shot Learning.
CoRR, 2023

On the Importance of Spatial Relations for Few-shot Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Generalized Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data.
IEEE Trans. Image Process., 2022

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning.
CoRR, 2022

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

LILAC: Learning a Leader for Cooperative Reinforcement Learning.
Proceedings of the IEEE Conference on Games, CoG 2022, Beijing, 2022

2021
Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Can Action be Imitated? Learn to Reconstruct and Transfer Human Dynamics from Videos.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

E-ACJ: Accurate Junction Extraction For Event Cameras.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

2020
Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

2019
Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

