Yuhang Cao

ORCID: 0009-0008-6800-889X

According to our database, Yuhang Cao authored at least 67 papers between 2017 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Federated Graph Neural Networks With Equivalent Hypergraph Construction for Traffic Flow Prediction.

IEEE Trans. Knowl. Data Eng., November, 2025

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning.

CoRR, October, 2025

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence.

CoRR, October, 2025

LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation.

CoRR, October, 2025

COFFA: A Co-Design Framework for Fused-Grained Reconfigurable Architecture Towards Efficient Irregular Loop Handling.

IEEE Trans. Computers, September, 2025

OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction.

Yuhang Cao

Haojun Yan

Danya Yao

CoRR, September, 2025

2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC.

CoRR, September, 2025

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning.

CoRR, September, 2025

SPARK: Synergistic Policy And Reward Co-Evolving Framework.

CoRR, September, 2025

SIM-CoT: Supervised Implicit Chain-of-Thought.

CoRR, September, 2025

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning.

CoRR, August, 2025

Intern-S1: A Scientific Multimodal Foundation Model.

CoRR, August, 2025

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience.

CoRR, August, 2025

Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models.

CoRR, August, 2025

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction.

CoRR, July, 2025

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing.

CoRR, June, 2025

Visual Agentic Reinforcement Fine-Tuning.

CoRR, May, 2025

MM-IFEngine: Towards Multimodal Instruction Following.

CoRR, April, 2025

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance.

CoRR, April, 2025

Visual-RFT: Visual Reinforcement Fine-Tuning.

CoRR, March, 2025

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation.

CoRR, February, 2025

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion.

CoRR, February, 2025

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

CoRR, February, 2025

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning.

CoRR, January, 2025

Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing.

Proc. ACM Softw. Eng., 2025

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Conical Visual Concentration for Efficient Large Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.

CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.

CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.

CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.

CoRR, 2024

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack.

CoRR, 2024

A General-Purpose Device for Interaction with LLMs.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.

CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Ximalaya ASDR System for ICASSP 2024 in-Car Multi-Channel (ICMC) ASR Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Diacorrect: Error Correction Back-End for Speaker Diarization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

MDCRA: A Reconfigurable Accelerator Framework for Multiple Dataflow Lanes.

Proceedings of the 35th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2024

2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

CoRR, 2023

Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation, 2023

Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Dynamic Partial Reconfigurable CGRA Framework for Multi-Kernel Applications.

Proceedings of the International Conference on Field Programmable Technology, 2023

E²-ACE: An Energy-Efficient Reconfigurable Crypto-Accelerator with Agile End-to-End Toolchain.

Proceedings of the International Conference on Field Programmable Technology, 2023

PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection.

CoRR, 2022

The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

TRAM: An Open-Source Template-based Reconfigurable Architecture Modeling Framework.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection.

CoRR, 2021

Few-Shot Object Detection via Association and DIscrimination.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Feature Pyramid Grids.

Christoph Feichtenhofer

CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

Prime Sample Attention in Object Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Speaker Direction-of-Arrival Estimation Based on Orthogonal Dipoles.

Circuits Syst. Signal Process., 2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.

CoRR, 2019

Investigation of Cost Function for Supervised Monaural Speech Separation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2017

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
