Yuhang Cao
Orcid: 0009-0008-3627-590X
According to our database1,
Yuhang Cao
authored at least 57 papers
between 2017 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2025
COFFA: A Co-Design Framework for Fused-Grained Reconfigurable Architecture Towards Efficient Irregular Loop Handling.
IEEE Trans. Computers, September, 2025
CoRR, August, 2025
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models.
CoRR, August, 2025
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction.
CoRR, July, 2025
CoRR, June, 2025
CoRR, April, 2025
CoRR, February, 2025
CoRR, February, 2025
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning.
CoRR, January, 2025
Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing.
Proc. ACM Softw. Eng., 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024
CoRR, 2024
CoRR, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the 35th IEEE International Conference on Application-specific Systems, 2024
2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023
Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023
Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the International Conference on Field Programmable Technology, 2023
E<sup>2</sup>-ACE: An Energy-Efficient Reconfigurable Crypto-Accelerator with Agile End-to-End Toolchain.
Proceedings of the International Conference on Field Programmable Technology, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022
2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Circuits Syst. Signal Process., 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017