Yuchong Sun

Orcid: 0009-0004-6559-5620

According to our database, Yuchong Sun authored at least 25 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments.
CoRR, May, 2026

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies.
CoRR, May, 2026

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos.
CoRR, March, 2026

2025
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation.
CoRR, December, 2025

ReGA: Reasoning and Grounding Decoupled GUI Navigation Agents.
Proceedings of the Natural Language Processing and Chinese Computing, 2025

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Uncovering Personality Traits via Multimodal LLM for Personalized Image Emotion Analysis.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

ETVA: Evaluation of Text-to-Video Alignment via Fine-Grained Question Generation and Answering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MuKA: Multimodal Knowledge Augmented Visual Information-Seeking.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
ViCo: Engaging Video Comment Generation with Human Preference Rewards.
Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions.
CoRR, 2023

Translating Text Synopses to Video Storyboards.
CoRR, 2023

Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Going Beyond Closed Sets: A Multimodal Perspective for Video Emotion Analysis.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

TeViS: Translating Text Synopses to Video Storyboards.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Joint Semantic and Strategy Matching for Persuasive Dialogue.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment.
CoRR, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training.
CoRR, 2021

2017
High-speed driver for SiC MOSFET based on class-E inverter.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017
