Yuchong Sun

Orcid: 0009-0004-6559-5620

According to our database¹, Yuchong Sun authored at least 25 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments.

CoRR, May, 2026

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies.

CoRR, May, 2026

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos.

CoRR, March, 2026

2025

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation.

CoRR, December, 2025

ReGA: Reasoning and Grounding Decoupled GUI Navigation Agents.

Proceedings of the Natural Language Processing and Chinese Computing, 2025

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Uncovering Personality Traits via Multimodal LLM for Personalized Image Emotion Analysis.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

ETVA: Evaluation of Text-to-Video Alignment via Fine-Grained Question Generation and Answering.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MuKA: Multimodal Knowledge Augmented Visual Information-Seeking.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

ViCo: Engaging Video Comment Generation with Human Preference Rewards.

Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain.

Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions.

CoRR, 2023

Translating Text Synopses to Video Storyboards.

CoRR, 2023

Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Going Beyond Closed Sets: A Multimodal Perspective for Video Emotion Analysis.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

TeViS: Translating Text Synopses to Video Storyboards.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Joint Semantic and Strategy Matching for Persuasive Dialogue.

Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment.

CoRR, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training.

CoRR, 2021

2017

High-speed driver for SiC MOSFET based on class-E inverter.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2017