Yuxuan Wang
ORCID: 0000-0002-3889-8560
Affiliations:
- Alibaba Inc., Qwen team, Beijing, China
- Peking University, Institute of Computer Technology, Beijing, China
- Peking University, Center for Data Science, Beijing, China
- Beijing Institute for General Artificial Intelligence (BIGAI), China
According to our database, Yuxuan Wang authored at least 22 papers between 2022 and 2025.
Bibliography
2025
CoRR, April, 2025
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens.
CoRR, February, 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format.
CoRR, 2024
CoRR, 2024
CoRR, 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning.
CoRR, 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models.
CoRR, 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding.
CoRR, 2024
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
CoRR, 2023
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training.
CoRR, 2023
Overview of the NLPCC 2023 Shared Task 10: Learn to Watch TV: Multimodal Dialogue Understanding and Response Generation.
Proceedings of the Natural Language Processing and Chinese Computing, 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Overview of the NLPCC 2022 Shared Task: Multi-modal Dialogue Understanding and Generation.
Proceedings of the Natural Language Processing and Chinese Computing, 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022