Rui Qian

Orcid: 0000-0002-0378-6438

Affiliations:

Chinese University of Hong Kong, Multi-Media Lab, Hong Kong
Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, Shanghai, China (2017 - 2021)

According to our database¹, Rui Qian authored at least 33 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other

Bibliography

2025

CogStream: Context-guided Streaming Video Question Answering.

CoRR, June, 2025

Seed1.5-VL Technical Report.

CoRR, May, 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Controllable augmentations for video representation learning.

Vis. Intell., 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images.

CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.

CoRR, 2024

Streaming Long Video Understanding with Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Image-to-Video Adaptation: An Object-Centric Perspective.

Rui Qian, Shuangrui Ding, Dahua Lin

Proceedings of the Computer Vision - ECCV 2024, 2024

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Class-Aware Sounding Objects Localization via Audiovisual Correspondence.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Motion-inductive Self-supervised Object Discovery in Videos.

CoRR, 2022

Dual Contrastive Learning for Spatio-temporal Representation.

Shuangrui Ding, Rui Qian, Hongkai Xiong

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Static and Dynamic Concepts for Self-supervised Video Representation Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

TA2N: Two-Stage Action Alignment Network for Few-Shot Action Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Motion-aware Self-supervised Video Representation Learning via Foreground-background Merging.

CoRR, 2021

TTAN: Two-Stage Temporal Alignment Network for Few-shot Action Recognition.

CoRR, 2021

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events.

CoRR, 2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

ATRW: A Benchmark for Amur Tiger Re-identification in the Wild.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multiple Sound Sources Localization from Coarse to Fine.

Proceedings of the Computer Vision - ECCV 2020, 2020

Finding Action Tubes with a Sparse-to-Dense Framework.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
