Rui Qian

ORCID: 0000-0002-0378-6438

Affiliations:
  • Chinese University of Hong Kong, Multi-Media Lab, Hong Kong
  • Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, Shanghai, China (2017 - 2021)


According to our database, Rui Qian authored at least 32 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
CogStream: Context-guided Streaming Video Question Answering.
CoRR, June, 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Controllable augmentations for video representation learning.
Vis. Intell., 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024

SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images.
CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.
CoRR, 2024

Streaming Long Video Understanding with Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Image-to-Video Adaptation: An Object-Centric Perspective.
Proceedings of the Computer Vision - ECCV 2024, 2024

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Class-Aware Sounding Objects Localization via Audiovisual Correspondence.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Motion-inductive Self-supervised Object Discovery in Videos.
CoRR, 2022

Dual Contrastive Learning for Spatio-temporal Representation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Static and Dynamic Concepts for Self-supervised Video Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

TA2N: Two-Stage Action Alignment Network for Few-Shot Action Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Motion-aware Self-supervised Video Representation Learning via Foreground-background Merging.
CoRR, 2021

TTAN: Two-Stage Temporal Alignment Network for Few-shot Action Recognition.
CoRR, 2021

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events.
CoRR, 2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

ATRW: A Benchmark for Amur Tiger Re-identification in the Wild.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multiple Sound Sources Localization from Coarse to Fine.
Proceedings of the Computer Vision - ECCV 2020, 2020

Finding Action Tubes with a Sparse-to-Dense Framework.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

