Guo Chen

Affiliations:
  • Nanjing University, State Key Laboratory for Novel Software Technology, China


According to our database, Guo Chen authored at least 36 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs.
CoRR, July, 2025

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding.
CoRR, July, 2025

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision.
CoRR, June, 2025

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs.
CoRR, June, 2025

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models.
CoRR, April, 2025

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models.
CoRR, April, 2025

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant.
CoRR, March, 2025

Token-Efficient Long Video Understanding for Multimodal LLMs.
CoRR, March, 2025

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models.
CoRR, January, 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Matching Compound Prototypes for Few-Shot Action Recognition.
Int. J. Comput. Vis., September, 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.
CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
CoRR, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Retrieval-Augmented Egocentric Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
BasicTAD: An astounding RGB-Only baseline for temporal action detection.
Comput. Vis. Image Underst., July, 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

AVSegFormer: Audio-Visual Segmentation with Transformer.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

Champion Solution for the WSDM2023 Toloka VQA Challenge.
CoRR, 2023

MRSN: Multi-Relation Support Network for Video Action Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

ELAN: Enhancing Temporal Action Detection with Location Awareness.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Memory-and-Anticipation Transformer for Online Action Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022.
CoRR, 2022

DCAN: Improving Temporal Action Detection via Dual Context Aggregation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

