Guo Chen

ORCID: 0009-0009-0944-6651

Affiliations:

Nanjing University, State Key Laboratory for Novel Software Technology, China

According to our database, Guo Chen authored at least 42 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Feature matters: Revisiting channel attention for Temporal Action Detection.

Pattern Recognit., 2026

2025

NVIDIA Nemotron Nano V2 VL.

Amala Sanjay Deshmukh

Danial Mohseni-Taheri

Subhashree Radhakrishnan

Ameya Sunil Mahabaleshwarkar

Unnikrishnan Kizhakkemadam Sreekumar

Wanli Jiang

Padmavathy Subramanian

CoRR, November, 2025

Guiding Audio-Visual Question Answering with Collective Question Reasoning.

Int. J. Comput. Vis., October, 2025

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT.

CoRR, October, 2025

Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices.

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., September, 2025

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs.

CoRR, July, 2025

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding.

CoRR, July, 2025

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision.

CoRR, June, 2025

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs.

CoRR, June, 2025

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models.

CoRR, April, 2025

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models.

Aaron Blakeman

Aarti Basant

Abhinav Khattar

Adithya Renduchintala

Amala Sanjay Deshmukh

Ameya Sunil Mahabaleshwarkar

Maer Rodrigues de Melo

Makesh Narsimhan Sreedhar

Marcin Chochowski

Markus Kliegl

CoRR, April, 2025

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant.

CoRR, March, 2025

Token-Efficient Long Video Understanding for Multimodal LLMs.

CoRR, March, 2025

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models.

CoRR, January, 2025

Egocentric Object-Interaction Anticipation with Retentive and Predictive Learning.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Matching Compound Prototypes for Few-Shot Action Recognition.

Int. J. Comput. Vis., September, 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.

CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.

CoRR, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Retrieval-Augmented Egocentric Video Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

BasicTAD: An astounding RGB-Only baseline for temporal action detection.

Comput. Vis. Image Underst., July, 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

CoRR, 2023

AVSegFormer: Audio-Visual Segmentation with Transformer.

CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.

CoRR, 2023

Champion Solution for the WSDM2023 Toloka VQA Challenge.

CoRR, 2023

MRSN: Multi-Relation Support Network for Video Action Detection.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

ELAN: Enhancing Temporal Action Detection with Location Awareness.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Memory-and-Anticipation Transformer for Online Action Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

CoRR, 2022

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022.

CoRR, 2022

DCAN: Improving Temporal Action Detection via Dual Context Aggregation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
