Junyu Gao

Orcid: 0000-0002-8105-5497

Affiliations:
  • Chinese Academy of Sciences, Institute of Automation, National Lab of Pattern Recognition, Beijing, China
  • University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, China


According to our database, Junyu Gao authored at least 65 papers between 2016 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Cross-Modal Dual-Causal Learning for Long-Term Action Recognition.
CoRR, July, 2025

Learning Probabilistic Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2025

NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments.
CoRR, June, 2025

2024
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding.
ACM Trans. Multim. Comput. Commun. Appl., October, 2024

Multimodal Imbalance-Aware Gradient Modulation for Weakly-Supervised Audio-Visual Video Parsing.
IEEE Trans. Circuits Syst. Video Technol., June, 2024

Feature Disentanglement Network: Multi-Object Tracking Needs More Differentiated Features.
ACM Trans. Multim. Comput. Commun. Appl., March, 2024

Learning Proposal-Aware Re-Ranking for Weakly-Supervised Temporal Action Localization.
IEEE Trans. Circuits Syst. Video Technol., January, 2024

Learning Multi-Expert Distribution Calibration for Long-Tailed Video Classification.
IEEE Trans. Multim., 2024

Exploring Rich Semantics for Open-Set Action Recognition.
IEEE Trans. Multim., 2024

Spatiotemporal Orthogonal Projection Capsule Network for Incremental Few-Shot Action Recognition.
IEEE Trans. Multim., 2024

Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation.
IEEE Trans. Image Process., 2024

Revisiting Essential and Nonessential Settings of Evidential Deep Learning.
CoRR, 2024

A Comprehensive Survey on Evidential Deep Learning and Its Applications.
CoRR, 2024

Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Open-Vocabulary Video Scene Graph Generation via Union-aware Semantic Alignment.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

R-EDL: Relaxing Nonessential Settings of Evidential Deep Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Vectorized Evidential Learning for Weakly-Supervised Temporal Action Localization.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Uncertainty-Aware Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Semantic and Temporal Contextual Correlation Learning for Weakly-Supervised Temporal Action Localization.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Weakly-Supervised Video Object Grounding via Causal Intervention.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Many Hands Make Light Work: Transferring Knowledge From Auxiliary Tasks for Video-Text Retrieval.
IEEE Trans. Multim., 2023

Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations.
IEEE Trans. Multim., 2023

Learning Scene-Aware Spatio-Temporal GNNs for Few-Shot Early Action Prediction.
IEEE Trans. Multim., 2023

Spatial-Temporal Exclusive Capsule Network for Open Set Action Recognition.
IEEE Trans. Multim., 2023

Learning Dual-Routing Capsule Graph Neural Network for Few-Shot Video Classification.
IEEE Trans. Multim., 2023

Test-time Adaptive Vision-and-Language Navigation.
CoRR, 2023

Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing.
CoRR, 2023

Video Entailment via Reaching a Structure-Aware Cross-modal Consensus.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Weakly-supervised Video Scene Graph Generation via Unbiased Cross-modal Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Leveraging Attribute Knowledge for Open-set Action Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Learning Hierarchical Video Graph Networks for One-Stop Video Delivery.
ACM Trans. Multim. Comput. Commun. Appl., 2022

The Model May Fit You: User-Generalized Cross-Modal Retrieval.
IEEE Trans. Multim., 2022

Compact Representation and Reliable Classification Learning for Point-Level Weakly-Supervised Action Localization.
IEEE Trans. Image Process., 2022

Learning Video Moment Retrieval Without a Single Annotated Video.
IEEE Trans. Circuits Syst. Video Technol., 2022

Learning Semantic-Aware Spatial-Temporal Attention for Interpretable Action Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2022

Learning Muti-expert Distribution Calibration for Long-tailed Video Classification.
CoRR, 2022

Dual-Evidential Learning for Weakly-supervised Temporal Action Localization.
Proceedings of the Computer Vision - ECCV 2022, 2022

Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Health Status Prediction with Local-Global Heterogeneous Behavior Graph.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Knowledge-driven Egocentric Multimodal Activity Recognition.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval.
IEEE Trans. Multim., 2021

Learning Dual-Pooling Graph Neural Networks for Few-Shot Video Classification.
IEEE Trans. Multim., 2021

Unsupervised Video Summarization via Relation-Aware Assignment Learning.
IEEE Trans. Multim., 2021

Learning to Model Relationships for Zero-Shot Video Classification.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Weakly-Supervised Video Object Grounding via Stable Context Learning.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Diving Into The Relations: Leveraging Semantic and Visual Structures For Video Moment Retrieval.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

Active Universal Domain Adaptation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Fast Video Moment Retrieval.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification.
IEEE Trans. Multim., 2020

Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
SMART: Joint Sampling and Regression for Visual Tracking.
IEEE Trans. Image Process., 2019

Graph Convolutional Tracking.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
P2T: Part-to-Target Tracking via Deep Regression Learning.
IEEE Trans. Image Process., 2018

Watch, Think and Attend: End-to-End Video Classification via Dynamic Knowledge Evolution Modeling.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

The Sixth Visual Object Tracking VOT2018 Challenge Results.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

2017
Deep Relative Tracking.
IEEE Trans. Image Process., 2017

A Unified Personalized Video Recommendation via Dynamic Recurrent Neural Networks.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

The Visual Object Tracking VOT2017 Challenge Results.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

2016
The Visual Object Tracking VOT2016 Challenge Results.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

