Guangyao Li

Orcid: 0000-0002-2179-8555

Affiliations:
  • Tsinghua University, Department of Computer Science and Technology, Beijing, China
  • Renmin University of China, Gaoling School of Artificial Intelligence, Beijing, China (former)


According to our database, Guangyao Li authored at least 22 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2026

AV-Unified: A Unified Framework for Audio-visual Scene Understanding.
CoRR, March, 2026

Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding.
CoRR, February, 2026

Sarcasm detection enhanced by multi-modal topics using denoising diffusion probabilistic models.
Pattern Recognit., 2026

2025
PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement.
CoRR, December, 2025

EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks.
CoRR, February, 2025

Improving Compositional Generalization in Cross-Embodiment Learning via Mixture of Disentangled Prototypes.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

PEDE: Enhance Multi-modal Sarcasm Detection in Videos via Prompted Emotion Distributions.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Audio-Visual Instance Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models.
Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective, 2024

Boosting Audio Visual Question Answering via Key Semantic-Aware Cues.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

CM-PIE: Cross-Modal Perception for Interactive-Enhanced Audio-Visual Video Parsing.
Proceedings of the IEEE International Conference on Acoustics, 2024

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes.
Proceedings of the Computer Vision - ECCV 2024, 2024

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Self-supervised audiovisual representation learning for remote sensing data.
Int. J. Appl. Earth Obs. Geoinformation, February, 2023

Towards Long Form Audio-visual Video Understanding.
CoRR, 2023

Progressive Spatio-temporal Perception for Audio-Visual Question Answering.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Multi-Scale Attention for Audio Question Answering.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2020
A review of computer vision technologies for plant phenotyping.
Comput. Electron. Agric., 2020

2019
Shellfish Detection Based on Fusion Attention Mechanism in End-to-End Network.
Proceedings of the Pattern Recognition and Computer Vision - Second Chinese Conference, 2019

