Jia Li

Orcid: 0000-0001-9446-249X

Affiliations:
  • Hefei University of Technology, School of Computer and Information Engineering, Hefei, China
  • University of Science and Technology of China (USTC), School of Information Science and Technology, Hefei, China (PhD 2021)


According to our database, Jia Li authored at least 47 papers between 2019 and 2026.

Bibliography

2026
Adaptive Physical-Facial Representation Fusion via Subject-Invariant Cross-Modal Prompt Tuning for Video-Based Emotion Recognition.
CoRR, May, 2026

To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition.
CoRR, May, 2026

PhysioSync: Temporal and Cross-Modal Contrastive Learning Inspired by Physiological Synchronization for EEG-Based Emotion Recognition.
IEEE Trans. Comput. Soc. Syst., April, 2026

Bidirectional Learning of Facial Action Units and Expressions via Structured Semantic Mapping across Heterogeneous Datasets.
CoRR, April, 2026

Towards Agentic Intelligence for Materials Science.
CoRR, February, 2026

Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data.
IEEE Trans. Affect. Comput., 2026

Fine-grained Text-Video Retrieval with Patch-level Temporal Difference and Aggregation.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

Agent Journey Beyond RGB: Hierarchical Semantic-Spatial Representation Enrichment for Vision-and-Language Navigation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Disentangling Foreground and Background for Vision-Language Navigation via Online Augmentation.
CoRR, October, 2025

CLAIP-Emo: Parameter-Efficient Adaptation of Language-supervised models for In-the-Wild Audiovisual Emotion Recognition.
CoRR, September, 2025

Emotion Separation and Recognition From a Facial Expression by Generating the Poker Face With Vision Transformers.
IEEE Trans. Comput. Soc. Syst., August, 2025

Generalizable Engagement Estimation in Conversation via Domain Prompting and Parallel Attention.
CoRR, August, 2025

Listening to the Unspoken: Exploring 365 Aspects of Multimodal Interview Performance Assessment.
CoRR, July, 2025

Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval.
CoRR, May, 2025

Adaptive Dual Video Summarization: From Dynamic Keyframes to Captions.
IEEE Trans. Multim., 2025

From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos.
IEEE Trans. Affect. Comput., 2025

Generalizable Engagement Estimation in Conversation via Domain Prompting and Parallel Attention.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

VAEmo: Efficient Representation Learning for Visual-Audio Emotion With Knowledge Injection.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Listening to the Unspoken: Exploring '365' Aspects of Multimodal Interview Performance Assessment.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification.
Proceedings of the 2025 International Conference on Multimedia Retrieval, 2025

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations.
Proceedings of the International Joint Conference on Neural Networks, 2025

Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Exploring and exploiting model uncertainty for robust visual question answering.
Multim. Syst., December, 2024

FTCM: Frequency-Temporal Collaborative Module for Efficient 3D Human Pose Estimation in Video.
IEEE Trans. Circuits Syst. Video Technol., February, 2024

Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval.
CoRR, 2024

UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos.
CoRR, 2024

Dual-Stream Keyframe Enhancement for Video Question Answering.
Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

2023
Robust facial expression recognition with global-local joint representation learning.
Multim. Syst., October, 2023

MLP-JCG: Multi-Layer Perceptron With Joint-Coordinate Gating for Efficient 3D Human Pose Estimation.
IEEE Trans. Multim., 2023

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA.
IEEE Trans. Image Process., 2023

Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding.
CoRR, 2023

Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers.
CoRR, 2023

Exploiting Diverse Feature for Multimodal Sentiment Analysis.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Multi-Person Pose Estimation With Accurate Heatmap Regression and Greedy Association.
IEEE Trans. Circuits Syst. Video Technol., 2022

MAN: Mining Ambiguity and Noise for Facial Expression Recognition in the Wild.
Pattern Recognit. Lett., 2022

Multi-stage and multi-branch network with similar expressions label distribution learning for facial expression recognition.
Pattern Recognit. Lett., 2022

Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis.
MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022

2020
Emotional Conversation Generation Based on a Bayesian Deep Neural Network.
ACM Trans. Inf. Syst., 2020

Emotional editing constraint conversation content generation based on reinforcement learning.
Inf. Fusion, 2020

2019
Real-Time Traffic Sign Recognition Based on Efficient CNNs in the Wild.
IEEE Trans. Intell. Transp. Syst., 2019

Downhole Track Detection via Multiscale Conditional Generative Adversarial Nets.
CoRR, 2019

Reinforcement Learning Based Emotional Editing Constraint Conversation Generation.
CoRR, 2019

Monocular Depth Estimation as Regression of Classification using Piled Residual Networks.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

