Di Hu

ORCID: 0000-0002-7118-6733

Affiliations:
  • Baidu Research, Big Data Laboratory, Beijing, China
  • Renmin University of China, Gaoling School of Artificial Intelligence, Beijing, China
  • Northwestern Polytechnical University, School of Computer Science and Engineering, OPTIMAL, Xi'an, China (former)


According to our database, Di Hu authored at least 52 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Geometric-inspired graph-based Incomplete Multi-view Clustering.
Pattern Recognit., March, 2024

Towards accurate knowledge transfer via target-awareness representation disentanglement.
Mach. Learn., February, 2024

Quantifying and Enhancing Multi-modal Robustness with Modality Preference.
CoRR, 2024

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Self-supervised audiovisual representation learning for remote sensing data.
Int. J. Appl. Earth Obs. Geoinformation, February, 2023

Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis.
IEEE Trans. Multim., 2023

Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs.
CoRR, 2023

Enhancing Multi-modal Cooperation via Fine-grained Modality Valuation.
CoRR, 2023

Towards Long Form Audio-visual Video Understanding.
CoRR, 2023

Multi-Scale Attention for Audio Question Answering.
CoRR, 2023

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos.
CoRR, 2023

Balanced Audiovisual Dataset for Imbalance Analysis.
CoRR, 2023

Revisiting Pre-training in Audio-Visual Learning.
CoRR, 2023

TikTalk: A Multi-Modal Dialogue Dataset for Real-World Chitchat.
CoRR, 2023

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Exploiting Visual Context Semantics for Sound Source Localization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Progressive Spatio-temporal Perception for Audio-Visual Question Answering.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Towards Inadequately Pre-trained Models in Transfer Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Class-Aware Sounding Objects Localization via Audiovisual Correspondence.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Learning in Audio-visual Context: A Review, Analysis, and New Perspective.
CoRR, 2022

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance.
CoRR, 2022

Inadequately Pre-trained Models are Better Feature Extractors.
CoRR, 2022

Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction.
HCMA@MM 2022: Proceedings of the 3rd International Workshop on Human-Centric Multimedia Analysis, 2022

Balanced Multimodal Learning via On-the-fly Gradient Modulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning to Answer Questions in Dynamic Audio-Visual Scenarios.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Generalising combinatorial discriminant analysis through conditioning truncated Rayleigh flow.
Knowl. Inf. Syst., 2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Temporal Relational Modeling with Self-Supervision for Action Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement.
CoRR, 2020

Cross-Task Transfer for Multimodal Aerial Scene Recognition.
CoRR, 2020

Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions.
CoRR, 2020

Curriculum Audiovisual Learning.
CoRR, 2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching.
Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Multiple Sound Sources Localization from Coarse to Fine.
Computer Vision - ECCV 2020, 2020

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition.
Computer Vision - ECCV 2020, 2020

2019
Deep Binary Reconstruction for Cross-Modal Hashing.
IEEE Trans. Multim., 2019

Discrete Spectral Hashing for Efficient Similarity Retrieval.
IEEE Trans. Image Process., 2019

Dense Multimodal Fusion for Hierarchically Joint Representation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Deep Multimodal Clustering for Unsupervised Audiovisual Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Listen to the Image.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Dense Multimodal Fusion for Hierarchically Joint Representation.
CoRR, 2018

Deep LDA Hashing.
CoRR, 2018

Deep Co-Clustering for Unsupervised Audiovisual Learning.
CoRR, 2018

2017
Image2song: Song Retrieval via Bridging Image Content and Lyric Words.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Large Graph Hashing with Spectral Rotation.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Multimodal Learning via Exploring Deep Semantic Similarity.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Temporal Multimodal Learning in Audiovisual Speech Recognition.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

