Haoqin Sun

ORCID: 0000-0002-8554-8969

According to our database, Haoqin Sun authored at least 42 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
ECHO: Towards Emotionally Appropriate and Contextually Aware Interactive Head Generation.
CoRR, March, 2026

Speech-XL: Towards Long-Form Speech Understanding in Large Speech Language Models.
CoRR, February, 2026

Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue.
CoRR, January, 2026

DBIDM: Implementing blind image separation through a dual branch interactive diffusion model.
Pattern Recognit. Lett., 2026

PD-DDPM: Prior-driven diffusion model for single image dehazing.
Image Vis. Comput., 2026

DIFFA: Large Language Diffusion Models Can Listen and Understand.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Learning Personalised Human Internal Cognition from External Expressive Behaviours for Real Personality Recognition.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation.
CoRR, October, 2025

AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation.
CoRR, October, 2025

WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations.
CoRR, October, 2025

MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning.
CoRR, September, 2025

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering.
CoRR, September, 2025

Marco-Voice Technical Report.
CoRR, August, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.
CoRR, June, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.
CoRR, May, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.
CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.
CoRR, January, 2025

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling.
IEEE Signal Process. Lett., 2025

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.
Proceedings of the Natural Language Processing and Chinese Computing, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Discrete Audio Representations for Automated Audio Captioning.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Feature distribution Adaptation Network for Speech Emotion Recognition.
CoRR, 2024

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
CoRR, 2024

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.
CoRR, 2024

Uncertainty-Aware Mean Opinion Score Prediction.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Fine-Grained Disentangled Representation Learning For Multimodal Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions.
Mach. Intell. Res., August, 2023

A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., June, 2023

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework.
Speech Commun., 2022

Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

