Haoqin Sun

Orcid: 0000-0002-8554-8969

According to our database, Haoqin Sun authored at least 42 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

ECHO: Towards Emotionally Appropriate and Contextually Aware Interactive Head Generation.

CoRR, March, 2026

Speech-XL: Towards Long-Form Speech Understanding in Large Speech Language Models.

CoRR, February, 2026

Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue.

CoRR, January, 2026

DBIDM: Implementing blind image separation through a dual branch interactive diffusion model.

Jiaxin Gong, Jindong Xu, Haoqin Sun

Pattern Recognit. Lett., 2026

PD-DDPM: Prior-driven diffusion model for single image dehazing.

Image Vis. Comput., 2026

DIFFA: Large Language Diffusion Models Can Listen and Understand.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Learning Personalised Human Internal Cognition from External Expressive Behaviours for Real Personality Recognition.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation.

CoRR, October, 2025

AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation.

CoRR, October, 2025

WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations.

CoRR, October, 2025

MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning.

CoRR, September, 2025

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering.

CoRR, September, 2025

Marco-Voice Technical Report.

CoRR, August, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.

CoRR, June, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.

CoRR, May, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.

CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.

CoRR, January, 2025

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling.

IEEE Signal Process. Lett., 2025

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.

Proceedings of the Natural Language Processing and Chinese Computing, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Discrete Audio Representations for Automated Audio Captioning.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Feature distribution Adaptation Network for Speech Emotion Recognition.

CoRR, 2024

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.

CoRR, 2024

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.

CoRR, 2024

Uncertainty-Aware Mean Opinion Score Prediction.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Fine-Grained Disentangled Representation Learning For Multimodal Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions.

Mach. Intell. Res., August, 2023

A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., June, 2023

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework.

Speech Commun., 2022

Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
