Daxin Tan

According to our database, Daxin Tan authored at least 20 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM.
CoRR, May, 2026

PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs.
CoRR, January, 2026

AEQ-Bench: Measuring Empathy of Omni-Modal Large Models.
CoRR, January, 2026

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion.
CoRR, January, 2026

2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.
CoRR, 2024

Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data.
CoRR, 2024

Exploring SSL Discrete Tokens for Multilingual ASR.
CoRR, 2024

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis.
CoRR, 2024

2022
Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue.
CoRR, 2022

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Environment Aware Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge.
CoRR, 2021

Applying the Information Bottleneck Principle to Prosodic Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement.
CoRR, 2020

