Yingming Gao

ORCID: 0000-0001-5881-3723

According to our database, Yingming Gao authored at least 50 papers between 2015 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Disentanglement of Prosody Representations via Diffusion Models and Scheduled Gradient Reversal.

IEEE Trans. Neural Networks Learn. Syst., August, 2025

Deep Learning Approaches for Multimodal Intent Recognition: A Survey.

CoRR, July, 2025

Psy-Copilot: Visual Chain of Thought for Counseling.

CoRR, March, 2025

Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling.

CoRR, March, 2025

DetailTTS: Learning Residual Detail Information for Zero-shot Text-to-speech.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks.

Yingming Gao

Peter Birkholz

Ya Li

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.

CoRR, 2024

ExpressiveSinger: Synthesizing Expressive Singing Voice as an Instrument.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

G2DiaR: Enhancing Commonsense Reasoning of LLMs with Graph-to-Dialogue & Reasoning.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Preliminary Study on Automatic Pronunciation Error Detection for Hearing-impaired Children.

Proceedings of the 10th International Conference on Communication and Information Processing, 2024

Frame-Level Emotional State Alignment Method for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Concss: Contrastive-based Context Comprehension for Dialogue-Appropriate Prosody in Conversational Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Spoken Language Intelligence of Large Language Models for Language Learning.

Linkai Peng

Baorian Nuchged

Yingming Gao

CoRR, 2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.

CoRR, 2023

Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition.

Qifei Li

Yingming Gao

Ya Li

Proceedings of the 31st ACM International Conference on Multimedia, 2023

CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

FTA-net: A Frequency and Time Attention Network for Speech Depression Detection.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring the interpretability in speech-based adolescent depression detection by SHAP.

Proceedings of the 9th International Conference on Communication and Information Processing, 2023

GaitParse: Gait Parsing Algorithm with Self-Supervised Fine-Tuning for Gait Recognition.

Proceedings of the 9th International Conference on Communication and Information Processing, 2023

M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab.

Yingming Gao

PhD thesis, 2022

Articulatory Synthesis of Vocalized /r/ Allophones in German.

Simon Stone

Yingming Gao

Peter Birkholz

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Text-Aware End-to-end Mispronunciation Detection and Diagnosis.

CoRR, 2022

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis.

Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

An Entropy-based Study on the Acquisition of Mandarin Initial Consonants by Korean Learners.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Disyllabic Tone Production and Tone Context Effect in Mandarin-speaking Children with Cochlear Implants.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Contribution of Phonological and Fluency Factors to Chinese L2 Comprehensibility Ratings: A Case Study of Urdu-speaking Learners.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The Importance of Lexical Tone for Sentence Understanding: Utilizing Functional Load Principle to Simulate Comprehension Process.

Proceedings of the International Conference on Asian Language Processing, 2022

2021

A Practical Way to Improve Automatic Phonetic Segmentation Performance.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

2020

Improving Pronunciation Erroneous Tendency Detection with Multi-Model Soft Targets.

J. Signal Process. Syst., 2020

An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Articulatory Copy Synthesis Based on a Genetic Algorithm.

Yingming Gao

Simon Stone

Peter Birkholz

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Research on Illumination Estimation Based on Data Fitting.

Proceedings of the Green Energy and Networking - 6th EAI International Conference, 2019

2018

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.

Sabato Marco Siniscalchi

Jinsong Zhang

Chin-Hui Lee

J. Signal Process. Syst., 2018

Speaking Rate Changes Affect Phone Durations Differently for Neutral and Emotional Speech.

Yingming Gao

Peter Birkholz

Proceedings of the 26th European Signal Processing Conference, 2018

2017

Improving pronunciation erroneous tendency detection with convolutional long short-term memory.

Proceedings of the 2017 International Conference on Asian Language Processing, 2017

2016

Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

DNN based detection of pronunciation erroneous tendency in data sparse condition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

A study on robust detection of pronunciation erroneous tendency based on deep neural network.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
