Yingming Gao

Orcid: 0000-0001-5881-3723

According to our database, Yingming Gao authored at least 50 papers between 2015 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Disentanglement of Prosody Representations via Diffusion Models and Scheduled Gradient Reversal.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

Deep Learning Approaches for Multimodal Intent Recognition: A Survey.
CoRR, July, 2025

Psy-Copilot: Visual Chain of Thought for Counseling.
CoRR, March, 2025

Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling.
CoRR, March, 2025

DetailTTS: Learning Residual Detail Information for Zero-shot Text-to-speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
CoRR, 2024

ExpressiveSinger: Synthesizing Expressive Singing Voice as an Instrument.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

G2DiaR: Enhancing Commonsense Reasoning of LLMs with Graph-to-Dialogue & Reasoning.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Preliminary Study on Automatic Pronunciation Error Detection for Hearing-impaired Children.
Proceedings of the 10th International Conference on Communication and Information Processing, 2024

Frame-Level Emotional State Alignment Method for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Concss: Contrastive-based Context Comprehension for Dialogue-Appropriate Prosody in Conversational Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Spoken Language Intelligence of Large Language Models for Language Learning.
CoRR, 2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.
CoRR, 2023

Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

FTA-net: A Frequency and Time Attention Network for Speech Depression Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploring the interpretability in speech-based adolescent depression detection by SHAP.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023

GaitParse: Gait Parsing Algorithm with Self-Supervised Fine-Tuning for Gait Recognition.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023

M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab.
PhD thesis, 2022

Articulatory Synthesis of Vocalized /r/ Allophones in German.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Text-Aware End-to-end Mispronunciation Detection and Diagnosis.
CoRR, 2022

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

An Entropy-based Study on the Acquisition of Mandarin Initial Consonants by Korean Learners.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Disyllabic Tone Production and Tone Context Effect in Mandarin-speaking Children with Cochlear Implants.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Contribution of Phonological and Fluency Factors to Chinese L2 Comprehensibility Ratings: A Case Study of Urdu-speaking Learners.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The Importance of Lexical Tone for Sentence Understanding: Utilizing Functional Load Principle to Simulate Comprehension Process.
Proceedings of the International Conference on Asian Language Processing, 2022

2021
A Practical Way to Improve Automatic Phonetic Segmentation Performance.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

2020
Improving Pronunciation Erroneous Tendency Detection with Multi-Model Soft Targets.
J. Signal Process. Syst., 2020

An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Articulatory Copy Synthesis Based on a Genetic Algorithm.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Research on Illumination Estimation Based on Data Fitting.
Proceedings of the Green Energy and Networking - 6th EAI International Conference, 2019

2018
Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.
J. Signal Process. Syst., 2018

Speaking Rate Changes Affect Phone Durations Differently for Neutral and Emotional Speech.
Proceedings of the 26th European Signal Processing Conference, 2018

2017
Improving pronunciation erroneous tendency detection with convolutional long short-term memory.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

2016
Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

DNN based detection of pronunciation erroneous tendency in data sparse condition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
A study on robust detection of pronunciation erroneous tendency based on deep neural network.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

