Leyuan Qu

IEEE Trans. Neural Networks Learn. Syst., February, 2024

Disentangling Prosody Representations With Unsupervised Speech Reconstruction.

Taihao Li

Theresa Pekarek-Rosin

Fuji Ren

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Multi-Modal Emotion Recognition Using Multiple Acoustic Features and Dual Cross-Modal Transformer.

Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Emphasizing unseen words: New vocabulary acquisition for end-to-end speech recognition.

Neural Networks, April, 2023

Few Shot Learning Guided by Emotion Distance for Cross-corpus Speech Emotion Recognition.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Data Augmentation with Unsupervised Speaking Style Transfer for Speech Emotion Recognition.

CoRR, 2022

A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

2021

Neural Network Learning for Robust Speech Recognition.

PhD thesis, 2021

Hearing Faces: Target Speaker Text-to-Speech Synthesis from a Face.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Multimodal Target Speech Separation with Voice and Face References.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Variational Autoencoder with Global- and Medium Timescale Auxiliaries for Emotion Recognition from Speech.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2020, 2020

2019

LipSound: Neural Mel-Spectrogram Reconstruction for Lip Reading.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Combining Articulatory Features with End-to-End Learning in Speech Recognition.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018

2016

Senone log-likelihood ratios based articulatory features in pronunciation erroneous tendency detecting.

Yanlu Xie

Jinsong Zhang

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Landmark of Mandarin nasal codas and its application in pronunciation error detection.

Yanlu Xie

Mark Hasegawa-Johnson