Xiaohai Tian

ORCID: 0000-0001-5219-1249

According to our database¹, Xiaohai Tian authored at least 68 papers between 2010 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs.

CoRR, March, 2026

Integrating Fine-Grained Audio-Visual Evidence for Robust Multimodal Emotion Reasoning.

CoRR, January, 2026

2025

End-to-end Listen, Look, Speak and Act.

CoRR, October, 2025

SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation.

CoRR, May, 2025

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context.

CoRR, March, 2025

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation.

CoRR, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.

CoRR, 2024

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing.

CoRR, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

TTS-Guided Training for Accent Conversion Without Parallel Data.

IEEE Signal Process. Lett., 2023

Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An ASR-Free Fluency Scoring Approach with Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information.

CoRR, 2022

A Transfer and Multi-Task Learning based Approach for MOS Prediction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation.

Yi Zhou, Xiaohai Tian, Haizhou Li

IEEE ACM Trans. Audio Speech Lang. Process., 2021

NHSS: A speech and singing parallel database.

Speech Commun., 2021

Factorized WaveNet for voice conversion with limited data.

Speech Commun., 2021

Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

The Multi-Speaker Multi-Style Voice Cloning Challenge 2021.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Multi-Task WaveRNN With an Integrated Architecture for Cross-Lingual Voice Conversion.

Yi Zhou, Xiaohai Tian, Haizhou Li

IEEE Signal Process. Lett., 2020

Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion.

Xiaohai Tian, Rohan Kumar Das, Haizhou Li

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Personalized Singing Voice Generation Using WaveRNN.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

The Attacker's Perspective on Automatic Speaker Verification: An Overview.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Code-Switching TTS with Cross-Lingual Language Model.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Effective Wavenet Adaptation for Voice Conversion with Limited Data.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

NUS-HLT System for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Voice conversion with parallel/non-parallel data and synthetic speech detection.

Xiaohai Tian

PhD thesis, 2019

A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data.

Xiaohai Tian, Eng Siong Chng, Haizhou Li

CoRR, 2019

A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data.

Xiaohai Tian, Eng Siong Chng, Haizhou Li

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

WaveNet Factorization with Singular Value Decomposition for Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker-independent Spectral Mapping for Speech-to-Singing Conversion.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Average Modeling Approach to Voice Conversion with Non-Parallel Data.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Usability Analysis of the Novel Functions to Assist the Senior Customers in Online Shopping.

Proceedings of the Social Computing and Social Media. User Experience and Behavior, 2018

The TL-NTU Text-to-speech System for the Blizzard Challenge 2018.

Proceedings of the Blizzard Challenge 2018, Hyderabad, India, 2018

2017

An Exemplar-Based Approach to Frequency Warping for Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Towards Age-friendly E-commerce Through Crowd-Improved Speech Recognition, Multimodal Search, and Personalized Speech Feedback.

Proceedings of the 2nd International Conference on Crowd Science and Engineering, 2017

Improving air traffic control speech intelligibility by reducing speaking rate effectively.

Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Novel Functional Technologies for Age-Friendly E-commerce.

Proceedings of the Human Aspects of IT for the Aged Population. Applications, Services and Contexts, 2017

An investigation of spectral feature partitioning for replay attacks detection.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

High quality voice conversion using prosodic and high-resolution spectral features.

Multim. Tools Appl., 2016

Spoofing detection under noisy conditions: a preliminary investigation and an initial database.

CoRR, 2016

An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Spoofing detection from a feature representation perspective.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Spoofing speech detection using temporal convolutional neural network.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

System fusion for high-performance voice conversion.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Personalized synthetic voices for speaking impaired: website and app.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Sparse representation for frequency warping based voice conversion.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Detecting synthetic speech using long term magnitude and phase information.

Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

A waveform representation framework for high-quality statistical parametric speech synthesis.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014

Correlation-based frequency warping for voice conversion.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A comparative study of spectral transformation techniques for singing voice synthesis.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Local partial least square regression for spectral mapping in voice conversion.

Xiaohai Tian, Zhizheng Wu, Engsiong Chng

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2010

Speech and Auditory Interfaces for Ubiquitous, Immersive and Personalized Applications.

Proceedings of the Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing, 2010
