Ye Bai

ORCID: 0000-0001-5533-6909

Affiliations:

Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition, Beijing, China

According to our database, Ye Bai authored at least 42 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech.

CoRR, January, 2026

2025

P2Mark: Plug-and-play Parameter-intrinsic Watermarking for Neural Speech Generation.

CoRR, April, 2025

2024

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023

Transfer knowledge for punctuation prediction via adversarial training.

Speech Commun., April, 2023

HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Image-driven Audio-visual Universal Source Separation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.

CoRR, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

CoRR, 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.

CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.

CoRR, 2021

Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continual Learning for Fake Audio Detection.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

A Public Chinese Dataset for Language Model Adaptation.

J. Signal Process. Syst., 2020

Deep imitator: Handwriting calligraphy imitation via deep attention networks.

Pattern Recognit., 2020

Adversarial Transfer Learning for Punctuation Restoration.

CoRR, 2020

Focal Loss for Punctuation Prediction.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Synchronous Transformers for end-to-end Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.

CoRR, 2019

Self-Attention Transducers for End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition.

Jiangyan Yi, Jianhua Tao, Ye Bai

Proceedings of the IEEE International Conference on Acoustics, 2019

Hypersphere Embedding and Additive Margin for Query-by-example Keyword Spotting.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
