Zexin Cai

According to our database, Zexin Cai authored at least 37 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.


Bibliography

2026

DiffAnon: Diffusion-based Prosody Control for Voice Anonymization.

Ismail Rasim Ulgen

,

,

Nicholas Andrews

,

,

CoRR, April, 2026

Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLR.

,

,

,

,

Leibny Paola García-Perera

,

,

Sanjeev Khudanpur

,

Matthew Wiesner

,

Nicholas Andrews

CoRR, March, 2026

Can LLMs Help Localize Fake Words in Partially Fake Speech?

,

,

,

Sanjeev Khudanpur

,

,

Leibny Paola García-Perera

,

Matthew Wiesner

,

Nicholas Andrews

CoRR, March, 2026

Universal Speech Content Factorization.

Henry Li Xinyuan

,

,

,

Leibny Paola García-Perera

,

,

Sanjeev Khudanpur

,

Nicholas Andrews

,

Matthew Wiesner

CoRR, March, 2026

2025

Content Anonymization for Privacy in Long-form Audio.

Cristina Aggazzotti

,

,

,

Nicholas Andrews

CoRR, October, 2025

Less is More for Synthetic Speech Detection in the Wild.

,

,

Henry Li Xinyuan

,

Leibny Paola García-Perera

,

,

Sanjeev Khudanpur

,

Matthew Wiesner

,

Nicholas Andrews

CoRR, February, 2025

HLTCOE Submission to the VoicePrivacy Attacker Challenge.

Henry Li Xinyuan

,

,

,

,

Leibny Paola García-Perera

,

Sanjeev Khudanpur

,

Nicholas Andrews

,

Matthew Wiesner

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Scalable Controllable Accented TTS.

Henry Li Xinyuan

,

,

,

,

Leibny Paola García-Perera

,

Sanjeev Khudanpur

,

Nicholas Andrews

,

Matthew Wiesner

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts.

,

,

Henry Li Xinyuan

,

Leibny Paola García-Perera

,

Sanjeev Khudanpur

,

Matthew Wiesner

,

Nicholas Andrews

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

GenVC: Self-Supervised Zero-Shot Voice Conversion.

,

Henry Li Xinyuan

,

,

Leibny Paola García-Perera

,

,

Sanjeev Khudanpur

,

Matthew Wiesner

,

Nicholas Andrews

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks.

,

Comput. Speech Lang., April, 2024

HLTCOE JHU Submission to the Voice Privacy Challenge 2024.

Henry Li Xinyuan

,

,

,

,

Leibny Paola García-Perera

,

Sanjeev Khudanpur

,

Nicholas Andrews

,

Matthew Wiesner

CoRR, 2024

Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning.

,

,

CoRR, 2024

The Database and Benchmark For the Source Speaker Tracing Challenge 2024.

,

,

,

,

,

,

,

Hiromitsu Nishizaki

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Privacy Versus Emotion Preservation Trade-Offs in Emotion-Preserving Speaker Anonymization.

,

Henry Li Xinyuan

,

,

Leibny Paola García-Perera

,

,

Sanjeev Khudanpur

,

Nicholas Andrews

,

Matthew Wiesner

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Invertible Voice Conversion with Parallel Data.

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Cross-lingual multi-speaker speech synthesis with limited bilingual training data.

,

,

Comput. Speech Lang., 2023

The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023.

,

,

,

CoRR, 2023

Electrolaryngeal speech enhancement based on a two stage framework with bottleneck feature refinement and voice conversion.

,

,

,

,

,

,

,

,

Biomed. Signal Process. Control., 2023

Waveform Boundary Detection for Partially Spoofed Audio.

,

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Identifying Source Speakers for Voice Conversion Based Spoofing Attacks on Speaker Verification Systems.

,

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Invertible Voice Conversion.

,

CoRR, 2022

SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System for Both Human Beings and Machines.

,

,

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2020

Training Wake Word Detection with Synthesized Speech Data on Confusion Words.

,

,

,

,

,

,

CoRR, 2020

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario.

,

,

CoRR, 2020

From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint.

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The Duke Entry for 2020 Blizzard Challenge.

,

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features.

,

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

F0 Contour Estimation Using Phonetic Feature in Electrolaryngeal Speech Enhancement.

,

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

The DKU Speech Synthesis System for 2019 Blizzard Challenge.

,

,

,

Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

2018

Insights into End-to-End Learning Scheme for Language Identification.

,

,

,

,

CoRR, 2018

Unsupervised query by example spoken term detection using features concatenated with Self-Organizing Map distances.

,

,

,

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

End-to-end Language Identification using NetFV and NetVLAD.

,

,

,

,

,

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion.

,

,

,

,

,

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification.

,

,

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Insights into End-to-End Learning Scheme for Language Identification.

,

,

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition.

,

,

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
