Zhizheng Wu

ORCID: 0009-0001-1192-9857

Affiliations:
  • Chinese University of Hong Kong, Shenzhen, China
  • Meta (former)
  • JD.com (former)
  • Apple (former)
  • University of Edinburgh, UK (former)
  • Microsoft Research Asia (former)
  • Nanyang Technological University, Singapore (Ph.D., 2015)


According to our database, Zhizheng Wu authored at least 93 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Accented Text-to-Speech Synthesis With Limited Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024

SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion.
CoRR, 2024

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing.
CoRR, 2024

2023
Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

TTS-Guided Training for Accent Conversion Without Parallel Data.
IEEE Signal Process. Lett., 2023

Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis.
IEEE Signal Process. Lett., 2023

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.
CoRR, 2023

Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder.
CoRR, 2023

Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion.
CoRR, 2023

Audio compression-assisted feature extraction for voice replay attack detection.
CoRR, 2023

AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification.
CoRR, 2023

An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification.
CoRR, 2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network.
CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-shot multi-speaker accent TTS with limited accent data.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Audio Splicing Localization: Can We Accurately Locate the Splicing Tampering?
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

2021
Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

2019
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data.
Proceedings of the Interspeech 2019, 2019

2017
An Exemplar-Based Approach to Frequency Warping for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Deep Feature Engineering for Noise Robust Spoofing Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge.
IEEE J. Sel. Top. Signal Process., 2017


2016
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Improving Trajectory Modelling for DNN-Based Speech Synthesis by Using Stacked Bottleneck Features and Minimum Generation Error Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Synthetic speech detection using phase information.
Speech Commun., 2016

On the study of replay and voice conversion attacks to text-dependent speaker verification.
Multim. Tools Appl., 2016

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Trajectory Error Training.
CoRR, 2016

Investigating gated recurrent neural networks for speech synthesis.
CoRR, 2016

Spoofing detection under noisy conditions: a preliminary investigation and an initial database.
CoRR, 2016

Merlin: An Open Source Neural Network Speech Synthesis System.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Multidimensional scaling of systems in the Voice Conversion Challenge 2016.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

On the impact of phoneme alignment in DNN-based speech synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Analysis of the Voice Conversion Challenge 2016 Evaluation Results.
Proceedings of the Interspeech 2016, 2016

The Voice Conversion Challenge 2016.
Proceedings of the Interspeech 2016, 2016

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions.
Proceedings of the Interspeech 2016, 2016

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs.
Proceedings of the Interspeech 2016, 2016

Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

Investigating gated recurrent networks for speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

From HMMs to DNNs: Where do the improvements come from?
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Spoofing detection from a feature representation perspective.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep neural network-guided unit selection synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Robust TTS duration modelling using DNNs.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

On the training of DNN-based average voice model for speech synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

On the use of I-vectors and average voice model for voice conversion without parallel data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Predicting articulatory movement from text using deep architecture with stacked bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Anti-spoofing, Voice Conversion.
Proceedings of the Encyclopedia of Biometrics, Second Edition, 2015

Anti-spoofing, Voice Databases.
Proceedings of the Encyclopedia of Biometrics, Second Edition, 2015

Spectral mapping for voice conversion.
PhD thesis, 2015

Joint Speaker Verification and Antispoofing in the i-Vector Space.
IEEE Trans. Inf. Forensics Secur., 2015

Spoofing and countermeasures for speaker verification: A survey.
Speech Commun., 2015

Exemplar-based voice conversion using joint nonnegative matrix factorization.
Multim. Tools Appl., 2015

A study of speaker adaptation for DNN-based speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge.
Proceedings of the INTERSPEECH 2015, 2015

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk by the organizers.
Proceedings of the INTERSPEECH 2015, 2015

Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features.
Proceedings of the INTERSPEECH 2015, 2015

Human vs machine spoofing detection on wideband and narrowband data.
Proceedings of the INTERSPEECH 2015, 2015

Sentence-level control vectors for deep neural network speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Towards minimum perceptual error training for DNN-based speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

System fusion for high-performance voice conversion.
Proceedings of the INTERSPEECH 2015, 2015

Deep neural network context embeddings for model selection in rich-context HMM synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning.
Proceedings of the INTERSPEECH 2015, 2015

Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

SAS: A speaker verification spoofing database containing diverse attacks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Sparse representation for frequency warping based voice conversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Speaker Recognition Anti-spoofing.
Proceedings of the Handbook of Biometric Anti-Spoofing, 2014

Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Correlation-based frequency warping for voice conversion.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Joint nonnegative matrix factorization for exemplar-based voice conversion.
Proceedings of the INTERSPEECH 2014, 2014

A comparative study of spectral transformation techniques for singing voice synthesis.
Proceedings of the INTERSPEECH 2014, 2014

Introducing i-vectors for joint anti-spoofing and speaker verification.
Proceedings of the INTERSPEECH 2014, 2014

A study on replay attack and anti-spoofing for text-dependent speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Exemplar-based voice conversion using non-negative spectrogram deconvolution.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Exemplar-based unit selection for voice conversion utilizing temporal information.
Proceedings of the INTERSPEECH 2013, 2013

Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints.
Proceedings of the INTERSPEECH 2013, 2013

Synthetic speech detection using temporal modulation feature.
Proceedings of the IEEE International Conference on Acoustics, 2013

Conditional restricted Boltzmann machine for voice conversion.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Voice conversion and spoofing attack on speaker verification systems.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Local partial least square regression for spectral mapping in voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion.
IEEE Signal Process. Lett., 2012

Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units.
IEEE Trans. Speech Audio Process., 2011

2010
Automatic prosody prediction and detection with Conditional Random Field (CRF) models.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Text-independent F0 transformation with non-parallel data for voice conversion.
Proceedings of the INTERSPEECH 2010, 2010

2009
A minimum v/u error approach to F0 generation in HMM-based TTS.
Proceedings of the INTERSPEECH 2009, 2009

Improved prosody generation by maximizing joint likelihood of state and longer units.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Duration refinement by jointly optimizing state and longer unit likelihood.
Proceedings of the INTERSPEECH 2008, 2008

