Zhengqi Wen

Orcid: 0000-0001-9430-7115

According to our database, Zhengqi Wen authored at least 106 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis.
Knowl. Based Syst., January, 2024

2023
Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection.
CoRR, 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
CoRR, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion.
CoRR, 2023

Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Learning From Yourself: A Self-Distillation Method For Fake Speech Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.
IEEE Signal Process. Lett., 2022

Emotion Selectable End-to-End Text-based Speech Editing.
CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.
CoRR, 2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.
CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.
CoRR, 2021

Which Phonemes Will Distinguish the Different Regions Within the Same Dialect?
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Towards Fine-Grained Prosody Control for Voice Conversion.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Simulation Analysis on Seismic Capacity of 220kV GIS Switch Bay Mobile Load Transfer Equipment.
Proceedings of the EEET 2021: 4th International Conference on Electronics and Electrical Engineering Technology, Nanjing, China, December 3, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
A Public Chinese Dataset for Language Model Adaptation.
J. Signal Process. Syst., 2020

End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features.
CoRR, 2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method.
CoRR, 2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features.
CoRR, 2020

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Proceedings of the Interspeech 2020, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Proceedings of the Interspeech 2020, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
Proceedings of the Interspeech 2020, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.
Proceedings of the Interspeech 2020, 2020

ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data.
Proceedings of the Interspeech 2020, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the Interspeech 2020, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the Interspeech 2020, 2020

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Proceedings of the Interspeech 2020, 2020

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Proceedings of the Interspeech 2020, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Synchronous Transformers for end-to-end Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Forward-Backward Decoding Sequence for Regularizing End-to-End TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.
CoRR, 2019

Towards Fine-Grained Prosody Control for Voice Conversion.
CoRR, 2019

Forward-Backward Decoding for Regularizing End-to-End TTS.
Proceedings of the Interspeech 2019, 2019

Self-Attention Transducers for End-to-End Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.
Proceedings of the Interspeech 2019, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.
Proceedings of the Interspeech 2019, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Phoneme Dependent Speaker Embedding and Model Factorization for Multi-speaker Speech Synthesis and Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin.
J. Signal Process. Syst., 2018

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition.
J. Signal Process. Syst., 2018

Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.
J. Signal Process. Syst., 2018

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition.
CoRR, 2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End.
Proceedings of the Interspeech 2018, 2018

On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2018, 2018

Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer.
Proceedings of the Interspeech 2018, 2018

Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2018, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction.
Proceedings of the Interspeech 2017, 2017

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction.
Proceedings of the Interspeech 2017, 2017

2016
Speech Enhancement Based on Analysis-Synthesis Framework with Improved Parameter Domain Enhancement.
J. Signal Process. Syst., 2016

Investigating Effect of Rich Syntactic Features on Mandarin Prosodic Boundaries Prediction.
J. Signal Process. Syst., 2016

Audio Visual Emotion Recognition with Temporal Alignment and Perception Attention.
CoRR, 2016

Text-based sentential stress prediction using continuous lexical embedding for Mandarin speech synthesis.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving accented Mandarin speech recognition by using recurrent neural network based language model adaptation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach.
Proceedings of the Interspeech 2016, 2016

The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

Long short term memory recurrent neural network based encoding method for emotion recognition in video.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Deep neural network based voice conversion with a large synthesized parallel corpus.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015

Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

A novel method of artificial bandwidth extension using deep architecture.
Proceedings of the INTERSPEECH 2015, 2015

2014
Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis.
J. Signal Process. Syst., 2014

Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video.
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014

Survey on discriminative feature selection for speech emotion recognition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Context features based pre-selection and weight prediction in concatenation speech synthesis system.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Investigating effect of rich syntactic features on Mandarin prosodic phrase boundaries prediction.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A hierarchical viterbi algorithm for Mandarin hybrid speech synthesis system.
Proceedings of the INTERSPEECH 2014, 2014

A novel hybrid mandarin speech synthesis system using different base units for model training and concatenation.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
A novel unit selection method for concatenation speech system using similarity measure.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

2012
Statistical modification based post-filtering technique for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis.
Proceedings of the INTERSPEECH 2012, 2012

Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis.
Proceedings of the INTERSPEECH 2012, 2012

2011
An excitation model based on inverse filtering for speech analysis and synthesis.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis.
Proceedings of the INTERSPEECH 2011, 2011

