Li-Rong Dai

Orcid: 0000-0002-0859-2827

Affiliations:
  • University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China


According to our database, Li-Rong Dai authored at least 318 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Sketch-fusion: A gradient compression method with multi-layer fusion for communication-efficient distributed training.
J. Parallel Distributed Comput., March, 2024

Adversarial speech for voice privacy protection from Personalized Speech generation.
CoRR, 2024

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Universal wavelength reuse mechanism for optical networks-on-chip based on a cooperative game.
J. Opt. Commun. Netw., June, 2023

A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Rep2wav: Noise Robust text-to-speech Using self-supervised representations.
CoRR, 2023

A Speech Distortion Weighted Single-Channel Wiener Filter Based STFT-Domain Noise Reduction.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2023

Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

End-to-End Multilingual Text Recognition Based on Byte Modeling.
Proceedings of the Image and Graphics - 12th International Conference, 2023

A Multimodal Text Block Segmentation Framework for Photo Translation.
Proceedings of the Image and Graphics - 12th International Conference, 2023

Vision-Language Adaptive Mutual Decoder for OOV-STR.
Proceedings of the Image and Graphics - 12th International Conference, 2023

Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robust Data2VEC: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Multi-Scale Feature Aggregation Based Lightweight Network for Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2023

StarGAN-VC Based Cross-Domain Data Augmentation for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Frequency-Invariant Sensor Selection for MVDR Beamforming in Wireless Acoustic Sensor Networks.
IEEE Trans. Wirel. Commun., 2022

A multimodal attention fusion network with a dynamic vocabulary for TextVQA.
Pattern Recognit., 2022

Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition.
Circuits Syst. Signal Process., 2022

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning.
CoRR, 2022

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR.
CoRR, 2022

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition.
CoRR, 2022

Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition.
Proceedings of the Interspeech 2022, 2022

Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
Proceedings of the Interspeech 2022, 2022

A Complementary Joint Training Approach Using Unpaired Speech and Text.
Proceedings of the Interspeech 2022, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Proceedings of the Interspeech 2022, 2022

Structural String Decoder for Handwritten Mathematical Expression Recognition.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

An Experimental Comparison between Low-Resource Semi-Supervised and High-Resource Supervised Automatic Speech Recognition Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

Learning Contextually Fused Audio-Visual Representations For Audio-Visual Speech Recognition.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Frontend Attributes Disentanglement for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Domain Robust Deep Embedding Learning for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Supervised and Self-Supervised Pretraining Based Covid-19 Detection Using Acoustic Breathing/Cough/Speech Signals.
Proceedings of the IEEE International Conference on Acoustics, 2022

Reference Microphone Selection and Low-Rank Approximation Based Multichannel Wiener Filter with Application to Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2021

UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition.
CoRR, 2021

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

An Improved Wav2Vec 2.0 Pre-Training Approach Using Enhanced Local Dependency Modeling for Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

UnitNet-Based Hybrid Speech Synthesis.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021

An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Learning and Modeling Unit Embeddings Using Deep Neural Networks for Unit-Selection-Based Mandarin Speech Synthesis.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020

Radical analysis network for learning hierarchies of Chinese characters.
Pattern Recognit., 2020

Segment boundary detection directed attention for online end-to-end speech recognition.
EURASIP J. Audio Speech Music. Process., 2020

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention.
CoRR, 2020

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer.
CoRR, 2020

Attentive batch normalization for LSTM-based acoustic modeling of speech recognition.
CoRR, 2020

Effective Exploitation of Posterior Information for Attention-Based Speech Recognition.
IEEE Access, 2020

An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
Proceedings of the Interspeech 2020, 2020

Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Proceedings of the Interspeech 2020, 2020

Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning.
Proceedings of the Interspeech 2020, 2020

An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Proceedings of the Interspeech 2020, 2020

A Tree-Structured Decoder for Image-to-Markup Generation.
Proceedings of the 37th International Conference on Machine Learning, 2020

Extracting Unit Embeddings Using Sequence-To-Sequence Acoustic Models for Unit Selection Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2019

Sequence-to-Sequence Acoustic Modeling for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Deep Neural Network Embedding Learning with High-Order Statistics for Text-Independent Speaker Verification.
CoRR, 2019

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification.
Proceedings of the Interspeech 2019, 2019

Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification.
Proceedings of the Interspeech 2019, 2019

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling.
Proceedings of the Interspeech 2019, 2019

An Effective Deep Embedding Learning Architecture for Speaker Verification.
Proceedings of the Interspeech 2019, 2019

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.
Proceedings of the Interspeech 2019, 2019

A Chinese Dataset for Identifying Speakers in Novels.
Proceedings of the Interspeech 2019, 2019

Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels.
Proceedings of the Interspeech 2019, 2019

Deep Neural Network Based Regression Approach for Acoustic Echo Cancellation.
Proceedings of the 4th International Conference on Multimedia Systems and Signal Processing, 2019

Improving Sequence-to-sequence Voice Conversion by Adding Text-supervision.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Knowledge Distillation from Multilingual and Monolingual Teachers for End-to-End Multilingual Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Adaptive Downsampling Encoding for Online End-to-End Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Triplet-Center Loss Based Deep Embedding Learning Method for Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Improving the Decoding Efficiency of Deep Neural Network Acoustic Models by Cluster-Based Senone Selection.
J. Signal Process. Syst., 2018

A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

LID-Senones and Their Statistics for Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Statistical Parametric Speech Synthesis Using Generalized Distillation Framework.
IEEE Signal Process. Lett., 2018

Articulatory-to-acoustic conversion using BLSTM-RNNs with augmented input representation.
Speech Commun., 2018

A Conditional Generative Model for Speech Enhancement.
Circuits Syst. Signal Process., 2018

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision.
CoRR, 2018

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis.
CoRR, 2018

A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis.
Proceedings of the Interspeech 2018, 2018

Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition.
Proceedings of the Interspeech 2018, 2018

WaveNet Vocoder with Limited Training Data for Voice Conversion.
Proceedings of the Interspeech 2018, 2018

An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition.
Proceedings of the Interspeech 2018, 2018

An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.
Proceedings of the Interspeech 2018, 2018

Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Radical Analysis Network for Zero-Shot Learning in Printed Chinese Character Recognition.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Forward Attention in Sequence-To-Sequence Acoustic Modeling for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Source-Aware Context Network for Single-Channel Multi-Speaker Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Densely Connected Progressive Learning for LSTM-Based Speech Enhancement.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Pseudo-Supervised Approach for Text Clustering Based on Consensus Analysis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Capsule based Approach for Polyphonic Sound Event Detection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Nonrecurrent Neural Structure for Long-Term Dependence.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments.
Speech Commun., 2017

Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition.
Pattern Recognit., 2017

Towards human-like and transhuman perception in AI 2.0: a review.
Frontiers Inf. Technol. Electron. Eng., 2017

An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech.
Comput. Speech Lang., 2017

RAN: Radical analysis networks for zero-shot learning of Chinese characters.
CoRR, 2017

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering.
CoRR, 2017

A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation.
Proceedings of the Interspeech 2017, 2017

End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling.
Proceedings of the Interspeech 2017, 2017

Gaussian Prediction Based Attention for Online End-to-End Speech Recognition.
Proceedings of the Interspeech 2017, 2017

An investigation of high-resolution modeling units of deep neural networks for acoustic scene classification.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Adaptation of PLDA for multi-source text-independent speaker verification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Joint noise and mask aware training for DNN-based speech enhancement with sub-band features.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Multiple-target deep learning for LSTM-RNN based speech enhancement.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

The USTC system for blizzard machine learning challenge 2017-ES2.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Feedforward sequential memory networks based encoder-decoder model for machine translation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Learning the number of nodes in DNNs with activation mask.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Emotional statistical parametric speech synthesis using LSTM-RNNs.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition.
J. Signal Process. Syst., 2016

Exploration of Local Variability in Text-Independent Speaker Verification.
J. Signal Process. Syst., 2016

A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Modeling F0 trajectories in hierarchically structured deep neural networks.
Speech Commun., 2016

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks.
J. Mach. Learn. Res., 2016

Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016

Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering.
Comput. Speech Lang., 2016

Image classification with CNN-based Fisher vector coding.
Proceedings of the 2016 Visual Communications and Image Processing, 2016

Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

USTC at NTCIR-12 STC Task.
Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, 2016

Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Mismatched training data enhancement for automatic recognition of children's speech using DNN-HMM.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Cluster-based senone selection for the efficient calculation of deep neural network acoustic models.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Learning FOFE based FNN-LMs with noise contrastive estimation and part-of-speech features.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A regression approach to binaural speech segregation via deep neural network.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

RNN-BLSTM Based Multi-Pitch Estimation.
Proceedings of the Interspeech 2016, 2016

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the Interspeech 2016, 2016

Future Context Attention for Unidirectional LSTM Based Acoustic Model.
Proceedings of the Interspeech 2016, 2016

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks.
Proceedings of the Interspeech 2016, 2016

Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks.
Proceedings of the Interspeech 2016, 2016

SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement.
Proceedings of the Interspeech 2016, 2016

The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion.
Proceedings of the Interspeech 2016, 2016

Modeling spectral envelopes using deep conditional restricted Boltzmann machines for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Modulation spectrum compensation for HMM-based speech synthesis using line spectral pairs.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Compact convolutional neural network transfer learning for small-scale image classification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep belief network-based post-filtering for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Content-aware local variability vector for speaker verification with short utterance.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Unsupervised single-channel speech separation via deep neural network for different gender mixtures.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Boosting DNN-based speech enhancement via explicit transformations.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

A Regression Approach to Speech Enhancement Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Quasi-Factorial Prior for i-vector Extraction.
IEEE Signal Process. Lett., 2015

Statistical parametric speech synthesis using a hidden trajectory model.
Speech Commun., 2015

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency.
CoRR, 2015

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models.
CoRR, 2015

Feedforward Sequential Memory Neural Networks without Recurrent Feedback.
CoRR, 2015

Deep Bottleneck Feature for Image Classification.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Rectified linear neural networks with tied-scalar regularization for LVCSR.
Proceedings of the INTERSPEECH 2015, 2015

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the INTERSPEECH 2015, 2015

High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification.
Proceedings of the INTERSPEECH 2015, 2015

A universal VAD based on jointly trained deep neural networks.
Proceedings of the INTERSPEECH 2015, 2015

Deep bottleneck network based i-vector representation for language identification.
Proceedings of the INTERSPEECH 2015, 2015

Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions.
Proceedings of the INTERSPEECH 2015, 2015

Phone-centric local variability vector for text-constrained speaker verification.
Proceedings of the INTERSPEECH 2015, 2015

Writer adaptive feature extraction based on convolutional neural networks for online handwritten Chinese character recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Unsupervised speaker adaptation of deep neural network based on the combination of speaker codes and singular value decomposition for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improved language identification using deep bottleneck network.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Spectral conversion using deep neural networks trained with multi-source speakers.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Joint training of front-end and back-end deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Channel adaptation of PLDA for text-independent speaker verification.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

Lip movement generation using restricted Boltzmann machines for visual speech synthesis.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
Fast adaptation of deep neural network based on discriminant codes for speech recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Voice conversion using deep neural networks with layer-wise generative training.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

An Experimental Study on Speech Enhancement Based on Deep Neural Networks.
IEEE Signal Process. Lett., 2014

HMM-based unit selection speech synthesis using log likelihood ratios derived from perceptual data.
Speech Commun., 2014

Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs.
IEICE Trans. Inf. Syst., 2014

Local Variability Modeling for Text-Independent Speaker Verification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Cross-language transfer learning for deep neural network based speech enhancement.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Integrating global variance of log power spectrum derived from LSPs into MGE training for HMM-based parametric speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Performance evaluation of deep bottleneck features for spoken language identification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Local variability vector for text-independent speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the INTERSPEECH 2014, 2014

Dynamic noise aware training for speech enhancement based on deep neural networks.
Proceedings of the INTERSPEECH 2014, 2014

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2014, 2014

Task-aware deep bottleneck features for spoken language identification.
Proceedings of the INTERSPEECH 2014, 2014

Robust speech recognition with speech enhanced deep neural networks.
Proceedings of the INTERSPEECH 2014, 2014

Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes.
Proceedings of the INTERSPEECH 2014, 2014

Formant-controlled speech synthesis using hidden trajectory model.
Proceedings of the INTERSPEECH 2014, 2014

A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Writer Adaptation Using Bottleneck Features and Discriminative Linear Regression for Online Handwritten Chinese Character Recognition.
Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014

Sequence training of multiple deep neural networks for better performance and faster training speed.
Proceedings of the IEEE International Conference on Acoustics, 2014

Improving deep neural networks for LVCSR using dropout and shrinking structure.
Proceedings of the IEEE International Conference on Acoustics, 2014

Spectral modeling using neural autoregressive distribution estimators for statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code.
Proceedings of the IEEE International Conference on Acoustics, 2014

Lattice based optimization of bottleneck feature extractor with linear transformation.
Proceedings of the IEEE International Conference on Acoustics, 2014

Using bidirectional associative memories for joint spectral envelope modeling in voice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2014

Synthesized stereo mapping via deep neural networks for noisy speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Minimum divergence estimation of speaker prior in multi-session PLDA scoring.
Proceedings of the IEEE International Conference on Acoustics, 2014

A spectral based visual matching method for image classification.
Proceedings of the International Conference on Audio, 2014

Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

2013
Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion.
Proceedings of the INTERSPEECH 2013, 2013

A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Unsupervised prosodic phrase boundary labeling of Mandarin speech synthesis database using context-dependent HMM.
Proceedings of the IEEE International Conference on Acoustics, 2013

Exemplar based language recognition method for short-duration speech segments.
Proceedings of the IEEE International Conference on Acoustics, 2013

Phoneme variation based synthesized speech discrimination for speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2013

Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Minimum Kullback-Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis.
IEEE Trans. Speech Audio Process., 2012

Spoken term detection for OOV terms based on triphone confusion matrix.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A hybrid fragment / syllable-based system for improved OOV term detection.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Intra-conversation intra-speaker variability compensation for speaker clustering.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis.
Proceedings of the INTERSPEECH 2012, 2012

Exemplar-Based Sparse Representation for Language Recognition on I-Vectors.
Proceedings of the INTERSPEECH 2012, 2012

2011
Trust Region-Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model.
Proceedings of the INTERSPEECH 2011, 2011

Formant-Controlled HMM-Based Speech Synthesis.
Proceedings of the INTERSPEECH 2011, 2011

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis.
Proceedings of the INTERSPEECH 2011, 2011

Factored covariance modeling for text-independent speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score.
Proceedings of the IEEE International Conference on Acoustics, 2011

Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model.
Proceedings of the IEEE International Conference on Acoustics, 2011

Preserve ordering property of generated LSPs for minimum generation error training in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011

Non-parallel training for voice conversion based on FT-GMM.
Proceedings of the IEEE International Conference on Acoustics, 2011

Effective image representation based on bi-layer visual codebook.
Proceedings of the First Asian Conference on Pattern Recognition, 2011

2010
Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis.
Int. J. Comput. Linguistics Chin. Lang. Process., 2010

Minimum generation error training for HMM-based prediction of articulatory movements.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Automatic phrase boundary labeling for Mandarin TTS corpus using context-dependent HMM.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

The description of iFlyTek Speech Lab system for NIST2009 Language Recognition Evaluation.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Phonetic clustering based confidence measure for embedded speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Factor analysis based spatial correlation modeling for speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Non-negative matrix factorization based discriminative features for speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

GMM-based voice conversion with explicit modelling on feature transform.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Speaker verification against synthetic speech.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

The estimation and kernel metric of spectral correlation for text-independent speaker verification.
Proceedings of the INTERSPEECH 2010, 2010

Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier.
Proceedings of the INTERSPEECH 2010, 2010

Effects of the phonological relevance in speaker verification.
Proceedings of the INTERSPEECH 2010, 2010

Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

A hierarchical F0 modeling method for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

Multiple instance learning using visual phrases for object classification.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

A bounded trust region optimization for discriminative training of HMMs in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2010

N-gram nearest neighbor algorithm for voice password system.
Proceedings of the IEEE International Conference on Acoustics, 2010

HMM-based pseudo-clean speech synthesis for SPLICE algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Semi-supervised kernel density estimation for video annotation.
Comput. Vis. Image Underst., 2009

Asynchronous F0 and spectrum modeling for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2009, 2009

An automatic language identification method based on subspace analysis.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Full covariance state duration modeling for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2009

Exploiting prosodic information for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009


iFLY system for the NIST 2008 speaker recognition evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Multi-Layer F0 Modeling for HMM-Based Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Parallel Phone Recognizer based MLLR Speaker Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A Sample and Feature Selection Scheme for GMM-SVM Based Language Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Interfusing the Confused Region Score of Speaker Verification Systems.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Double Gauss Based Unsupervised Score Normalization in Speaker Verification.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

The Adaptation Schemes In PR-SVM Based Language Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Heteronym Verification for Mandarin Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Minimum generation error criterion considering global/local variance for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008

Minimum generation error linear regression based model adaptation for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Interactive Video Annotation by Multi-Concept Multi-Modality Active Learning.
Int. J. Semantic Comput., 2007

Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation.
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007

An Efficient Automatic Video Shot Size Annotation Scheme.
Proceedings of the Advances in Multimedia Modeling, 2007

Video annotation by graph-based learning with neighborhood similarity.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Optimizing multi-graph learning: towards a unified video annotation scheme.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Multi-Graph Semi-Supervised Learning for Video Semantic Feature Extraction.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Lazy Learning Based Efficient Video Annotation.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

An Interactive Video Annotation Framework with Multiple Modalities.
Proceedings of the IEEE International Conference on Acoustics, 2007

Angle of Models Distance as Test Algorithm in Speaker Verification.
Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007

2006
Efficient semantic annotation method for indexing large personal video database.
Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006

Automatic video annotation based on co-adaptation and label correction.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Enhanced Semi-Supervised Learning for Automatic Video Annotation.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Video Annotation by Active Learning and Semi-Supervised Ensembling.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Semi-Supervised Kernel Regression.
Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 2006

An Automatic Video Semantic Annotation Scheme Based on Combination of Complementary Predictors.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Video Annotation by Active Learning and Cluster Tuning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006

2005
Semi-automatic video annotation based on active learning with multiple complementary predictors.
Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005

An Improved Spectral and Prosodic Transformation Method in STRAIGHT-based Voice Conversion.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Sliding Window Smoothing For Maximum Entropy Based Intonational Phrase Prediction In Chinese.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Perceptual Video Streaming by Adaptive Spatial-temporal Scalability.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Double Gaussian based feature normalization for robust speech recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

MCE-based training of subspace distribution clustering HMM.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A region based multiple frame-rate tradeoff of video streaming.
Proceedings of the 2004 International Conference on Image Processing, 2004

A complexity reduction of ETSI advanced front-end for DSR.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

