Xugang Lu

Proceedings of the 30th European Signal Processing Conference, 2022

2021

Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating a joint Bayesian generative model in a discriminative learning framework for speaker verification.

CoRR, 2021

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.

Proceedings of the IEEE International Conference on Acoustics, 2021

A Study of Incorporating Articulatory Movement Information in Speech Enhancement.

Proceedings of the 29th European Signal Processing Conference, 2021

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Automatic Speech Recognition.

Sheng Li

Masakiyo Fujimoto

Proceedings of the Speech-to-Speech Translation, 2020

Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.

Cheng Yu

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement.

IEEE Signal Process. Lett., 2020

Improving Perceptual Quality by Phone-Fortified Perceptual Loss for Speech Enhancement.

CoRR, 2020

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders.

Cheng Yu

CoRR, 2020

Compensation on x-vector for Short Utterance Spoken Language Identification.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.

Peng Shen

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Incorporating Broad Phonetic Information for Speech Enhancement.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training.

Proceedings of the 28th International Conference on Computational Linguistics, 2020

2019

Deep progressive multi-scale attention for acoustic event classification.

CoRR, 2019

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement.

Natalie Yu-Hsien Wang

Hsiao-Lan Sharon Wang

CoRR, 2019

Optimal Classifier Parameter Status Selection Based on Bayes Boundary-ness for Multi-ProtoType and Multi-Layer Perceptron Classifiers.

Proceedings of the Integrated Uncertainty in Knowledge Modelling and Decision Making, 2019

Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Incorporating Symbolic Sequential Modeling for Speech Enhancement.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Study of articulators' contribution and compensation during speech by articulatory speech recognition.

Multim. Tools Appl., 2018

Speech Dereverberation Based on Integrated Deep and Ensemble Learning.

CoRR, 2018

Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Temporal Attentive Pooling for Acoustic Event Detection.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Denoising Autoencoder Based Post Filtering for Speech Enhancement.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

A Deep Denoising Autoencoder Approach to Improving the Intelligibility of Vocoded Speech in Cochlear Implant Simulation.

IEEE Trans. Biomed. Eng., 2017

Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models.

Naoyuki Kanda

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Method of Estimating Signal-to-Noise Ratio Based on Optimal Design for Sub-band Voice Activity Detection.

J. Inf. Hiding Multim. Signal Process., 2017

Regularization of neural network model with distance metric learning for i-vector based spoken language identification.

Comput. Speech Lang., 2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.

CoRR, 2017

Multi-Metrics Learning for Speech Enhancement.

CoRR, 2017

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning.

Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Semi-supervised ensemble DNN acoustic model training.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework.

Naoyuki Kanda

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Incremental training and constructing the very deep convolutional residual network acoustic models.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Raw waveform-based speech enhancement by fully convolutional networks.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization.

IEEE Signal Process. Lett., 2016

Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription.

Speech Commun., 2016

Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers.

IEICE Trans. Inf. Syst., 2016

Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition.

Peng Shen

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Comparison of regularization constraints in deep neural network based speaker adaptation.

Peng Shen

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A pseudo-task design in multi-task learning deep neural network for speaker recognition.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Confidence estimation for speech recognition systems using conditional random fields trained with partially annotated data.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Incorporating local environment information with ensemble neural networks to robust automatic speech recognition.

Chia-Yung Hsu

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Maximum a posteriori Based Decoding for CTC Acoustic Models.

Naoyuki Kanda

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement.

Szu-Wei Fu

Yu Tsao

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Local fisher discriminant analysis for spoken language identification.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Bottleneck linear transformation network adaptation for speaker adaptive training-based hybrid DNN-HMM speech recognizer.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Ensemble environment modeling using affine transform group.

Speech Commun., 2015

Sparse representation with temporal max-smoothing for acoustic event detection.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speaker adaptive training for deep neural networks embedding linear transformation networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014

Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation.

Comput. Speech Lang., 2014

Robust voice activity detection based on concept of modulation transfer function in noisy reverberant environments.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Signal to noise ratio estimation based on an optimal design of subband voice activity detection.

Shota Morita

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Spectral patch based sparse coding for acoustic event detection.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Mandarin speech recognition using convolution neural network with augmented tone features.

Xinhui Hu

Chiori Hori

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Ensemble modeling of denoising autoencoder for speech spectrum restoration.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Speaker Adaptive Training using Deep Neural Networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

Sparse representation based on a bag of spectral exemplars for acoustic event detection.

Proceedings of the IEEE International Conference on Acoustics, 2014

Speech enhancement using segmental nonnegative matrix factorization.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Controlling Tradeoff Between Approximation Accuracy and Complexity of a Smooth Function in a Reproducing Kernel Hilbert Space for Noise Reduction.

IEEE Trans. Signal Process., 2013

The NICT ASR system for IWSLT 2013.

Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

Speech enhancement based on deep denoising autoencoder.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speech spectrum restoration based on conditional restricted boltzmann machine.

Shigeki Matsuda

Chiori Hori

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Automatic localization of a language-independent sub-network on deep neural networks trained by multi-lingual speech.

Shigeki Matsuda

Hideki Kashioka

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

The NICT ASR system for IWSLT2012.

Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

Factored recurrent neural network language model in TED lecture transcription.

Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

Unified denoising and dereverberation method used in restoration of MTF-based power envelope.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Controlling the tradeoff property in a regularization framework for noise reduction.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Acoustic space partition based on broad phonetic class for ensemble acoustic modeling.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Speech restoration based on deep learning autoencoder with layer-wised pretraining.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Noise estimation using a constrained sequential HMM in log-spectral domain.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Factored Language Model based on Recurrent Neural Network.

Proceedings of the COLING 2012, 2012

2011

Temporal modulation normalization for robust speech feature extraction and recognition.

Multim. Tools Appl., 2011

Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments.

Satoshi Nakamura

Comput. Speech Lang., 2011

Voice Activity Detection in MTF-Based Power Envelope Restoration.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Adaptive Regularization Framework for Robust Voice Activity Detection.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

Vowel Production Manifold: Intrinsic Factor Analysis of Vowel Articulation.

IEEE Trans. Speech Audio Process., 2010

Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition.

Speech Commun., 2010

Speech enhancement as a functional approximation and generalization.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Voice activity detection in a regularized reproducing kernel Hilbert space.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

Speech Enhancement Based on Noise Eigenspace Projection.

IEICE Trans. Inf. Syst., 2009

Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments.

Satoshi Nakamura

Proceedings of the 3rd International Universal Communication Symposium, 2009

Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments.

Satoshi Nakamura

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Temporal contrast normalization and edge-preserved smoothing on temporal modulation structure for robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification.

Speech Commun., 2008

Normalization on Temporal Modulation Transfer Function for Robust Speech Recognition.

Proceedings of the ISUC 2008, 2008

Noise Reduction Based Random Matrix Theory.

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A model based investigation of activation patterns of the tongue muscles for vowel production.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

A Model-Based Learning Process for Modeling Coarticulation of Human Speech.

Jianguo Wei

IEICE Trans. Inf. Syst., 2007

Dimension reduction for speaker identification based on mutual information.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Physiological Feature Extraction for Text Independent Speaker Identification using Non-Uniform Subband Processing.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

A Robust Voice Activity Detection Based on Noise Eigenspace Projection.

Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Auditory Contrast Spectrum for Robust Speech Recognition.

Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

A simulation based parameter optimization for a coarticulation model.

Jianguo Wei

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A robust feature extraction based on the MTF concept for speech recognition in reverberant environment.

Masato Akagi

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

A noise reduction system in arbitrary noise environments and its applications to speech enhancement and speech recognition.

Junfeng Li

Masato Akagi

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2000

Dominant subspace analysis for auditory spectrum.

Gang Li

Lipo Wang

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999

Nonlinear processing in auditory system.

Daowen Chen

Proceedings of the IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP'99), 1999

A New Cochlear Model Based on Adaptive Gain Mechanism.

Daowen Chen

Proceedings of the Foundations and Tools for Neural Modeling, 1999

Integrating spatial and temporal mechanisms in auditory neural fiber's computational model.
