Yanmin Qian

Orcid: 0000-0002-0314-3790

According to our database, Yanmin Qian authored at least 217 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Universal Cross-Lingual Data Generation for Low Resource ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
CoRR, 2024

Improving Design of Input Condition Invariant Speech Enhancement.
CoRR, 2024

2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).
Dataset, October, 2023

Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models.
CoRR, 2023

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction.
CoRR, 2023

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition.
CoRR, 2023

USED: Universal Speaker Extraction and Diarization.
CoRR, 2023

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer.
CoRR, 2023

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models.
CoRR, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
CoRR, 2023

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition.
CoRR, 2023

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR.
CoRR, 2023

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor.
CoRR, 2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adaptive Large Margin Fine-Tuning For Robust Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Light-Weight VisualVoice: Neural Network Quantization On Audio Visual Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit.
Proceedings of the IEEE International Conference on Acoustics, 2023

Lowbit Neural Network Quantization for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.
Proceedings of the IEEE International Conference on Acoustics, 2023

Robust Audio-Visual ASR with Unified Cross-Modal Attention.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving DINO-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Exploring Binary Classification Loss for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
Proceedings of the IEEE International Conference on Acoustics, 2023

Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Toward Universal Speech Enhancement For Diverse Input Conditions.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Efficient Text-Only Domain Adaptation For CTC-Based ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Optimizing Data Usage for Low-Resource Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022.
CoRR, 2022

SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022.
CoRR, 2022

Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition.
CoRR, 2022

The SJTU X-LANCE Lab System for CNSRC 2022.
CoRR, 2022

End-to-End Multi-Speaker ASR with Independent Vector Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Medical Difficult Airway Detection using Speech Technology.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Speaking style compensation on synthetic audio for robust keyword spotting.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification.
Proceedings of the Interspeech 2022, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the Interspeech 2022, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Proceedings of the Interspeech 2022, 2022

DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.
Proceedings of the Interspeech 2022, 2022

Dual Path Embedding Learning for Speaker Verification with Triplet Attention.
Proceedings of the Interspeech 2022, 2022

Attentive Feature Fusion for Robust Speaker Verification.
Proceedings of the Interspeech 2022, 2022

MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.
Proceedings of the Interspeech 2022, 2022

Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction.
Proceedings of the Interspeech 2022, 2022

Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition.
Proceedings of the Interspeech 2022, 2022

Exploring Effective Data Utilization for Low-Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Punctuation Prediction for Streaming On-Device Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Time-Domain Audio-Visual Speech Separation on Low Quality Videos.
Proceedings of the IEEE International Conference on Acoustics, 2022

Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2022

The SJTU System For Multimodal Information Based Speech Processing Challenge 2021.
Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Knowledge Distillation via Feature Enhancement for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Local Information Modeling with Self-Attention for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Modified Magnitude-Phase Spectrum Information for Spoofing Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Audio-Visual Deep Neural Network for Robust Person Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Dual-Path RNN for Long Recording Speech Separation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Data Augmentation for end-to-end Code-Switching Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Speaker Embedding Augmentation with Noise Distribution Matching.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

The SJTU System for Short-Duration Speaker Verification Challenge 2021.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.
Proceedings of the IEEE International Conference on Acoustics, 2021

Towards Data Selection on TTS Data for Children's Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods.
Proceedings of the IEEE International Conference on Acoustics, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.
Proceedings of the IEEE International Conference on Acoustics, 2021

Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2021

SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

AISpeech-SJTU ASR System for the Accented English Speech Recognition Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Improving End-to-End Single-Channel Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.
CoRR, 2020

End-to-End Speaker-Dependent Voice Activity Detection.
CoRR, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.
Proceedings of the Interspeech 2020, 2020

Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.
Proceedings of the Interspeech 2020, 2020

Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts.
Proceedings of the Interspeech 2020, 2020

Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation.
Proceedings of the Interspeech 2020, 2020

Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.
Proceedings of the Interspeech 2020, 2020

Multi-Modality Matters: A Performance Leap on VoxCeleb.
Proceedings of the Interspeech 2020, 2020

Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Deep Audio-Visual Speech Separation with Attention Mechanism.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Data augmentation using generative adversarial networks for robust speech recognition.
Speech Commun., 2019

Binary neural networks for speech recognition.
Frontiers Inf. Technol. Electron. Eng., 2019

Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking.
Proceedings of the Interspeech 2019, 2019

Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.
Proceedings of the Interspeech 2019, 2019

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge.
Proceedings of the Interspeech 2019, 2019

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.
Proceedings of the Interspeech 2019, 2019

On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.
Proceedings of the Interspeech 2019, 2019

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.
Proceedings of the Interspeech 2019, 2019

Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech.
Proceedings of the Interspeech 2019, 2019

Joint Decoding of CTC Based Systems for Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Knowledge Distillation for Small Foot-print Deep Speaker Embedding.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Monaural Multi-speaker ASR System without Pretraining.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Exploring Model Units and Training Strategies for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Single-channel multi-talker speech recognition with permutation invariant training.
Speech Commun., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.
Speech Commun., 2018

Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Covariance Based Deep Feature for Text-Dependent Speaker Verification.
Proceedings of the Intelligence Science and Big Data Engineering, 2018

Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.
Proceedings of the Interspeech 2018, 2018

Knowledge Distillation for Sequence Model.
Proceedings of the Interspeech 2018, 2018

Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.
Proceedings of the Interspeech 2018, 2018

Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks.
Proceedings of the Interspeech 2018, 2018

Robust Mask Estimation By Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic Beamforming.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Noise Robust Speech Recognition on Aurora4 by Humans and Machines.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Fast Adaptation on Deep Mixture Generative Network Based Acoustic Modeling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Deep Feature Engineering for Noise Robust Spoofing Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Phone Synchronous Speech Recognition With CTC Lattices.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.
Proceedings of the Intelligence Science and Big Data Engineering, 2017

Recognizing Multi-Talker Speech with Permutation Invariant Training.
Proceedings of the Interspeech 2017, 2017

Binary Deep Neural Networks for Speech Recognition.
Proceedings of the Interspeech 2017, 2017

What Does the Speaker Embedding Encode?
Proceedings of the Interspeech 2017, 2017

Small-footprint convolutional neural network for spoofing detection.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

End-to-end spoofing detection with raw waveform CLDNNs.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

Future vector enhanced LSTM language model for LVCSR.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Integrating online i-vector into GMM-UBM for text-dependent speaker verification.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Cluster Adaptive Training for Deep Neural Network Based Acoustic Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Deep features for automatic spoofing detection.
Speech Commun., 2016

Very deep convolutional neural networks for robust speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Multi-task joint-learning for robust voice activity detection.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC.
Proceedings of the Interspeech 2016, 2016

Improved DNN-based segmentation for multi-genre broadcast audio.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker-aware training of LSTM-RNNs for acoustic modelling.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Integrated adaptation with multi-factor joint-learning for far-field speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An investigation into using parallel data for far-field speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Joint acoustic factor learning for robust deep neural network based automatic speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016


2015
Deep feature for text-dependent speaker verification.
Speech Commun., 2015

Paragraph vector based topic model for language model adaptation.
Proceedings of the INTERSPEECH 2015, 2015

Multi-task learning for text-dependent speaker verification.
Proceedings of the INTERSPEECH 2015, 2015

Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge.
Proceedings of the INTERSPEECH 2015, 2015

Very deep convolutional neural networks for LVCSR.
Proceedings of the INTERSPEECH 2015, 2015

Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition.
Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

Cluster adaptive training for deep neural network.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Recurrent neural network language model with structured word embeddings for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A novel static parameter calculation method for model compensation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Local trajectory based speech enhancement for robust speech recognition with deep neural network.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Cambridge University transcription systems for the multi-genre broadcast challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Multi-task joint-learning of deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The development of the Cambridge University alignment systems for the multi-genre broadcast challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speaker diarisation and longitudinal linking in multi-genre broadcast data.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Acoustic emotion recognition using deep neural network.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Tandem deep features for text-dependent speaker verification.
Proceedings of the INTERSPEECH 2014, 2014

A novel dynamic parameters calculation approach for model compensation.
Proceedings of the INTERSPEECH 2014, 2014

Speaker verification with deep features.
Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Reshaping deep neural network for fast decoding by node-pruning.
Proceedings of the IEEE International Conference on Acoustics, 2014

Stochastic data sweeping for fast DNN training.
Proceedings of the IEEE International Conference on Acoustics, 2014

Second order vector Taylor series based robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition.
Proceedings of the INTERSPEECH 2013, 2013

Combination of data borrowing strategies for low-resource LVCSR.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Articulatory Feature based Multilingual MLPs for Low-Resource Speech Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Generating exact lattices in the WFST framework.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Time-Frequency Cepstral Features and Combining Discriminative Training for Phonotactic Language Recognition.
J. Comput., 2011

Language Recognition Based on Acoustic Diversified Phone Recognizers and Phonotactic Feature Fusion.
IEICE Trans. Inf. Syst., 2011

State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs.
Proceedings of the INTERSPEECH 2011, 2011

Strategies for using MLP based features with limited target-language training data.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Mandarin-English bilingual phone modeling and combining MPE based Discriminative training for cross-language speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Integration of Complementary Phone Recognizers for Phonotactic Language Recognition.
Proceedings of the Information Computing and Applications - First International Conference, 2010

Phone modeling and combining discriminative training for Mandarin-English bilingual speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Efficient embedded speech recognition for very large vocabulary Mandarin car-navigation systems.
IEEE Trans. Consumer Electron., 2009

