Yanmin Qian

Orcid: 0000-0002-0314-3790

According to our database¹, Yanmin Qian authored at least 309 papers between 2009 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions.

[DOI]

,

,

,

,

CoRR, May, 2026

On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation.

[DOI]

,

,

,

,

,

,

CoRR, April, 2026

Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning.

[DOI]

,

,

CoRR, January, 2026

Representation-Regularized Convolutional Audio Transformer for Audio Understanding.

[DOI]

,

,

,

,

,

,

CoRR, January, 2026

SLM-SS: Speech Language Model for Generative Speech Separation.

[DOI]

,

,

,

,

,

,

CoRR, January, 2026

UrgentMOS: Unified Multi-Metric and Preference Learning for Robust Speech Quality Assessment.

[DOI]

,

,

,

,

Samuele Cornell

,

,

,

,

,

,

,

,

Tim Fingscheidt

,

Shinji Watanabe

,

CoRR, January, 2026

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice.

[DOI]

,

,

,

,

CoRR, January, 2026

ICASSP 2026 URGENT Speech Enhancement Challenge.

[DOI]

,

,

,

,

,

Samuele Cornell

,

,

,

Tim Fingscheidt

,

Shinji Watanabe

,

CoRR, January, 2026

An end-to-end integration of speech separation and recognition with self-supervised learning representation.

[DOI]

Yoshiki Masuyama

,

,

,

Samuele Cornell

,

,

,

,

Shinji Watanabe

Comput. Speech Lang., 2026

USE: A Unified Model for Universal Sound Separation and Extraction.

[DOI]

,

,

,

,

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

A Data-Centric Approach to Generalizable Speech Deepfake Detection.

[DOI]

,

,

CoRR, December, 2025

Training Text-to-Speech Model with Purely Synthetic Data: Feasibility, Sensitivity, and Generalization Capability.

[DOI]

,

,

,

CoRR, December, 2025

SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation.

[DOI]

,

,

,

,

CoRR, September, 2025

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection.

[DOI]

,

,

,

Wei-Qiang Zhang

,

,

,

CoRR, August, 2025

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation.

[DOI]

,

,

,

,

,

,

,

,

Wei-Qiang Zhang

,

,

,

,

CoRR, July, 2025

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment.

[DOI]

,

,

,

,

Shinji Watanabe

,

CoRR, June, 2025

DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration.

[DOI]

,

,

,

,

,

CoRR, May, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling.

[DOI]

,

,

,

,

,

,

,

,

,

,

CoRR, May, 2025

BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM.

[DOI]

,

,

,

,

CoRR, May, 2025

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching.

[DOI]

,

,

,

Manthan Thakker

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning.

[DOI]

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge.

[DOI]

,

,

Samuele Cornell

,

Robin Scheibler

,

,

,

,

,

,

,

Shinji Watanabe

,

Tim Fingscheidt

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition.

[DOI]

,

,

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

E2E-BPVC: End-to-End Background-Preserving Voice Conversion via In-Context Learning.

[DOI]

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient Multilingual ASR Finetuning via LoRA Language Experts.

[DOI]

,

,

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation.

[DOI]

,

,

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

From Sharpness to Better Generalization for Speech Deepfake Detection.

[DOI]

,

,

,

Junichi Yamagishi

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Ranking and Selection of Bias Words for Contextual Bias Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Ultra-Low Bit Post-Training Quantization of Large Speech Models via K-Means Clustering and Mixed Precision Allocation.

[DOI]

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM.

[DOI]

,

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A New Perspective on Speaker Verification: Joint Modeling with DFSMN and Transformer.

[DOI]

,

,

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction.

[DOI]

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation.

[DOI]

,

,

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning.

[DOI]

,

,

,

,

,

,

,

,

Wei-Qiang Zhang

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Efficient Pruning for Large-Scale Seq2Seq Speech Models without Back-Propagation.

[DOI]

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching for Speaker Diarization.

[DOI]

,

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Advancing Non-intrusive Suppression on Enhancement Distortion for Noise Robust ASR.

[DOI]

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation.

[DOI]

,

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Advancing Controllable Music Generation with Latent Rectified Flow Guided by Rhythm and Harmony.

[DOI]

,

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment.

[DOI]

,

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition.

[DOI]

,

,

,

,

Samuele Cornell

,

,

Robin Scheibler

,

,

,

,

,

Tim Fingscheidt

,

Shinji Watanabe

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

OOQ: Outlier-Oriented Quantization for Efficient Large Language Models.

[DOI]

,

,

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Less is More: Data Curation Matters in Scaling Speech Enhancement.

[DOI]

,

,

,

Robin Scheibler

,

,

Samuele Cornell

,

,

,

,

,

Tim Fingscheidt

,

Shinji Watanabe

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods.

[DOI]

,

,

,

,

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing].

[DOI]

,

Shinji Watanabe

,

,

,

,

IEEE Signal Process. Mag., November, 2024

Universal Cross-Lingual Data Generation for Low Resource ASR.

[DOI]

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning.

[DOI]

,

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond.

[DOI]

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer.

[DOI]

,

,

,

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer.

[DOI]

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advancing speaker embedding learning: Wespeaker toolkit for research and production.

[DOI]

,

,

,

,

Chengdong Liang

,

,

,

,

,

,

,

Speech Commun., 2024

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling.

[DOI]

,

,

,

CoRR, 2024

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification.

[DOI]

,

CoRR, 2024

Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning.

[DOI]

,

,

,

,

,

,

,

,

Wei-Qiang Zhang

,

CoRR, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching.

[DOI]

,

,

,

,

CoRR, 2024

Target Speech Diarization with Multimodal Prompts.

[DOI]

,

,

,

,

CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

[DOI]

,

Robin Scheibler

,

,

Samuele Cornell

,

,

,

,

,

,

Shinji Watanabe

,

Tim Fingscheidt

,

CoRR, 2024

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs.

[DOI]

,

,

,

,

,

,

CoRR, 2024

GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting.

[DOI]

,

,

,

,

,

,

CoRR, 2024

Improving Design of Input Condition Invariant Speech Enhancement.

[DOI]

,

,

Shinji Watanabe

,

CoRR, 2024

Improving Anomalous Sound Detection Via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models.

[DOI]

,

,

,

,

,

,

Wei-Qiang Zhang

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

DDTSE: Discriminative Diffusion Model for Target Speech Extraction.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Enhancing Speaker Extraction Through Rectifying Target Confusion.

[DOI]

,

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Diffusion-Based Generative Modeling With Discriminative Guidance for Streamable Speech Enhancement.

[DOI]

,

Samuele Cornell

,

Shinji Watanabe

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Disentangling The Prosody And Semantic Information With Pre-Trained Model For In-Context Learning Based Zero-Shot Voice Conversion.

[DOI]

,

,

,

,

Junichi Yamagishi

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.

[DOI]

,

,

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Knowledge Distillation from Discriminative Model to Generative Model with Parallel Architecture for Speech Enhancement.

[DOI]

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Insights from Hyperparameter Scaling of Online Speech Separation.

[DOI]

,

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Band-Wise Front-End Distortion Suppression for Robust Speech Recognition.

[DOI]

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification.

[DOI]

,

,

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

ConMamba: A Convolution-Augmented Mamba Encoder Model for Efficient End-to-End ASR Systems.

[DOI]

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Combining Self-Supervised Learning and Adversarial Training Based Domain Adaptation for Speaker Verification.

[DOI]

,

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

[DOI]

,

Robin Scheibler

,

,

Samuele Cornell

,

,

,

,

,

Shinji Watanabe

,

Tim Fingscheidt

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.

[DOI]

,

,

,

,

Shinji Watanabe

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection.

[DOI]

,

,

,

,

Wei-Qiang Zhang

,

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SparseWAV: Fast and Accurate One-Shot Unstructured Pruning for Large Speech Foundation Models.

[DOI]

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems.

[DOI]

,

,

,

Junichi Yamagishi

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Contextual Biasing Speech Recognition in Speech-enhanced Large Language Model.

[DOI]

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Improving Acoustic Scene Classification via Self-Supervised and Semi-Supervised Learning with Efficient Audio Transformer.

[DOI]

,

,

,

,

,

,

,

,

,

Wei-Qiang Zhang

,

,

,

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Semi-Supervised Acoustic Scene Classification with Test-Time Adaptation.

[DOI]

,

,

,

,

,

,

,

,

Wei-Qiang Zhang

,

,

,

,

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Improving Design of Input Condition Invariant Speech Enhancement.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Generation-Based Target Speech Extraction with Speech Discretization and Vocoder.

[DOI]

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Prompt-Driven Target Speech Diarization.

[DOI]

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection.

[DOI]

,

,

,

,

,

,

,

,

Wei-Qiang Zhang

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.

[DOI]

,

,

,

,

Samuele Cornell

,

,

Yoshiki Masuyama

,

,

Robin Scheibler

,

,

,

,

Shinji Watanabe

J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).

[DOI]

,

,

,

,

Samuele Cornell

,

,

Yoshiki Masuyama

,

,

Robin Scheibler

,

,

,

,

Shinji Watanabe

Dataset, October, 2023

Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction.

[DOI]

,

,

,

,

,

,

,

,

,

CoRR, 2023

USED: Universal Speaker Extraction and Diarization.

[DOI]

,

Mehmet Sinan Yildirim

,

,

,

,

,

,

,

CoRR, 2023

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models.

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2023

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR.

[DOI]

,

,

,

,

,

CoRR, 2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.

[DOI]

Yoshiki Masuyama

,

,

,

Samuele Cornell

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.

[DOI]

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition.

[DOI]

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition.

[DOI]

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Overlap Aware Continuous Speech Separation without Permutation Invariant Training.

[DOI]

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR.

[DOI]

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adaptive Neural Network Quantization For Lightweight Speaker Verification.

[DOI]

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory.

[DOI]

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification.

[DOI]

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Reversible Neural Networks for Memory-Efficient Speaker Verification.

[DOI]

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.

[DOI]

,

,

,

,

,

Takuya Yoshioka

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022.

[DOI]

,

,

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor.

[DOI]

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adaptive Large Margin Fine-Tuning For Robust Speaker Verification.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR.

[DOI]

,

,

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Light-Weight VisualVoice: Neural Network Quantization On Audio Visual Speech Separation.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition.

[DOI]

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit.

[DOI]

,

Chengdong Liang

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Lowbit Neural Network Quantization for Speaker Verification.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Predictive SkiM: Contrastive Predictive Coding for Low-Latency Online Speech Separation.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.

[DOI]

,

,

,

,

Takuya Yoshioka

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Robust Audio-Visual ASR with Unified Cross-Modal Attention.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving DINO-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Exploring Binary Classification Loss for Speaker Verification.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.

[DOI]

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing.

[DOI]

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Toward Universal Speech Enhancement For Diverse Input Conditions.

[DOI]

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition.

[DOI]

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning.

[DOI]

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Efficient Text-Only Domain Adaptation For CTC-Based ASR.

[DOI]

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party.

[DOI]

,

,

Christoph Böddeker

,

Tomohiro Nakatani

,

Shinji Watanabe

,

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Optimizing Data Usage for Low-Resource Speech Recognition.

[DOI]

,

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

[DOI]

,

,

,

,

,

,

,

,

Takuya Yoshioka

,

,

,

,

,

,

,

,

,

,

IEEE J. Sel. Top. Signal Process., 2022

SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022.

[DOI]

,

,

,

,

,

CoRR, 2022

Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition.

[DOI]

,

,

CoRR, 2022

The SJTU X-LANCE Lab System for CNSRC 2022.

[DOI]

,

,

,

,

CoRR, 2022

End-to-End Multi-Speaker ASR with Independent Vector Analysis.

[DOI]

Robin Scheibler

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning.

[DOI]

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Medical Difficult Airway Detection using Speech Technology.

[DOI]

,

,

,

,

,

,

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition.

[DOI]

,

,

,

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models.

[DOI]

,

,

,

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022.

[DOI]

,

,

,

,

,

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Speaking style compensation on synthetic audio for robust keyword spotting.

[DOI]

,

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification.

[DOI]

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.

[DOI]

,

,

,

,

,

Sefik Emre Eskimez

,

Takuya Yoshioka

,

,

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

[DOI]

,

,

,

,

Samuele Cornell

,

,

Yoshiki Masuyama

,

,

Robin Scheibler

,

,

,

,

Shinji Watanabe

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.

[DOI]

,

,

,

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Dual Path Embedding Learning for Speaker Verification with Triplet Attention.

[DOI]

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Attentive Feature Fusion for Robust Speaker Verification.

[DOI]

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction.

[DOI]

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition.

[DOI]

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring Effective Data Utilization for Low-Resource Speech Recognition.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Punctuation Prediction for Streaming On-Device Speech Recognition.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Time-Domain Audio-Visual Speech Separation on Low Quality Videos.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding.

[DOI]

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

The SJTU System for Multimodal Information Based Speech Processing Challenge 2021.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Knowledge Distillation via Feature Enhancement for Speaker Verification.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Local Information Modeling with Self-Attention for Speaker Verification.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Modified Magnitude-Phase Spectrum Information for Spoofing Detection.

[DOI]

,

,

Rohan Kumar Das

,

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Audio-Visual Deep Neural Network for Robust Person Verification.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

[DOI]

,

,

,

,

,

,

,

,

Takuya Yoshioka

,

,

,

,

,

,

,

,

,

CoRR, 2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions.

[DOI]

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Dual-Path RNN for Long Recording Speech Separation.

[DOI]

,

,

,

,

Takuya Yoshioka

,

,

,

Keisuke Kinoshita

,

Christoph Böddeker

,

,

Shinji Watanabe

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Data Augmentation for end-to-end Code-Switching Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning.

[DOI]

,

,

,

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Speaker Embedding Augmentation with Noise Distribution Matching.

[DOI]

,

,

,

,

,

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification.

[DOI]

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party.

[DOI]

,

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition.

[DOI]

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The SJTU System for Short-Duration Speaker Verification Challenge 2021.

[DOI]

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition.

[DOI]

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.

[DOI]

,

Christoph Böddeker

,

Shinji Watanabe

,

Tomohiro Nakatani

,

,

Keisuke Kinoshita

,

,

,

Reinhold Haeb-Umbach

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Towards Data Selection on TTS Data for Children's Speech Recognition.

[DOI]

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.

[DOI]

,

,

,

,

,

Keisuke Kinoshita

,

,

Shinji Watanabe

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification.

[DOI]

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

[DOI]

Christoph Böddeker

,

,

Tomohiro Nakatani

,

Keisuke Kinoshita

,

,

,

,

,

Reinhold Haeb-Umbach

Proceedings of the IEEE International Conference on Acoustics, 2021

AISpeech-SJTU ASR System for the Accented English Speech Recognition Challenge.

[DOI]

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Improving End-to-End Single-Channel Multi-Talker Speech Recognition.

[DOI]

,

,

,

Shinji Watanabe

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.

[DOI]

,

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.

[DOI]

Christoph Böddeker

,

,

Tomohiro Nakatani

,

Keisuke Kinoshita

,

,

,

,

,

Shinji Watanabe

,

Reinhold Haeb-Umbach

CoRR, 2020

End-to-End Speaker-Dependent Voice Activity Detection.

[DOI]

,

,

,

CoRR, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.

[DOI]

,

Aswin Shanmugam Subramanian

,

,

Shinji Watanabe

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition.

[DOI]

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.

[DOI]

,

Heinrich Dinkel

,

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts.

[DOI]

,

,

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation.

[DOI]

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.

[DOI]

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Modality Matters: A Performance Leap on VoxCeleb.

[DOI]

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings.

[DOI]

,

,

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Deep Audio-Visual Speech Separation with Attention Mechanism.

[DOI]

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.

[DOI]

,

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.

[DOI]

,

,

,

Jonathan Le Roux

,

Shinji Watanabe

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.

[DOI]

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Data augmentation using generative adversarial networks for robust speech recognition.

[DOI]

,

,

Speech Commun., 2019

Binary neural networks for speech recognition.

[DOI]

,

Frontiers Inf. Technol. Electron. Eng., 2019

Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking.

[DOI]

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.

[DOI]

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge.

[DOI]

,

,

Heinrich Dinkel

,

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.

[DOI]

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.

[DOI]

,

,

,

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.

[DOI]

,

Heinrich Dinkel

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech.

[DOI]

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Decoding of CTC Based Systems for Speech Recognition.

[DOI]

,

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge Distillation for Small Foot-print Deep Speaker Embedding.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Monaural Multi-speaker ASR System without Pretraining.

[DOI]

,

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform.

[DOI]

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition.

[DOI]

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Exploring Model Units and Training Strategies for End-to-End Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.

[DOI]

,

,

,

Jonathan Le Roux

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition.

[DOI]

,

,

,

,

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition.

[DOI]

,

,

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection.

[DOI]

Heinrich Dinkel

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Single-channel multi-talker speech recognition with permutation invariant training.

[DOI]

,

,

Speech Commun., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.

[DOI]

,

,

Speech Commun., 2018

Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.

[DOI]

,

,

,

,

Frontiers Inf. Technol. Electron. Eng., 2018

Past review, current progress, and challenges ahead on the cocktail party problem.

[DOI]

,

,

,

,

Frontiers Inf. Technol. Electron. Eng., 2018

Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification.

[DOI]

,

,

,

,

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition.

[DOI]

,

,

,

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition.

[DOI]

,

,

,

,

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Covariance Based Deep Feature for Text-Dependent Speaker Verification.

[DOI]

,

Heinrich Dinkel

,

,

Proceedings of the Intelligence Science and Big Data Engineering, 2018

Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.

[DOI]

,

,

,

,

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Knowledge Distillation for Sequence Model.

[DOI]

,

,

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.

[DOI]

,

,

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks.

[DOI]

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Robust Mask Estimation By Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic Beamforming.

[DOI]

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Noise Robust Speech Recognition on Aurora4 by Humans and Machines.

[DOI]

,

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech Recognition.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Fast Adaptation on Deep Mixture Generative Network Based Acoustic Modeling.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition.

[DOI]

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Deep Feature Engineering for Noise Robust Spoofing Detection.

[DOI]

,

,

Heinrich Dinkel

,

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Phone Synchronous Speech Recognition With CTC Lattices.

[DOI]

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.

[DOI]

,

,

Proceedings of the Intelligence Science and Big Data Engineering, 2017

Recognizing Multi-Talker Speech with Permutation Invariant Training.

[DOI]

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Binary Deep Neural Networks for Speech Recognition.

[DOI]

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

What Does the Speaker Embedding Encode?

[DOI]

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Small-footprint convolutional neural network for spoofing detection.

[DOI]

Heinrich Dinkel

,

,

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

End-to-end spoofing detection with raw waveform CLDNNs.

[DOI]

Heinrich Dinkel

,

,

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.

[DOI]

,

,

,

,

Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

Future vector enhanced LSTM language model for LVCSR.

[DOI]

,

,

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Integrating online i-vector into GMM-UBM for text-dependent speaker verification.

[DOI]

,

,

,

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition.

[DOI]

,

,

,

Lahiru Samarakoon

,

,

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017

2016

Cluster Adaptive Training for Deep Neural Network Based Acoustic Model.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition.

[DOI]

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition.

[DOI]

,

,

,

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Deep features for automatic spoofing detection.

[DOI]

,

,

Speech Commun., 2016

Very deep convolutional neural networks for robust speech recognition.

[DOI]

,

Philip C. Woodland

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Multi-task joint-learning for robust voice activity detection.

[DOI]

,

,

,

,

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC.

[DOI]

,

,

,

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved DNN-based segmentation for multi-genre broadcast audio.

[DOI]

,

,

Philip C. Woodland

,

Mark J. F. Gales

,

Panagiota Karanasou

,

Pierre Lanchantin

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker-aware training of LSTM-RNNs for acoustic modelling.

[DOI]

,

,

,

,

,

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Integrated adaptation with multi-factor joint-learning for far-field speech recognition.

[DOI]

,

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An investigation into using parallel data for far-field speech recognition.

[DOI]

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Joint acoustic factor learning for robust deep neural network based automatic speech recognition.

[DOI]

,

,

,

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Overview of BTAS 2016 speaker anti-spoofing competition.

[DOI]

Pavel Korshunov

,

Sébastien Marcel

,

Hannah Muckenhirn

,

André R. Gonçalves

,

A. G. Souza Mello

,

Ricardo Paranhos Velloso Violato

,

Flávio Olmos Simões

,

Mário Uliani Neto

,

Marcus de Assis Angeloni

,

José Augusto Stuchi

,

Heinrich Dinkel

,

,

,

,

,

Proceedings of the 8th IEEE International Conference on Biometrics Theory, 2016

2015

Deep feature for text-dependent speaker verification.

[DOI]

,

,

,

,

,

Speech Commun., 2015

Paragraph vector based topic model for language model adaptation.

[DOI]

,

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multi-task learning for text-dependent speaker verification.

[DOI]

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge.

[DOI]

,

,

Heinrich Dinkel

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Very deep convolutional neural networks for LVCSR.

[DOI]

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition.

[DOI]

,

,

,

Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

Cluster adaptive training for deep neural network.

[DOI]

,

,

,

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Recurrent neural network language model with structured word embeddings for speech recognition.

[DOI]

,

,

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A novel static parameter calculation method for model compensation.

[DOI]

,

,

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Local trajectory based speech enhancement for robust speech recognition with deep neural network.

[DOI]

,

,

Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition.

[DOI]

,

,

,

Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Cambridge University transcription systems for the multi-genre broadcast challenge.

[DOI]

Philip C. Woodland

,

,

,

,

Mark J. F. Gales

,

Penny Karanasou

,

Pierre Lanchantin

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Multi-task joint-learning of deep neural networks for robust speech recognition.

[DOI]

,

,

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The development of the Cambridge University alignment systems for the multi-genre broadcast challenge.

[DOI]

Pierre Lanchantin

,

Mark J. F. Gales

,

Penny Karanasou

,

,

,

,

Philip C. Woodland

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speaker diarisation and longitudinal linking in multi-genre broadcast data.

[DOI]

Penny Karanasou

,

Mark J. F. Gales

,

Pierre Lanchantin

,

,

,

,

Philip C. Woodland

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Acoustic emotion recognition using deep neural network.

[DOI]

,

,

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Tandem deep features for text-dependent speaker verification.

[DOI]

,

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A novel dynamic parameters calculation approach for model compensation.

[DOI]

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Speaker verification with deep features.

[DOI]

,

,

,

,

Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Reshaping deep neural network for fast decoding by node-pruning.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Stochastic data sweeping for fast DNN training.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Second order vector Taylor series based robust speech recognition.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition.

[DOI]

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Combination of data borrowing strategies for low-resource LVCSR.

[DOI]

,

,

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Articulatory Feature based Multilingual MLPs for Low-Resource Speech Recognition.

[DOI]

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition.

[DOI]

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Generating exact lattices in the WFST framework.

[DOI]

,

Mirko Hannemann

,

Gilles Boulianne

,

,

,

,

Martin Karafiát

,

Stefan Kombrink

,

,

,

Korbinian Riedhammer

,

,

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Time-Frequency Cepstral Features and Combining Discriminative Training for Phonotactic Language Recognition.

[DOI]

,

,

,

J. Comput., 2011

Language Recognition Based on Acoustic Diversified Phone Recognizers and Phonotactic Feature Fusion.

[DOI]

,

,

,

IEICE Trans. Inf. Syst., 2011

State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs.

[DOI]

,

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Strategies for using MLP based features with limited target-language training data.

[DOI]

,

,

,

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Mandarin-English bilingual phone modeling and combining MPE based Discriminative training for cross-language speech recognition.

[DOI]

,

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Integration of Complementary Phone Recognizers for Phonotactic Language Recognition.

[DOI]

,

,

,

Proceedings of the Information Computing and Applications - First International Conference, 2010

Phone modeling and combining discriminative training for Mandarin-English bilingual speech recognition.

[DOI]

,

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Efficient embedded speech recognition for very large vocabulary Mandarin car-navigation systems.

[DOI]

,

,

Michael T. Johnson

IEEE Trans. Consumer Electron., 2009
