Naoyuki Kanda

Orcid: 0000-0002-8628-3288

According to our database, Naoyuki Kanda authored at least 80 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like.
CoRR, 2024

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription.
CoRR, 2024

2023
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
CoRR, 2023

Profile-Error-Tolerant Target-Speaker Voice Activity Detection.
CoRR, 2023

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability.
CoRR, 2023

DiariST: Streaming Speech Translation with Speaker Diarization.
CoRR, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers.
CoRR, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
CoRR, 2023

Factual Consistency Oriented Speech Recognition.
CoRR, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2023

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

A review of speaker diarization: Recent advances with deep learning.
Comput. Speech Lang., 2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts.
CoRR, 2022

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition.
CoRR, 2022

Speech separation with large-scale self-supervised learning.
CoRR, 2022

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization.
CoRR, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the Interspeech 2022, 2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
Proceedings of the Interspeech 2022, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Proceedings of the Interspeech 2022, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Proceedings of the Interspeech 2022, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Proceedings of the Interspeech 2022, 2022

All-Neural Beamformer for Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

VarArray: Array-Geometry-Agnostic Continuous Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Streaming End-to-End Multi-Talker Speech Recognition.
IEEE Signal Process. Lett., 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

End-to-End Speaker-Attributed ASR with Transformer.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech-Language Pre-Training for End-to-End Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Serialized Output Training for End-to-End Overlapped Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Proceedings of the Interspeech 2020, 2020

2019
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR.
Proceedings of the Interspeech 2019, 2019

Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation.
Proceedings of the Interspeech 2019, 2019

End-to-End Neural Speaker Diarization with Permutation-Free Objectives.
Proceedings of the Interspeech 2019, 2019

Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches.
Proceedings of the IEEE International Conference on Acoustics, 2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

End-to-End Neural Speaker Diarization with Self-Attention.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Face-Voice Matching using Cross-modal Embeddings.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models.
Proceedings of the Interspeech 2018, 2018

Sequence Distillation for Purely Sequence Trained Acoustic Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription.
Speech Commun., 2016

Maximum a posteriori Based Decoding for CTC Acoustic Models.
Proceedings of the Interspeech 2016, 2016

Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks.
Proceedings of the Interspeech 2016, 2016

2015
Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Open-ended Spoken Language Technology: Studies on Spoken Dialogue Systems and Spoken Document Retrieval Systems.
PhD thesis, 2014

The NICT ASR system for IWSLT 2014.
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2014, 2014

Boundary contraction training for acoustic models based on discrete deep neural networks.
Proceedings of the INTERSPEECH 2014, 2014

2013
Noise robust speaker verification with delta cepstrum normalization.
Proceedings of the INTERSPEECH 2013, 2013

Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier.
Proceedings of the IEEE International Conference on Acoustics, 2013

Elastic spectral distortion for low resource speech recognition with deep neural networks.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Using rhythmic features for Japanese spoken term detection.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Voice activity detection based on augmented statistical noise suppression.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
A multi-expert model for dialogue and behavior control of conversational robots and agents.
Knowl. Based Syst., 2011

2008
Open-vocabulary keyword detection from super-large scale speech database.
Proceedings of the International Workshop on Multimedia Signal Processing, 2008

2006
Multi-Domain Spoken Dialogue System with Extensibility and Robustness against Speech Recognition Errors.
Proceedings of the SIGDIAL 2006 Workshop, 2006

2005
A two-layer model for behavior and dialogue planning in conversational service robots.
Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005

Contextual constraints based on dialogue models in database search task for spoken dialogue systems.
Proceedings of the INTERSPEECH 2005, 2005

