Tomoki Hayashi

Proceedings of the IEEE International Conference on Acoustics, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022

A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion.

IEEE J. Sel. Top. Signal Process., 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.

CoRR, 2022

Efficient Training Method for Point Cloud-Based Object Detection Models by Combining Environmental Transitions and Active Learning.

Proceedings of the Robot Intelligence Technology and Applications 7, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.

Proceedings of the IEEE International Conference on Acoustics, 2022

An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2022

Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure.

Proceedings of the 30th European Signal Processing Conference, 2022

Note-level Automatic Guitar Transcription Using Attention Mechanism.

Sehun Kim

Proceedings of the 30th European Signal Processing Conference, 2022

Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

Aswin Shanmugam Subramanian

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Pretraining Techniques for Sequence-to-Sequence Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem.

CoRR, 2021

ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations.

CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.

CoRR, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.

Chenda Li

Jing Shi

Wangyou Zhang

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Event Detection with Classifier Chains.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.

Proceedings of the IEEE International Conference on Acoustics, 2021

Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.

Chaitanya Prasad Narisetty

Proceedings of the IEEE International Conference on Acoustics, 2021

Non-Autoregressive Sequence-To-Sequence Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2021

Recent Developments on ESPnet Toolkit Boosted by Conformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

Anomalous Sound Detection Using a Binary Classification Model and Class Centroids.

Proceedings of the 29th European Signal Processing Conference, 2021

Spontaneous Speech Summarization: Transformers All The Way Through.

Proceedings of the 29th European Signal Processing Conference, 2021

Leveraging State-of-the-art ASR Techniques to Audio Captioning.

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

An Ensemble Approach to Anomalous Sound Detection Based on Conformer-Based Autoencoder and Binary Classifier Incorporated with Metric Learning.

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

On Prosody Modeling for ASR+TTS Based Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.

Aswin Shanmugam Subramanian

Wangyou Zhang

CoRR, 2020

Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations.

CoRR, 2020

DiscreTalk: Text-to-Speech as a Machine Translation Problem.

Shinji Watanabe

CoRR, 2020

Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression.

IEEE Access, 2020

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Weakly-Supervised Sound Event Detection with Self-Attention.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

ESPnet-ST: All-in-One Speech Translation Toolkit.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

2019

Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder.

IEEE Access, 2019

Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.

Chen-Chou Lo

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned Wavenet Vocoder.

Ramón Fernandez Astudillo

Proceedings of the IEEE International Conference on Acoustics, 2019

Scene-dependent Anomalous Acoustic-event Detection Based on Conditional Wavenet and I-vector.

Proceedings of the IEEE International Conference on Acoustics, 2019

Cycle-consistency Training for End-to-end Speech Recognition.

Takaaki Hori

Proceedings of the IEEE International Conference on Acoustics, 2019

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion.

Hsin-Te Hwang

Proceedings of the 27th European Signal Processing Conference, 2019

Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Attention-Based Speech Recognition Using Gaze Information.

Osamu Segawa

Nelson Enrique Yalta Soplin

Kazuya Takeda

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.

Ryuichi Yamamoto

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2018

An Evaluation of Deep Spectral Mappings and WaveNet Vocoder for Voice Conversion.

Ramón Fernandez Astudillo

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Back-Translation-Style Data Augmentation for end-to-end ASR.

Kazuya Takeda

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

NU Voice Conversion System for the Voice Conversion Challenge 2018.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder.

Nelson Enrique Yalta Soplin

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

ESPnet: End-to-End Speech Processing Toolkit.

Jahn Heymann

Matthew Wiesner

Nanxin Chen

Adithya Renduchintala

Tsubasa Ochiai

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Head Decoder for End-to-End Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Connectionist Temporal Classification-based Sound Event Encoder for Converting Sound Events into Onomatopoeic Representations.

Proceedings of the 26th European Signal Processing Conference, 2018

Anomalous Sound Event Detection Based on WaveNet.

Proceedings of the 26th European Signal Processing Conference, 2018

2017

Duration-Controlled LSTM for Polyphonic Sound Event Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2017

Speaker-Dependent WaveNet Vocoder.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Statistical Voice Conversion with WaveNet-Based Waveform Generation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An investigation of multi-speaker training for wavenet vocoder.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

An investigation of recurrent neural network for daily activity recognition using multi-modal signals.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015

Exploring multi-channel features for denoising-autoencoder-based speech enhancement.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Daily activity recognition based on DNN using environmental sound and acceleration signals.

Proceedings of the 23rd European Signal Processing Conference, 2015

2014

Noisy speech recognition using blind spatial subtraction array technique and deep bottleneck features.

Norihide Kitaoka

Kazuya Takeda

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

Non-rigid Surface Tracking for Virtual Fitting System.

Proceedings of the VISAPP 2013, 2013

Dream board: a visualization system by handwriting recognition.

Proceedings of the SIGGRAPH Asia 2013, 2013

2012

Texture Overlay onto Non-rigid Surface using Commodity Depth Camera.
