Hirofumi Inaguma

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Seamless: Multilingual Expressive and Streaming Speech Translation.

Loïc Barrault

Yu-An Chung

Mariano Coria Meglioli

David Dale

Ning Dong

Mark Duppenthaler

Paul-Ambroise Duquenne

Kaushik Ram Sadagopan

Gabriel Mejia Gonzalez

CoRR, 2023

Efficient Monotonic Multihead Attention.

CoRR, 2023

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation.

Seamless Communication

Loïc Barrault

Yu-An Chung

Mariano Coria Meglioli

David Dale

Ning Dong

Paul-Ambroise Duquenne

Kaushik Ram Sadagopan

Gabriel Mejia Gonzalez

CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.

CoRR, 2023

Findings of the IWSLT 2023 Evaluation Campaign.

Sweta Agrawal

Antonios Anastasopoulos

Alexandra Chronopoulou

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Exploration on HuBERT with Multiple Resolutions.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Named Entity Detection and Injection for Direct Speech Translation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

Simple and Effective Unsupervised Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Speech-to-Speech Translation for a Real-world Unwritten Language.

Paul-Ambroise Duquenne

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Distilling the Knowledge of BERT for CTC-based ASR.

CoRR, 2022

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Fast and Low-Latency End-to-End Speech Recognition and Translation.

PhD thesis, 2021

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring.

CoRR, 2021

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.

Proceedings of the 18th International Conference on Spoken Language Translation, 2021

VAD-Free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR.

Aswin Shanmugam Subramanian

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

ORTHROS: Non-autoregressive End-to-end Speech Translation with Dual-decoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improved Mask-CTC for Non-Autoregressive End-to-End ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Recent Developments on ESPnet Toolkit Boosted by Conformer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

ASR Rescoring and Confidence Estimation with ELECTRA.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.

Wangyou Zhang

CoRR, 2020

Enhancing Monotonic Multihead Attention for Streaming ASR.

Masato Mimura

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CTC-Synchronous Training for Monotonic Attention Model.

Masato Mimura

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Speech-to-Dialog-Act Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

ESPnet-ST: All-in-One Speech Translation Toolkit.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

2019

ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper.

Nelson Enrique Yalta Soplin

Shun Kiyono

Jun Suzuki

Kevin Duh

Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion.

Jaejin Cho

Murali Karthick Baskar

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.

Jaejin Cho

Takaaki Hori

Murali Karthick Baskar

Nelson Enrique Yalta Soplin

Jesús Villalba

Najim Dehak

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.

Ryuichi Yamamoto

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Multilingual End-to-End Speech Translation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Leveraging Sequence-to-Sequence Speech Synthesis for Enhancing Acoustic-to-Word Speech Recognition.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The JHU/KyotoU Speech Translation System for IWSLT 2018.

Xuan Zhang

Zhiqi Wang

Adithya Renduchintala