Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.

[BibT_eX]

[DOI]

Yifan Peng

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems.

[BibT_eX]

[DOI]

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OpusLM: A Family of Open Unified Speech Language Models.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DYNAC: Dynamic Vocabulary-based Non-Autoregressive Contextualization for Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models.

[BibT_eX]

[DOI]

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Context-aware Dynamic Pruning for Speech Foundation Models.

[BibT_eX]

[DOI]

Athanasios Mouchtaris

Grant P. Strimel

Jing Liu

Shinji Watanabe

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Open Full-duplex Voice Agent with Speech-to-Speech Language Model.

[BibT_eX]

[DOI]

Seelan Lakshmi Narasimhan

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.

[BibT_eX]

[DOI]

CoRR, 2024

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation.

[BibT_eX]

[DOI]

CoRR, 2024

An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis.

[BibT_eX]

[DOI]

CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.

[BibT_eX]

[DOI]

CoRR, 2024

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Contextualized Automatic Speech Recognition With Dynamic Vocabulary.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions.

[BibT_eX]

[DOI]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MULTI-CONVFORMER: Extending Conformer with Multiple Convolution Kernels.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

[BibT_eX]

[DOI]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification.

[BibT_eX]

[DOI]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network.

[BibT_eX]

[DOI]

CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Tensor decomposition for minimization of E2E SLU model toward on-device processing.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition.

[BibT_eX]

[DOI]

Yifan Peng

Jaesong Lee

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2023

Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Speechlmscore: Evaluating Speech Generation Using Speech Language Model.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

E-Branchformer-Based E2E SLU Toward Stop on-Device Challenge.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

[BibT_eX]

[DOI]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022

A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CMU's IWSLT 2022 Dialect Speech Translation System.

[BibT_eX]

[DOI]

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2022

Yifan Peng

Timeline

Legend:

Links

Online presence:

On csauthors.net:

Bibliography

Loading...