Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music.

Jiatong Shi

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

A Visual Speech Language Model for Visual Text-to-Speech Task.

Proceedings of the 7th ACM International Conference on Multimedia in Asia, 2025

OpusLM: A Family of Open Unified Speech Language Models.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Context-Driven Dynamic Pruning for Large Speech Foundation Models.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Text-to-speech in the Wild (TITW) Database.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Chain-of-Thought Training for Open E2E Spoken Dialogue Systems.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Preference Alignment Improves Language Model-Based TTS.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Continual Pre-training for Codec-Based Speech LLMs: Balancing Understanding and Generation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Evaluating Self-Supervised Speech Models Via Text-Based LLMs.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

CMU's IWSLT 2024 Offline Speech Translation System: A Cascaded Approach For Long-Form Robustness.

Proceedings of the 21st International Conference on Spoken Language Translation, 2024

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.

Vanya Bannihatti Kumar

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Integrating Lattice-Free MMI Into End-to-End Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.

CoRR, 2023

The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model.

IEEE Signal Process. Lett., 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.

CoRR, 2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition.

CoRR, 2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency.

CoRR, 2021

2020

A Random Gossip BMUF Process for Neural Language Modeling.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Jinchuan Tian
