He Huang

Affiliations:

NVIDIA, Santa Clara, USA

According to our database¹, He Huang authored at least 21 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.


Bibliography

2026

Recent trends in distant conversational speech recognition: A review of CHiME-7 and 8 DASR challenges.

Samuele Cornell, Christoph Boeddeker, […], […], […], Matthew Wiesner, Yoshiki Masuyama, […], […], Stefano Squartini, […], Shinji Watanabe

Comput. Speech Lang., 2026

2025

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.

[…], Krishna C. Puvvada, […], […], […], […], […], Shinji Watanabe, Jagadeesh Balam, […]

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR.

[…], […], Ivan Medennikov, […], […], […], Nithin Rao Koluguri, Jagadeesh Balam, […]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering.

Ivan Medennikov, […], […], […], […], […], Jagadeesh Balam, […]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages.

Nithin Rao Koluguri, […], George Zelenfroynd, […], […], Sofia Kostandian, […], […], Jagadeesh Balam, Vitaly Lavrukhin, […], […], […], […], […]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Word Level Timestamp Generation for Automatic Speech Recognition and Translation.

[…], Krishna C. Puvvada, Elena Rastorgueva, […], […], […], […], […], Jagadeesh Balam, […]

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems.

[…], Ivan Medennikov, […], […], […], Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, […]

Proceedings of the Forty-second International Conference on Machine Learning, 2025

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR.

[…], […], […], […], […], Ivan Medennikov, […], Nithin Rao Koluguri, Jagadeesh Balam, […]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks.

[…], […], […], Ivan Medennikov, Krishna C. Puvvada, Nithin Rao Koluguri, […], Jagadeesh Balam, […]

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens.

[…], Ivan Medennikov, […], […], […], Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, […]

CoRR, 2024

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR.

[…], […], […], Krishna C. Puvvada, Ivan Medennikov, Somshubra Majumdar, […], Jagadeesh Balam, […]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Bestow: Efficient and Streamable Speech Language Model with The Best of Two Worlds in GPT and T5.

[…], […], Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, […], Jagadeesh Balam, […]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

Krishna C. Puvvada, […], […], Oleksii Hrinchuk, Nithin Rao Koluguri, […], Somshubra Majumdar, Elena Rastorgueva, […], Vitaly Lavrukhin, Jagadeesh Balam, […]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.

[…], […], […], […], […], Yu-Chiang Frank Wang, […]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.

[…], […], Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, […], Subhankar Ghosh, Jagadeesh Balam, […]

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System.

[…], […], […], […], Krishna C. Puvvada, Nithin Rao Koluguri, […], Aleksandr Laptev, Jagadeesh Balam, […]

CoRR, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation.

[…], […], […], Nithin Rao Koluguri, […], […], Jagadeesh Balam, […]

CoRR, 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.

[…], […], Somshubra Majumdar, […], […], Oleksii Hrinchuk, […], […]

CoRR, 2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling.

[…], Jagadeesh Balam, […]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.

[…], […], Somshubra Majumdar, […], Shinji Watanabe, […]

Proceedings of the International Conference on Machine Learning, 2023

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.

[…], Nithin Rao Koluguri, […], Somshubra Majumdar, […], […], Oleksii Hrinchuk, Krishna C. Puvvada, […], Jagadeesh Balam, […]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
