He Huang

Affiliations:
  • NVIDIA, Santa Clara, USA


According to our database, He Huang authored at least 20 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering.
CoRR, July, 2025

Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges.
CoRR, July, 2025

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR.
CoRR, June, 2025

Word Level Timestamp Generation for Automatic Speech Recognition and Translation.
CoRR, May, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages.
CoRR, May, 2025

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens.
CoRR, 2024

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Bestow: Efficient and Streamable Speech Language Model with The Best of Two Worlds in GPT and T5.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System.
CoRR, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation.
CoRR, 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.
CoRR, 2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.
Proceedings of the International Conference on Machine Learning, 2023

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

