Yashesh Gaur

According to our database, Yashesh Gaur authored at least 62 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

GSRM: Generative Speech Reward Model for Speech RLHF.

Tejas Jayashankar, Katerina Zmolíková, Gregory W. Wornell

CoRR, February, 2026

2025

Can Speech LLMs Think while Listening?

CoRR, October, 2025

Latent Speech-Text Transformer.

Benjamin Muller, Jesús Villalba, Luke Zettlemoyer, Srinivasan Iyer

CoRR, October, 2025

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens.

Katerina Zmolíková, Christian Fuegen

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition.

Christian Fuegen

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech.

CoRR, 2024

Speech ReaLLM - Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time.

CoRR, 2024

Speech ReaLLM - Real-time Speech Recognition with Multimodal Language Models by Teaching the Flow of Time.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.

Sunit Sivasankaran

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.

Sunit Sivasankaran

CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.

CoRR, 2023

LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CTCBERT: Advancing Hidden-Unit Bert with CTC Objectives.

Proceedings of the IEEE International Conference on Acoustics, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition.

CoRR, 2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.

CoRR, 2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding.

CoRR, 2022

Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Takuya Yoshioka

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Takuya Yoshioka

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2022

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Dynamic Gradient Aggregation for Federated Domain Adaptation.

Dimitrios Dimitriadis, Ken'ichi Kumatani, Sefik Emre Eskimez

CoRR, 2021

Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription.

Takuya Yoshioka

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.

Sarangarajan Parthasarathy

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.

Takuya Yoshioka

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.

Shahram Ghorbani

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Takuya Yoshioka

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speaker-Attributed ASR with Transformer.

Takuya Yoshioka

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ensemble Combination between Different Time Segmentations.

Jeremy Heng Meng Wong, Dimitrios Dimitriadis, Ken'ichi Kumatani, George Polovets, Partha Parthasarathy

Proceedings of the IEEE International Conference on Acoustics, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

Sarangarajan Parthasarathy

Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.

Takuya Yoshioka

Proceedings of the IEEE International Conference on Acoustics, 2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio.

Takuya Yoshioka

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Federated Transfer Learning with Dynamic Gradient Aggregation.

Dimitrios Dimitriadis, Ken'ichi Kumatani, Sefik Emre Eskimez

CoRR, 2020

Combination of End-to-End and Hybrid Models for Speech Recognition.

Jeremy Heng Meng Wong

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Sequence-Level Self-Learning with Multiple Hypotheses.

Ken'ichi Kumatani, Dimitrios Dimitriadis, Sefik Emre Eskimez

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Serialized Output Training for End-to-End Overlapped Speech Recognition.

Takuya Yoshioka

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

Takuya Yoshioka

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Federated Approach in Training Acoustic Models.

Dimitrios Dimitriadis, Ken'ichi Kumatani, Sefik Emre Eskimez

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.

Hirofumi Inaguma

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Speaker Adaptation for Attention-Based End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic-to-Phrase Models for Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Character-Aware Attention-Based End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Robust Speech Recognition Using Generative Adversarial Networks.

Sanjeev Satheesh

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Reducing Bias in Production Speech Models.

Eric Battenberg, Christopher Fougner, Sanjeev Satheesh

CoRR, 2017

Exploring Neural Transducers for End-to-End Speech Recognition.

Eric Battenberg, Sanjeev Satheesh

CoRR, 2017

Exploring neural transducers for end-to-end speech recognition.

Eric Battenberg, Sanjeev Satheesh

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

The effects of automatic speech recognition quality on human transcription latency.

Walter S. Lasecki, Jeffrey P. Bigham

Proceedings of the 13th Web for All Conference, 2016

Manipulating Word Lattices to Incorporate Human Corrections.

Jeffrey P. Bigham

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Using keyword spotting to help humans correct captioning faster.

Jeffrey P. Bigham

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The Effects of Automatic Speech Recognition Quality on Human Transcription Latency.

Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

2013

Speaker Recognition Using Sparse Representation via Superimposed Features.

Maulik C. Madhavi, Hemant A. Patil

Proceedings of the Pattern Recognition and Machine Intelligence, 2013

Algorithms for speech segmentation at syllable-level for text-to-speech synthesis system in Gujarati.

Hemant A. Patil, Tanvina B. Patel, Nirmesh J. Shah, Hardik B. Sailor, Bhavik B. Vachhani, Bhargav Kanakiya, Vibha Prajapati

Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013
