Hagen Soltau

According to our database, Hagen Soltau authored at least 74 papers between 1996 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Learning Visual Composition through Improved Semantic Guidance.

,

,

,

,

,

,

,

,

Jonathon Shlens

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Knowledge Graph Reasoning with Self-supervised Reinforcement Learning.

,

,

,

,

,

Laurent El Shafey

,

,

,

CoRR, 2024

Retrieval Augmented End-to-End Spoken Dialog Models.

,

,

,

,

,

,

Laurent El Shafey

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

SLM: Bridge the thin gap between speech and text foundation models.

,

,

,

,

Chung-Cheng Chiu

,

,

,

,

,

,

Paul K. Rubenstein

,

,

,

,

,

Nikhil Siddhartha

,

Johan Schalkwyk

,

CoRR, 2023

Efficient Adapters for Giant Speech Models.

,

,

,

Chung-Cheng Chiu

,

,

,

CoRR, 2023

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding.

,

,

,

,

,

,

Laurent El Shafey

CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

CoRR, 2023

Speech Aware Dialog System Technology Challenge (DSTC11).

,

,

,

Abhinav Rastogi

,

,

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AnyTOD: A Programmable Task-Oriented Dialog System.

,

,

,

,

Abhinav Rastogi

,

,

,

,

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SLM: Bridge the Thin Gap Between Speech and Text Foundation Models.

,

,

,

,

Chung-Cheng Chiu

,

,

,

,

,

Paul K. Rubenstein

,

,

,

,

Nikhil Siddhartha

,

Johan Schalkwyk

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model.

,

,

,

Joseph R. Duffy

,

Rene L. Utianski

,

Leland R. Barnard

,

John L. Stricker

,

Daniela A. Wiepert

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences.

,

,

,

Laurent El Shafey

CoRR, 2022

Unsupervised Slot Schema Induction for Task-oriented Dialog.

,

,

,

,

Laurent El Shafey

,

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

RNN Transducers for Named Entity Recognition with constraints on alignment for understanding medical conversations.

,

,

,

Laurent El Shafey

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Knowledge-grounded Dialog State Tracking.

,

,

,

Laurent El Shafey

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021

Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction.

,

,

,

Laurent El Shafey

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Word-Level Confidence Estimation for RNN Transducers.

,

,

Laurent El Shafey

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

The Medical Scribe: Corpus Development and Model Performance Analyses.

,

,

,

,

,

,

,

,

,

,

,

Laurent El Shafey

,

,

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019

Joint Speech Recognition and Speaker Diarization via Sequence Transduction.

Laurent El Shafey

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Monotonic Recurrent Neural Network Transducer and Decoding Strategies.

Anshuman Tripathi

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2017

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition.

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Reducing the computational complexity for whole word models.

,

,

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2015

Deep Convolutional Neural Networks for Large-scale Speech Tasks.

Tara N. Sainath

,

Brian Kingsbury

,

,

,

Abdel-rahman Mohamed

,

,

Bhuvana Ramabhadran

Neural Networks, 2015

2014

Automatic Speech Recognition.

,

,

,

,

Brian Kingsbury

,

,

Proceedings of the Natural Language Processing of Semitic Languages, 2014

Unfolded recurrent neural networks for speech recognition.

,

,

,

Michael Picheny

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Removing redundancy from lattices.

,

,

,

Pegah Ghahremani

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions.

,

Sriram Ganapathy

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Joint training of convolutional and non-convolutional neural networks.

,

,

Tara N. Sainath

Proceedings of the IEEE International Conference on Acoustics, 2014

A comparison of two optimization techniques for sequence discriminative training of deep neural networks.

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Progress in dynamic network decoding.

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

Efficient spoken term detection using confusion networks.

,

Brian Kingsbury

,

,

,

Michael Picheny

Proceedings of the IEEE International Conference on Acoustics, 2014

Out-of-vocabulary word detection in a speech-to-speech translation system.

,

Ellen Eide Kislal

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks.

Tara N. Sainath

,

Brian Kingsbury

,

,

Bhuvana Ramabhadran

IEEE Trans. Speech Audio Process., 2013

Neural network acoustic models for the DARPA RATS program.

,

,

,

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

The IBM speech activity detection system for the DARPA RATS program.

,

,

,

Sriram Ganapathy

,

Brian Kingsbury

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Morpheme-based feature-rich language models using Deep Neural Networks for LVCSR of Egyptian Arabic.

Amr El-Desoky Mousa

,

Hong-Kwang Jeff Kuo

,

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Exploiting diversity for spoken term detection.

,

,

,

Brian Kingsbury

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Speaker adaptation of neural network acoustic models using i-vectors.

,

,

,

Michael Picheny

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Improvements to Deep Convolutional Neural Networks for LVCSR.

Tara N. Sainath

,

Brian Kingsbury

,

Abdel-rahman Mohamed

,

,

,

,

,

Aleksandr Y. Aravkin

,

Bhuvana Ramabhadran

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

The IBM keyword search system for the DARPA RATS program.

,

,

,

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Boosting systems for large vocabulary continuous speech recognition.

,

Speech Commun., 2012

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization.

Brian Kingsbury

,

Tara N. Sainath

,

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

The IBM 2009 GALE Arabic speech transcription system.

Brian Kingsbury

,

,

,

,

,

,

Suman V. Ravuri

,

,

Proceedings of the IEEE International Conference on Acoustics, 2011

From Modern Standard Arabic to Levantine ASR: Leveraging GALE for dialects.

,

,

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

The IBM 2011 GALE Arabic speech transcription system.

,

,

,

Brian Kingsbury

,

,

,

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

The IBM Attila speech recognition toolkit.

,

,

Brian Kingsbury

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Discriminative Phonotactics for Dialect Recognition Using Context-Dependent Phone Classifiers.

,

,

,

Jirí Navrátil

,

Julia Hirschberg

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Boosting systems for LVCSR.

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Decoding with shrinkage-based language models.

,

Stanley F. Chen

,

Abraham Ittycheriah

,

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The IBM 2008 GALE Arabic speech transcription system.

,

,

Upendra V. Chaudhari

,

,

Brian Kingsbury

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2010

A comparative study on system combination schemes for LVCSR.

,

Hong-Kwang Jeff Kuo

,

,

,

Upendra V. Chaudhari

,

,

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.

,

,

Brian Kingsbury

,

Hong-Kwang Jeff Kuo

,

,

,

IEEE Trans. Speech Audio Process., 2009

Large margin semi-tied covariance transforms for discriminative training.

,

,

Proceedings of the IEEE International Conference on Acoustics, 2009

Dynamic network decoding revisited.

,

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Fast speaker adaptive training for speech recognition.

,

Hong-Kwang Jeff Kuo

,

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

The IBM 2006 Gale Arabic ASR System.

,

,

Brian Kingsbury

,

Hong-Kwang Jeff Kuo

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Advances in speech transcription at IBM under the DARPA EARS program.

Stanley F. Chen

,

Brian Kingsbury

,

,

,

,

,

IEEE Trans. Speech Audio Process., 2006

2005

Compensating hyperarticulation for automatic speech recognition.

PhD thesis, 2005

The IBM 2004 Conversational Telephony System for Rich Transcription.

,

Brian Kingsbury

,

,

,

,

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

fMPE: Discriminatively Trained Features for Speech Recognition.

,

Brian Kingsbury

,

,

,

,

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

The 2003 ISL rich transcription system for conversational telephony speech.

,

,

,

Christian Fügen

,

,

Szu-Chen Stan Jou

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2002

Compensating for hyperarticulation by modeling articulatory properties.

,

,

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Efficient language model lookahead through polymorphic linguistic context assignment.

,

,

Christian Fügen

,

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Advances in meeting recognition.

,

,

,

,

,

Martin Westphal

,

,

,

Proceedings of the First International Conference on Human Language Technology Research, 2001

Speech recognition over netmeeting connections.

,

John W. McDonough

,

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Advances in automatic meeting record creation and access.

,

,

,

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2001

The ISL evaluation system for Verbmobil-II.

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2001

Speaker compensation with sine-log all-pass transforms.

John W. McDonough

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Phone dependent modeling of hyperarticulated effects.

,

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Specialized acoustic models for hyperarticulated speech.

,

Proceedings of the IEEE International Conference on Acoustics, 2000

Confidence measure based language identification.

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2000

1998

On the influence of hyperarticulated speech on recognition performance.

,

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Recognition of music types.

,

,

Martin Westphal

,

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1996

Automatische Identifizierung spontan gesprochener Sprachen mit neuronalen Netzen.

,

Proceedings of the Natural Language Processing and Speech Technology, 1996
