Ron Hoory

Orcid: 0009-0006-1327-5160

According to our database, Ron Hoory authored at least 54 papers between 1994 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark.

,

,

Hagai Aronowitz

,

,

CoRR, April, 2026

2025

Advancing Speech Understanding in Speech-Aware Language Models with GRPO.

Avishai Elmakies

,

Hagai Aronowitz

,

,

,

,

CoRR, September, 2025

Spoken Question Answering for Visual Queries.

,

,

,

Hagai Aronowitz

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Exploring the Limits of Conformer CTC-Encoder for Speech Emotion Recognition using Large Language Models.

Edmilson da Silva Morais

,

Hagai Aronowitz

,

,

,

,

Brian Kingsbury

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Speech Synthesis From Continuous Features Using Per-Token Latent Diffusion.

,

,

,

Slava Shechtman

,

,

Hagai Aronowitz

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations.

Claudio Santos Pinhanez

,

,

Marcelo Carpinette Grave

,

,

Proceedings of the 29th International Conference on Intelligent User Interfaces, 2024

Speak While You Think: Streaming Speech Synthesis During Text Generation.

,

Slava Shechtman

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Modeling Turn-Taking in Human-To-Human Spoken Dialogue Datasets Using Self-Supervised Features.

Edmilson da Silva Morais

,

Matheus Damasceno

,

Hagai Aronowitz

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Speech Emotion Recognition Using Self-Supervised Features.

Edmilson da Silva Morais

,

,

,

,

Matheus Damasceno

,

Hagai Aronowitz

Proceedings of the IEEE International Conference on Acoustics, 2022

A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.

,

,

,

,

,

,

Brian Kingsbury

Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Normalization for Self-Supervised Speech Emotion Recognition.

,

Hagai Aronowitz

,

,

Edmilson da Silva Morais

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Towards A Common Speech Analysis Engine.

Hagai Aronowitz

,

,

Edmilson da Silva Morais

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

An autonomous debating system.

,

,

,

,

,

Francesca Bonin

,

,

Edo Cohen-Karlik

,

,

Lilach Edelstein

,

,

Roni Friedman-Melamed

,

,

,

,

,

,

,

Daniel Hershcovich

,

,

,

,

,

,

,

,

David Konopnicki

,

,

,

,

,

,

,

Naftali Liberman

,

,

,

,

,

Shila Ofek-Koifman

,

,

Ella Rabinovich

,

,

Slava Shechtman

,

Dafna Sheinwald

,

,

Ilya Shnayderman

,

,

,

Benjamin Sznajder

,

,

Orith Toledo-Ronen

,

,

Nat., 2021

RNN Transducer Models for Spoken Language Understanding.

,

Hong-Kwang Jeff Kuo

,

,

,

Brian Kingsbury

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS.

Alexander Sorin

,

Slava Shechtman

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition.

,

Hagai Aronowitz

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Spoken Language Understanding Without Full Transcripts.

Hong-Kwang Jeff Kuo

,

,

,

,

Kartik Audhkhasi

,

Brian Kingsbury

,

,

,

,

Luis A. Lastras

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

New Advances in Speaker Diarization.

Hagai Aronowitz

,

,

Masayuki Suzuki

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.

,

,

,

,

Kartik Audhkhasi

,

Brian Kingsbury

,

,

Michael Picheny

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

High Quality, Lightweight and Adaptable TTS Using LPCNet.

,

Slava Shechtman

,

Alexander Sorin

,

Carmel Rabinovitz

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Neural TTS Voice Conversion.

,

Slava Shechtman

,

Alexander Sorin

,

,

Carmel Rabinovitz

,

Edmilson Da Silva Morais

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The IBM Virtual Voice Creator.

Alexander Sorin

,

Slava Shechtman

,

,

,

,

,

,

Carmel Rabinovitz

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Word Emphasis Prediction for Expressive Text to Speech.

,

Slava Shechtman

,

Moran Mordechay

,

,

Oren Sar Shalom

,

,

David Konopnicki

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms.

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels.

,

,

,

Andrew Rosenberg

,

,

Bhuvana Ramabhadran

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Voice-transformation-based data augmentation for prosodic classification.

,

Andrew Rosenberg

,

Alexander Sorin

,

Bhuvana Ramabhadran

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Using continuous lexical embeddings to improve symbolic-prosody prediction in a text-to-speech front-end.

,

,

,

Bhuvana Ramabhadran

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system.

,

,

Bhuvana Ramabhadran

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Fusion of voice signal information for detection of mild laryngeal pathology.

Evaldas Vaiciukynas

,

Antanas Verikas

,

,

Marija Bacauskiene

,

,

,

Appl. Soft Comput., 2014

Speech-based automatic and robust detection of very early dementia.

,

,

Alexandra König

,

,

Philippe H. Robert

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks.

,

,

Bhuvana Ramabhadran

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Exploring modulation spectrum features for speech-based depression level classification.

,

Orith Toledo-Ronen

,

Alexander Sorin

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Multi-modal biometrics for mobile authentication.

Hagai Aronowitz

,

,

Orith Toledo-Ronen

,

,

,

,

,

,

Nalini K. Ratha

,

Sharath Pankanti

,

Proceedings of the IEEE International Joint Conference on Biometrics, Clearwater, 2014

2013

F0 contour prediction with a deep belief network-Gaussian process hybrid model.

,

,

Bhuvana Ramabhadran

,

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Towards automatic phonetic segmentation for TTS.

,

Alexander Sorin

,

,

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Towards Goat Detection in Text-Dependent Speaker Verification.

Orith Toledo-Ronen

,

Hagai Aronowitz

,

,

Jason W. Pelecanos

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improved Spoken Query Transcription Using Co-Occurrence Information.

,

,

Bhuvana Ramabhadran

,

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

New Developments in Voice Biometrics for User Authentication.

Hagai Aronowitz

,

,

Jason W. Pelecanos

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech processing and retrieval in a personal memory aid system for the elderly.

Alexander Sorin

,

Hagai Aronowitz

,

,

Orith Toledo-Ronen

,

,

Michael Kuritzky

,

,

Bhuvana Ramabhadran

,

Proceedings of the IEEE International Conference on Acoustics, 2011

2008

The IBM Submission to the 2008 Text-to-Speech Blizzard Challenge.

,

,

Slava Shechtman

,

,

,

Bhuvana Ramabhadran

,

Proceedings of the Blizzard Challenge 2008, 2008

2006

Spoken document retrieval from call-center conversations.

,

,

SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification.

,

,

,

Slava Shechtman

,

Alexander Sorin

,

,

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

The IBM Submission to the 2006 Blizzard Text-to-Speech Challenge.

,

,

,

,

,

Michael Picheny

,

,

Slava Shechtman

,

Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006

2005

Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling.

,

,

,

,

Slava Shechtman

,

Alexander Sorin

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Automatic analysis of call-center conversations.

,

,

,

,

Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31, 2005

2004

The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation.

Alexander Sorin

,

Tenkasi Ramabadran

,

,

,

Michael J. McLaughlin

,

,

,

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

The ETSI extended distributed speech recognition (DSR) standards: server-side speech reconstruction.

Tenkasi Ramabadran

,

Alexander Sorin

,

Michael J. McLaughlin

,

,

,

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2002

Reducing the footprint of the IBM trainable speech synthesis system.

,

,

,

Dorel Silberstein

,

Alexander Sorin

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals.

,

,

,

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

Conversational networking: conversational protocols for transport, coding, and control.

Stéphane H. Maes

,

,

,

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Speech reconstruction from mel frequency cepstral coefficients and pitch frequency.

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2000

Low bit rate speech compression for playback in speech recognition systems.

,

,

,

Proceedings of the 10th European Signal Processing Conference, 2000

1994

Speech synthesis for a specific speaker based on a labeled speech database.

,

Proceedings of the 12th IAPR International Conference on Pattern Recognition, 1994
