Yossi Adi

CoRR, May, 2025

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning.

CoRR, May, 2025

Scaling Analysis of Interleaved Speech-Text Language Models.

CoRR, April, 2025

Unsupervised Speech Segmentation: A General Approach Using Speech Language Models.

Avishai Elmakies

Omri Abend

CoRR, January, 2025

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation.

Felix Kreuk

Trans. Mach. Learn. Res., 2025

Discrete Audio Tokens: More Than a Survey!

Trans. Mach. Learn. Res., 2025

On The Landscape of Spoken Language Models: A Comprehensive Survey.

Trans. Mach. Learn. Res., 2025

WhiStress: Enriching Transcriptions with Sentence Stress Detection.

Iddo Yosha

Dorin Shteyman

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

PAST: Phonetic-Acoustic Speech Tokenizer.

Nadav Har-Tuv

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

CAFA: A Controllable Automatic Foley Artist.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Enhancing TTS Stability in Hebrew using Discrete Semantic Units.

Ella Zeldes

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Latent Watermarking of Audio Generative Models.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Salmon: A Suite for Acoustic Language Model Evaluation.

Amit Roth

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

GmSLM: Generative Marmoset Spoken Language Modeling.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Speech Synthesis From Continuous Features Using Per-Token Latent Diffusion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Slamming: Training a Speech Language Model on One GPU in a Day.

Avishai Elmakies

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Scaling Speech Technology to 1,000+ Languages.

J. Mach. Learn. Res., 2024

Formal Language Knowledge Corpus for Retrieval Augmented Generation.

Majd Zayyad

CoRR, 2024

A Suite for Acoustic Language Model Evaluation.

Amit Roth

CoRR, 2024

LAST: Language Model Aware Speech Tokenization.

Arnon Turetzky

CoRR, 2024

Improving Visual Commonsense in Language Models via Multiple Image Generation.

CoRR, 2024

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation.

CoRR, 2024

Transformers are Multi-State RNNs.

CoRR, 2024

Discrete Flow Matching.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Audio Conditioning for Music Generation via Discrete Bottleneck Features.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Language Modeling Approach to Diacritic-Free Hebrew TTS.

Amit Roth

Arnon Turetzky

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

NAST: Noise Aware Speech Tokenization for Speech Language Models.

Shoval Messica

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline.

Shiran Aziz

Shmuel Peleg

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

An Independence-promoting Loss for Music Generation with Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Masked Audio Generation using a Single Non-Autoregressive Transformer.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Transformers are Multi-State RNNs.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Layer Collaboration in the Forward-Forward Algorithm.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

High Fidelity Neural Audio Compression.

Trans. Mach. Learn. Res., 2023

Generative Spoken Dialogue Language Modeling.

Trans. Assoc. Comput. Linguistics, 2023

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS.

CoRR, 2023

Code Llama: Open Foundation Models for Code.

CoRR, 2023

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.

CoRR, 2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Textually Pretrained Speech Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Simple and Controllable Music Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioGen: Textually Guided Audio Generation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling.

Amitay Sicherman

Proceedings of the IEEE International Conference on Acoustics, 2023

I Hear Your True Colors: Image Guided Audio Generation.

Roy Sheffer

Proceedings of the IEEE International Conference on Acoustics, 2023

AERO: Audio Super Resolution in the Spectral Domain.

Moshe Mandel

Proceedings of the IEEE International Conference on Acoustics, 2023

A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?

Proceedings of the IEEE International Conference on Acoustics, 2023

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generative Spoken Language Model based on continuous word-sized audio tokens.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Differentiable Model Compression via Pseudo Quantization Noise.

Alexandre Défossez

Gabriel Synnaeve

Trans. Mach. Learn. Res., 2022

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing.

IEEE J. Sel. Top. Signal Process., 2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement.

CoRR, 2022

Speaking Style Conversion With Discrete Self-Supervised Units.

CoRR, 2022

Audio Language Modeling using Perceptually-Guided Discrete Representations.

CoRR, 2022

On The Robustness of Self-Supervised Representations for Spoken Language Modeling.

CoRR, 2022

textless-lib: a Library for Textless Spoken Language Processing.

CoRR, 2022

Stop: A Dataset for Spoken Task Oriented Semantic Parsing.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

On the Importance of Gradient Norm in PAC-Bayesian Bounds.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Textless Speech-to-Speech Translation on Real Data.

Ann Lee

Hongyu Gong

Paul-Ambroise Duquenne

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Deep Audio Waveform Prior.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Probing phoneme, language and speaker information in unsupervised speech representations.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors.

Shahaf Bassan

Jeffrey S. Rosenschein

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

Textless Speech Emotion Conversion using Discrete & Decomposed Representations.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Direct Speech-to-Speech Translation With Discrete Units.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Text-Free Prosody-Aware Generative Spoken Language Modeling.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation.

IEEE Signal Process. Lett., 2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations.

CoRR, 2021

Direct speech-to-speech translation with discrete units.

CoRR, 2021

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation.

CoRR, 2021

Generative Spoken Language Modeling from Raw Audio.

CoRR, 2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

High Fidelity Speech Regeneration with Application to Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2021

Single Channel Voice Separation for Unknown Number of Speakers Under Reverberant and Noisy Settings.

Proceedings of the IEEE International Conference on Acoustics, 2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2021

Fairness in the Eyes of the Data: Certifying Machine-Learning Models.

Proceedings of the AIES '21: AAAI/ACM Conference on AI, 2021

2020

On the generalization of bayesian deep nets for multi-class classification.

CoRR, 2020

Minimal Modifications of Deep Neural Networks using Verification.

Proceedings of the LPAR 2020: 23rd International Conference on Logic for Programming, 2020

Unsupervised Cross-Domain Singing Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.

Felix Kreuk

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Hide and Speak: Towards Deep Neural Networks for Speech Steganography.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Real Time Speech Enhancement in the Waveform Domain.

Alexandre Défossez

Gabriel Synnaeve

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Separation with an Unknown Number of Multiple Speakers.

Eliya Nachmani

Lior Wolf

Proceedings of the 37th International Conference on Machine Learning, 2020

Phoneme Boundary Detection Using Learnable Segmental Features.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Hide and Speak: Deep Neural Networks for Speech Steganography.

CoRR, 2019

To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Fooling End-to-end Speaker Verification by Adversarial Examples.

CoRR, 2018

Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring.

Proceedings of the 27th USENIX Security Symposium, 2018

Out-of-Distribution Detection using Multiple Semantic Label Representations.

Gabi Shalev

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fooling End-To-End Speaker Verification With Adversarial Examples.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Analysis of sentence embedding models using prediction tasks in natural language processing.

IBM J. Res. Dev., 2017

Learning Similarity Function for Pronunciation Variations.

Einat Naaman

CoRR, 2017

Houdini: Fooling Deep Structured Prediction Models.

CoRR, 2017

Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Automatic Measurement of Pre-Aspiration.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Learning Similarity Functions for Pronunciation Variations.

Einat Naaman

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks.

Proceedings of the 5th International Conference on Learning Representations, 2017

Sequence segmentation using joint RNN and structured prediction models.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

StructED: Risk Minimization in Structured Prediction.

J. Mach. Learn. Res., 2016

Automatic measurement of vowel duration via structured prediction.

CoRR, 2016

Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Vowel duration measurement using deep neural networks.
