Axel Röbel

CoRR, October, 2025

Continuous Audio Language Models.

CoRR, September, 2025

Fast-VGAN: Lightweight Voice Conversion with Explicit Control of F0 and Duration Parameters.

Mathilde Abrassart

CoRR, July, 2025

MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis.

CoRR, 2024

Audio Conditioning for Music Generation via Discrete Bottleneck Features.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis.

Théodor Lemerle

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

On Strategies to Exploit Dependencies Between Singing Voice Alignment and Separation.

Théo Nguyen

Yann Teytaut

Proceedings of the 32nd European Signal Processing Conference, 2024

2023

Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations.

Laurent Benaroya

Entropy, February, 2023

AI (r)evolution - where are we heading? Thoughts about the future of music and sound technologies in the era of deep learning.

CoRR, 2023

VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice.

CoRR, 2023

Analysis and Transformation of Voice Level in Singing Voice.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet.

Inf., 2022

A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice.

Inf., 2022

Analysis and transformations of intensity in singing voice.

CoRR, 2022

StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks.

Antoine Lavault

Matthieu Voiry

CoRR, 2022

A study on constraining Connectionist Temporal Classification for temporal audio alignment.

Yann Teytaut

Baptiste Bouvier

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Production Strategies of Vocal Attitudes.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

StyleWaveGAN: Style-based synthesis of drum sounds using generative adversarial networks for higher audio quality.

Antoine Lavault

Matthieu Voiry

Proceedings of the 30th European Signal Processing Conference, 2022

Voice Reenactment with F0 and timing constraints and adversarial learning of conversions.

Proceedings of the 30th European Signal Processing Conference, 2022

2021

Sequence-To-Sequence Voice Conversion using F0 and Time Conditioning and Adversarial Learning.

CoRR, 2021

Towards Universal Neural Vocoding with a Multi-band Excited WaveNet.

CoRR, 2021

Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations.

Laurent Benaroya

CoRR, 2021

Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels.

Clément Le Moine Veillon

CoRR, 2021

Audio Defect Detection in Music with Deep Networks.

Daniel Wolff

Rémi Mignot

Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Phoneme-to-Audio Alignment with Recurrent Neural Networks for Speaking and Singing Voice.

Yann Teytaut

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Speaker Attentive Speech Emotion Recognition.

Clément Le Moine

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels.

Clément Le Moine

Proceedings of the 29th European Signal Processing Conference, 2021

2020

Realistic Transformation of Facial and Vocal Smiles in Real-Time Audiovisual Streams.

Jean-Julien Aucouturier

IEEE Trans. Affect. Comput., 2020

La voix actée : pratiques, enjeux, applications (Acted voice: practices, challenges, applications).

Jean-François Bonastre

Emmanuel Ethis

Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Sound Texture Synthesis Using RI Spectrograms.

Hugo Caracalla

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

GCI Detection from Raw Speech Using a Fully-Convolutional Network.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

CycleGAN Voice Conversion of Spectral Envelopes using Adversarial Weights.

Rafael Ferro

Proceedings of the 28th European Signal Processing Conference, 2020

Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework.

Proceedings of the 28th European Signal Processing Conference, 2020

2019

SoftGAN: Learning generative models efficiently with application to CycleGAN Voice Conversion.

Rafael Ferro

CoRR, 2019

Sound texture synthesis using convolutional neural networks.

Hugo Caracalla

CoRR, 2019

Fully-Convolutional Network for Pitch Estimation of Speech Signals.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Sequence-to-sequence Modelling of F0 for Speech Emotion Conversion.

Carl Robinson

Proceedings of the IEEE International Conference on Acoustics, 2019

Data Augmentation for Drum Transcription with Convolutional Neural Networks.

Céline Jacques

Proceedings of the 27th European Signal Processing Conference, 2019

Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation.

Alice Cohen-Hadria

Geoffroy Peeters

Proceedings of the 27th European Signal Processing Conference, 2019

Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis.

Proceedings of the 27th European Signal Processing Conference, 2019

2018

Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization.

Elie-Laurent Benaroya

IEEE ACM Trans. Audio Speech Lang. Process., 2018

2017

A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice Transformation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Similarity Search of Acted Voices for Automatic Voice Casting.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

A Morphological Model for Simulating Acoustic Scenes and Its Application to Sound Event Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Multi-Frame Amplitude Envelope Estimation for Modification of Singing Voice.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative Systems.

Lionel Feugère

Christophe d'Alessandro

Samuel Delalez

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model.

Celine Chabot-Canet

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Simple multi frame analysis methods for estimation of amplitude spectral envelope estimation in singing voice.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A source/filter model with adaptive constraints for NMF-based speech separation.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

An evaluation framework for event detection using a morphological model of acoustic scenes.

CoRR, 2015

On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system.

Stefan Huber

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A multi-layer F0 model for singing voice synthesis using a b-spline representation with intuitive controls.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

On automatic drum transcription using non-negative matrix deconvolution and Itakura Saito divergence.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

The role of glottal source parameters for high-quality transformation of perceptual age.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

One-formant vocal tract modeling for glottal pulse shape estimation.

Yu-Ren Chien

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

On the use of a spatial cue as prior information for stereo sound source separation based on spatially weighted non-negative tensor factorization.

Yuki Mitsufuji

EURASIP J. Adv. Signal Process., 2014

On the use of voice descriptors for glottal source shape parameter estimation.

Stefan Huber

Konstantinos Papachristou

Comput. Speech Lang., 2014

2D/3D AudioVisual content analysis & description.

Ioannis Pitas

Proceedings of the IEEE 16th International Workshop on Multimedia Signal Processing, 2014

On automatic voice casting for expressive speech: Speaker recognition vs. speech classification.

Grégoire Bachman

Proceedings of the IEEE International Conference on Acoustics, 2014

Online NON-negative Tensor Deconvolution for source detection in 3DTV audio.

Proceedings of the IEEE International Conference on Acoustics, 2014

A montage approach to sound texture synthesis.

Seán O'Leary

Proceedings of the 22nd European Signal Processing Conference, 2014

A Two Level Montage Approach to Sound Texture Synthesis with Treatment of Unique Events.

Seán O'Leary

Proceedings of the 17th International Conference on Digital Audio Effects, 2014

2013

Automatic Adaptation of the Time-Frequency Resolution for Sound Analysis and Re-Synthesis.

IEEE Trans. Speech Audio Process., 2013

Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis.

Speech Commun., 2013

Syll-O-Matic: An adaptive time-frequency representation for the automatic segmentation of speech into syllables.

Francois Lamare

Proceedings of the IEEE International Conference on Acoustics, 2013

Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge.

Yuki Mitsufuji

Proceedings of the IEEE International Conference on Acoustics, 2013

Spectral domain analysis, modelling and transformation of sound. (Analyse, modelisation et transformation des sons).

, 2013

2012

Statistical Characterisation of Melodic Pitch Contours and its Application for Melody Extraction.

Justin Salamon

Geoffroy Peeters

Proceedings of the 13th International Society for Music Information Retrieval Conference, 2012

Glottal source shape parameter estimation using phase minimization variants.

Stefan Huber

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Analysis and modification of excitation source characteristics for singing voice synthesis.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Sample Orchestrator : gestion par le contenu d'échantillons sonores (Sample Orchestrator: content-based management of sound samples).

Traitement du Signal, 2011

Phase Minimization for Glottal Model Estimation.

IEEE Trans. Speech Audio Process., 2011

Sound Analysis and Synthesis Adaptive in Time and Two Frequency Bands.

Marco Liuni

Péter Balázs

CoRR, 2011

A Reduced Multiple Gabor Frame for Local Time Adaptation of the Spectrogram.

CoRR, 2011

Drum extraction from polyphonic music based on a spectro-temporal model of percussive sounds.

Proceedings of the IEEE International Conference on Acoustics, 2011

Rényi information measures for spectral change detection.

Proceedings of the IEEE International Conference on Acoustics, 2011

Pitch transposition and breathiness modification using a glottal source model and its adapted vocal-tract filter.

Proceedings of the IEEE International Conference on Acoustics, 2011

Function of Phase-Distortion for glottal model estimation.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals.

IEEE Trans. Speech Audio Process., 2010

Dynamic Spectral Envelope Modeling for Timbre Analysis of Musical Instrument Sounds.

Juan José Burred

Thomas Sikora

IEEE Trans. Speech Audio Process., 2010

Shape-invariant speech transformation with the phase vocoder.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Joint estimate of shape and time-synchronization of a glottal source model by phase flatness.

Proceedings of the IEEE International Conference on Acoustics, 2010

An Entropy Based Method for Local Time-Adaptation of the Spectrogram.

Proceedings of the Exploring Music Contents - 7th International Symposium, 2010

2009

MuBu and Friends - Assembling Tools for Content Based Real-Time Interactive Audio Processing in Max/MSP.

Proceedings of the 2009 International Computer Music Conference, 2009

The expected amplitude of overlapping partials of harmonic sounds.

Proceedings of the IEEE International Conference on Acoustics, 2009

Applying improved spectral modeling for High Quality voice conversion.

Proceedings of the IEEE International Conference on Acoustics, 2009

Polyphonic musical instrument recognition based on a dynamic model of the spectral envelope.

Juan José Burred

Thomas Sikora

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

Adaptive Threshold Determination for Spectral Peak Classification.

Miroslav Zivanovic

Comput. Music. J., 2008

Frequency-Slope Estimation and Its Application to Parameter Estimation for Non-Stationary Sinusoids.

Comput. Music. J., 2008

Using the SDIF Sound Description Interchange Format for Audio Features.

Juan José Burred

Carmine Emanuele Cella

Geoffroy Peeters

Diemo Schwarz

Proceedings of the ISMIR 2008, 2008

Extending efficient spectral envelope modeling to Mel-frequency based representation.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

On cepstral and all-pole based spectral envelope modeling with unknown model order.

Pattern Recognit. Lett., 2007

Parameter estimation for linear AM/FM sinusoids using frequency domain demodulation.

Proceedings of the Signal and Image Processing (SIP 2007), 2007

Synthesized Polyphonic Music Database with Verifiable Ground Truth for Multiple F0 Estimation.

Niels Bogaards

Proceedings of the 8th International Conference on Music Information Retrieval, 2007

Speech to chant transformation with the phase vocoder.

Joshua Fineberg

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

All-Pole Spectral Envelope Modelling with Order Selection for Harmonic Signals.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Adaptive additive modeling with continuous parameter trajectories.

IEEE Trans. Speech Audio Process., 2006

Estimation of partial parameters for non stationary sinusoids.

Proceedings of the 2006 International Computer Music Conference, 2006

Improving LPC Spectral Envelope Extraction Of Voiced Speech By True-Envelope Estimation.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Real Time signal Transposition with envelope Preservation in the phase vocoder.

Proceedings of the 2005 International Computer Music Conference, 2005

Multiple fundamental frequency estimation of polyphonic music signals.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Physical principles driven joint evaluation of multiple f0 hypotheses.

Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, 2004

Signal decomposition by means of classification of spectral peaks.

Miroslav Zivanovic

Proceedings of the 2004 International Computer Music Conference, 2004

Sound Analysis and Processing with AudioSculpt 2.

Niels Bogaards

Proceedings of the 2004 International Computer Music Conference, 2004

A new approach to spectral peak classification.

Miroslav Zivanovic

Proceedings of the 2004 12th European Signal Processing Conference, 2004

2003

Transient detection and preservation in the phase vocoder.

Proceedings of the 2003 International Computer Music Conference, 2003

2002

Estimating partial frequency and frequency slope using reassignment operators.

Proceedings of the 2002 International Computer Music Conference, 2002

2001

Synthesizing Natural Sounds Using Dynamic Models of Sound Attractors.

Comput. Music. J., 2001

Adaptive additive synthesis using spline based parameter trajectory models.

Proceedings of the 2001 International Computer Music Conference, 2001

1999

Adaptive Additive Synthesis of Sound.

Proceedings of the 1999 International Computer Music Conference, 1999

1996

Neural Network Modeling of Speech and Music Signals.

Proceedings of the Advances in Neural Information Processing Systems 9, 1996

1995

Neural Networks for Modeling Time Series of Musical Instruments.

Proceedings of the 1995 International Computer Music Conference, 1995

1994

Dynamic pattern selection for faster learning and controlled generalization of neural networks.

Proceedings of the 2nd European Symposium on Artificial Neural Networks, 1994

1993

Neuronale Modelle nichtlinearer dynamischer Systeme mit Anwendung auf Musiksignale (Neural models of nonlinear dynamical systems with application to music signals).
