Felix Weninger

Rosalind W. Picard

CoRR, 2020

Semi-Supervised Learning with Data Augmentation for End-to-End ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset.

Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

2019

Affective and behavioural computing: Lessons learnt from the First Computational Paralinguistics Challenge.

Klaus R. Scherer

Mohamed Chetouani

Marcello Mortillaro

Comput. Speech Lang., 2019

Deep Learning Based Mandarin Accent Identification for Accent Robust ASR.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Online Batch Normalization Adaptation for Automatic Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Three recent trends in Paralinguistics on the way to omniscient machine intelligence.

J. Multimodal User Interfaces, 2018

2017

A Paralinguistic Approach To Speaker Diarisation: Using Age, Gender, Voice Likability and Personality Traits.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Towards intoxicated speech recognition.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Multi-task deep neural network with shared hidden layers: Breaking down the wall between emotion representations.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Route and Stopping Intent Prediction at Intersections From Car Fleet Data.

IEEE Trans. Intell. Veh., 2016

Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning Perspective.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Discriminatively Trained Recurrent Neural Networks for Continuous Dimensional Emotion Recognition from Audio.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Language proficiency assessment of English L2 speakers based on joint analysis of prosody and native language.

Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

2015

Intelligent Single-Channel Methods for Multi-Source Audio Analysis.

PhD thesis, 2015

Introducing CURRENNT: the Munich open-source CUDA recurrent neural network toolkit.

Johannes Bergmann

J. Mach. Learn. Res., 2015

A Survey on perceived speaker traits: Personality, likability, pathology, and the first challenge.

Juan Rafael Orozco-Arroyave

Comput. Speech Lang., 2015

The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson's & eating condition.

Elmar Nöth

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection.

Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

Deep NMF for speech separation.

Jonathan Le Roux

John R. Hershey

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR.

Proceedings of the Latent Variable Analysis and Signal Separation, 2015

2014

Memory-Enhanced Neural Networks and NMF for Robust ASR.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments.

Comput. Speech Lang., 2014

Medium-term speaker states - A review on intoxication, sleepiness and the first challenge.

Comput. Speech Lang., 2014

A Broadcast News Corpus for Evaluation and Tuning of German LVCSR Systems.

CoRR, 2014

Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures.

John R. Hershey

Jonathan Le Roux

CoRR, 2014

On-Line NMF-Based Stereo Up-Mixing of Speech Improves Perceived Reduction of Non-Stationary Noise.

Proceedings of the AES International Conference on Semantic Audio 2014, 2014

Emotional Analysis of Music: A Comparison of Methods.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

The Munich LSTM-RNN Approach to the MediaEval 2014 "Emotion in Music" Task.

Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Discriminative NMF and its application to single-channel source separation.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

On-line continuous-time music mood regression with deep recurrent neural networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

Single-channel speech separation with memory-enhanced recurrent neural networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

Discriminatively trained recurrent neural networks for single-channel speech separation.

Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013

Serious Gaming for Behavior Change: The State of Play.

IEEE Pervasive Comput., 2013

Words that Fascinate the Listener: Predicting Affective Ratings of On-Line Lectures.

Pascal Staudt

Int. J. Distance Educ. Technol., 2013

YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context.

Louis-Philippe Morency

IEEE Intell. Syst., 2013

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory.

Comput. Speech Lang., 2013

Likability of human voices: A feature analysis and a neural network regression approach to automatic likability estimation.

Proceedings of the 14th International Workshop on Image Analysis for Multimedia Interactive Services, 2013

Recent developments in openSMILE, the Munich open-source multimedia feature extractor.

Proceedings of the ACM Multimedia Conference, 2013

The TUM Approach to the MediaEval Music Emotion Task Using Generic Affective Audio Features.

Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism.

Stefan Steidl

Anton Batliner

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Affect recognition in real-life acoustic conditions - a new perspective on feature selection.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Influence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classification.

Proceedings of the Man-Machine Interactions 3, 2013

The acoustics of eye contact: detecting visual attention from conversational audio cues.

Proceedings of the 6th workshop on Eye gaze in intelligent human machine interaction: gaze in multimodal interaction, 2013

Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise.

Proceedings of the IEEE International Conference on Acoustics, 2013

Speaker trait characterization in web videos: Uniting speech, language, and facial features.

Louis-Philippe Morency

Proceedings of the IEEE International Conference on Acoustics, 2013

A discriminative approach to polyphonic piano note transcription using supervised non-negative matrix factorization.

Christian Kirst

Hans-Joachim Bungartz

Proceedings of the IEEE International Conference on Acoustics, 2013

A comparative study on sparsity penalties for NMF-based speech separation: Beyond LP-norms.

Proceedings of the IEEE International Conference on Acoustics, 2013

Integrating noise estimation and factorization-based speech separation: A novel hybrid approach.

Proceedings of the IEEE International Conference on Acoustics, 2013

Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Optimization and Parallelization of Monaural Source Separation Algorithms in the openBliSSART Toolkit.

J. Signal Process. Syst., 2012

The Voice of Leadership: Models and Performances of Automatic Analysis in Online Speeches.

IEEE Trans. Affect. Comput., 2012

Synthesized speech for model training in cross-corpus recognition of human emotion.

Int. J. Speech Technol., 2012

Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets.

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Towards distributed recognition of emotion from speech.

Proceedings of the 5th International Symposium on Communications, 2012

Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise.

Martin Wöllmer

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Discrimination of Linguistic and Non-Linguistic Vocalizations in Spontaneous Speech: Intra- and Inter-Corpus Perspectives.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improving Recognition of Speaker States and Traits by Cumulative Evidence: Intoxication, Sleepiness, Age and Gender.

Erik Marchi

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The INTERSPEECH 2012 Speaker Trait Challenge.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Supervised and semi-supervised suppression of background music in monaural speech recordings.

Jordi Feliu

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Robust feature extraction for automatic recognition of vibrato singing in recorded polyphonic music.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Automatic recognition of emotion evoked by general sound events.

Shrikanth S. Narayanan

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Real-Time Speech Separation by Semi-supervised Nonnegative Matrix Factorization.

Proceedings of the Latent Variable Analysis and Signal Separation, 2012

Music Information Retrieval: An Inspirational Guide to Transfer from Related Disciplines.

Proceedings of the Multimodal Music Processing, 2012

Towards Automatic Intoxication Detection from Speech in Real-Life Acoustic Environments.

Zixing Zhang

Proceedings of the 10th ITG Conference on Speech Communication, 2012

Fully Automatic Audiovisual Emotion Recognition: Voice, Words, and the Face.

Proceedings of the 10th ITG Conference on Speech Communication, 2012

Sparse, Hierarchical and Semi-Supervised Base Learning for Monaural Enhancement of Conversational Speech.

Martin Wöllmer

Proceedings of the 10th ITG Conference on Speech Communication, 2012

2011

Computational Assessment of Interest in Speech - Facing the Real-Life Challenge.

Künstliche Intell., 2011

Recognition of Nonprototypical Emotions in Reverberated and Noisy Speech by Nonnegative Matrix Factorization.

EURASIP J. Adv. Signal Process., 2011

Automatic Assessment of Singer Traits in Popular Music: Gender, Age, Height and Race.

Martin Wöllmer

Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011

Multi-Modal Non-Prototypical Music Mood Analysis in Continuous Space: Reliability and Performances.

Johannes Dorfner

Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011

Speech-Based Non-Prototypical Affect Recognition for Child-Robot Interaction in Reverberated Environments.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Acoustic-Linguistic Recognition of Interest in Speech with Bottleneck-BLSTM Nets.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Using Multiple Databases for Training in Emotion Recognition: To Unite or to Vote?

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

"Would You Buy a Car from Me?" - On the Likability of Telephone Voices.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Localization of non-linguistic events in spontaneous speech by Non-Negative Matrix Factorization and Long Short-Term Memory.

Proceedings of the IEEE International Conference on Acoustics, 2011

Audio recognition in the wild: Static and dynamic classification on a real-world database of animal vocalizations.

Proceedings of the IEEE International Conference on Acoustics, 2011

OpenBliSSART: Design and evaluation of a research toolkit for Blind Source Separation in Audio Recognition Tasks.

Alexander Lehmann

Proceedings of the IEEE International Conference on Acoustics, 2011

Combining monaural source separation with Long Short-Term Memory for increased robustness in vocalist gender recognition.

Proceedings of the IEEE International Conference on Acoustics, 2011

Ten Recent Trends in Computational Paralinguistics.

Proceedings of the Cognitive Behavioural Systems, 2011

Unsupervised learning in cross-corpus acoustic emotion recognition.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Vocalist Gender Recognition in Recorded Popular Music.

Proceedings of the 11th International Society for Music Information Retrieval Conference, 2010

Non-negative matrix factorization as noise-robust feature extractor for speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2010

Discrimination of speech and non-linguistic vocalizations by Non-Negative Matrix Factorization.
