Florian Metze

According to our database, Florian Metze authored at least 173 papers between 1996 and 2019.

Bibliography

2019
Joint embeddings with multimodal cues for video-text retrieval.
IJMIR, 2019

Effective Dimensionality Reduction for Word Embeddings.
Proceedings of the 4th Workshop on Representation Learning for NLP, 2019

Acoustic-to-Word Models with Conversational Context Information.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Learned in Speech Recognition: Contextual Acoustic Word Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning from Multiview Correlations in Open-domain Videos.
Proceedings of the IEEE International Conference on Acoustics, 2019

Phoneme Level Language Models for Sequence Based Low Resource ASR.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Grounding for Sequence-to-sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Abstractive Summarization for How2 Videos.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Hierarchical Multitask Learning With CTC.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Acoustic-to-Word Recognition with Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Dialog-Context Aware end-to-end Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Domain Robust Feature Extraction for Rapid Low Resource ASR Development.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Eyes and Ears Together: New Task for Multimodal Spoken Content Analysis.
Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

Annotating High-Level Structures of Short Stories and Personal Anecdotes.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Subword and Crossword Units for CTC Acoustic Models.
Proceedings of Interspeech 2018, 2018

Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks.
Proceedings of Interspeech 2018, 2018

Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection.
Proceedings of Interspeech 2018, 2018

The ACLEW DiViMe: An Easy-to-use Diarization Tool.
Proceedings of Interspeech 2018, 2018

Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Enhancement and Analysis of Conversational Speech: JSALT 2017.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-end Multimodal Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence-Based Multi-Lingual Low Resource Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Comparison of Decoding Strategies for CTC Acoustic Models.
Proceedings of Interspeech 2017, 2017

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification.
Proceedings of Interspeech 2017, 2017

A first attempt at polyphonic sound event detection using connectionist temporal classification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A comparison of Deep Learning methods for environmental sound detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Visual features for context-aware speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Toolkits for Robust Speech Processing.
New Era for Robust Speech Recognition: Exploiting Deep Learning, 2017

Preliminaries.
New Era for Robust Speech Recognition: Exploiting Deep Learning, 2017

End-to-End Architectures for Speech Recognition.
New Era for Robust Speech Recognition: Exploiting Deep Learning, 2017

2016
The effects of automatic speech recognition quality on human transcription latency.
Proceedings of the 13th Web for All Conference, 2016

Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach.
Proceedings of Interspeech 2016, 2016

Virtual Machines and Containers as a Platform for Experimentation.
Proceedings of Interspeech 2016, 2016

Manipulating Word Lattices to Incorporate Human Corrections.
Proceedings of Interspeech 2016, 2016

Experiences with Shared Resources for Research and Education in Speech and Language Processing.
Proceedings of Interspeech 2016, 2016

Audio-based multimedia event detection using deep recurrent neural networks.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An empirical exploration of CTC acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Training Deep Neural Networks for Reverberation Robust Speech Recognition.
Proceedings of the 12th ITG Symposium on Speech Communication, 2016

2015
Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors.
IEEE/ACM Trans. Audio, Speech & Language Processing, 2015

Query by Example Search on Speech at Mediaeval 2015.
Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

On speaker adaptation of long short-term memory recurrent neural networks.
Proceedings of INTERSPEECH 2015, 2015

Distance-aware DNNs for robust speech recognition.
Proceedings of INTERSPEECH 2015, 2015

The speech recognition virtual kitchen turns one.
Proceedings of INTERSPEECH 2015, 2015

Using keyword spotting to help humans correct captioning faster.
Proceedings of INTERSPEECH 2015, 2015

Regularizing DNN acoustic models with Gaussian stochastic neurons.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Semi-supervised training in low-resource ASR and KWS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

QUESST2014: Evaluating Query-by-Example Speech Search in a zero-resource setting with real-life queries.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Language independent search in MediaEval's Spoken Web Search task.
Computer Speech & Language, 2014

Enabling the Rapid Development and Adoption of Speech-User Interfaces.
IEEE Computer, 2014

Query-by-example spoken term detection evaluation on low-resource languages.
Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014

EM-based phoneme confusion matrix generation for low-resource spoken term detection.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A keyword search system using open source software.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improvements to speaker adaptive training of deep neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A methodology for using crowdsourced data to measure uncertainty in natural speech.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Query by Example Search on Speech at Mediaeval 2014.
Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Word-based probabilistic phonetic retrieval for low-resource spoken term detection.
Proceedings of INTERSPEECH 2014, 2014

An in-depth comparison of keyword specific thresholding and sum-to-one score normalization.
Proceedings of INTERSPEECH 2014, 2014

The speech recognition virtual kitchen: launch party.
Proceedings of INTERSPEECH 2014, 2014

Towards speaker adaptive training of deep neural network acoustic models.
Proceedings of INTERSPEECH 2014, 2014

Distributed learning of multilingual DNN feature extractors using GPUs.
Proceedings of INTERSPEECH 2014, 2014

Improving language-universal feature extraction with deep maxout and convolutional neural networks.
Proceedings of INTERSPEECH 2014, 2014

Neural network language models for low resource languages.
Proceedings of INTERSPEECH 2014, 2014

Query-by-example spoken term detection on multilingual unconstrained speech.
Proceedings of INTERSPEECH 2014, 2014

Improved audio features for large-scale multimedia event detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Semi-automatic audio semantic concept discovery for multimedia retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2014

Exploring audio semantic concepts for event-based video retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2014

Optimization of Neural Network Language Models for keyword search.
Proceedings of the IEEE International Conference on Acoustics, 2014

Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Semantics for Large-Scale Multimedia: New Challenges for NLP.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013
Beyond audio and video retrieval: topic-oriented multimedia summarization.
IJMIR, 2013

The Spoken Web Search Task.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Robust audio-codebooks for large-scale event detection in consumer videos.
Proceedings of INTERSPEECH 2013, 2013

Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training.
Proceedings of INTERSPEECH 2013, 2013

The speech recognition virtual kitchen.
Proceedings of INTERSPEECH 2013, 2013

Formalizing expert knowledge for developing accurate speech recognizers.
Proceedings of INTERSPEECH 2013, 2013

Multi-layer mutually reinforced random walk with hidden parameters for improved multi-party meeting summarization.
Proceedings of INTERSPEECH 2013, 2013

Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk.
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Identification and modeling of word fragments in spontaneous speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2013

Subspace mixture model for low-resource speech recognition in cross-lingual settings.
Proceedings of the IEEE International Conference on Acoustics, 2013

The spoken web search task at MediaEval 2012.
Proceedings of the IEEE International Conference on Acoustics, 2013

Extracting deep bottleneck features using stacked auto-encoders.
Proceedings of the IEEE International Conference on Acoustics, 2013

Neighbour selection and adaptation for rapid speaker-dependent ASR.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Deep maxout networks for low-resource speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Models of tone for tonal and non-tonal languages.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

DNN acoustic modeling with modular multi-lingual feature extraction networks.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Using web text to improve keyword spotting in speech.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches.
IEEE Signal Process. Mag., 2012

Integration of language identification into a recognition system for spoken conversations containing code-switches.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Multilingual bottle-neck features and its application for under-resourced languages.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Active learning for accent adaptation in Automatic Speech Recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Two-layer mutually reinforced random walk for improved multi-party meeting summarization.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

AMVA'12: ACM international workshop on audio and multimedia methods for large-scale video analysis.
Proceedings of the 20th ACM Multimedia Conference (MM '12), Nara, Japan, 2012

Semi-supervised learning for speech recognition in the context of accent adaptation.
Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

Beyond audio and video retrieval: towards multimedia summarization.
Proceedings of the International Conference on Multimedia Retrieval, 2012

The Spoken Web Search Task.
Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Initialization Schemes for Multilayer Perceptron Training and their Impact on ASR Performance using Multilingual Data.
Proceedings of INTERSPEECH 2012, 2012

On Speaker-Independent Personality Perception and Prediction from Speech.
Proceedings of INTERSPEECH 2012, 2012

Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition.
Proceedings of INTERSPEECH 2012, 2012

The Speech Recognition Virtual Kitchen: An Initial Prototype.
Proceedings of INTERSPEECH 2012, 2012

Event-based Video Retrieval Using Audio.
Proceedings of INTERSPEECH 2012, 2012

Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization.
Proceedings of INTERSPEECH 2012, 2012

Generating Natural Language Summaries for Multimedia.
Proceedings of the Seventh International Natural Language Generation Conference (INLG 2012), 2012

The Spoken Web Search Task at MediaEval 2011.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Articulatory features for expressive speech synthesis.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Anger recognition in speech using acoustic and linguistic cues.
Speech Communication, 2011

Spoken Web Search.
Working Notes Proceedings of the MediaEval 2011 Workshop, 2011

Modeling Speaker Personality Using Voice.
Proceedings of INTERSPEECH 2011, 2011

Analysis of Dialectal Influence in Pan-Arabic ASR.
Proceedings of INTERSPEECH 2011, 2011

A Review of Personality in Voice-Based Man Machine Interaction.
Human-Computer Interaction. Interaction Techniques and Environments, 2011

Salient Features for Anger Recognition in German and English IVR Portals.
Spoken Dialogue Systems Technology and Design, 2011

2010
Informedia @ TRECVID2010.
TRECVID 2010 Workshop Participants Notebook Papers, 2010

Automatically assessing acoustic manifestations of personality in speech.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Automatically Assessing Personality from Speech.
Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC 2010), 2010

Multimedia content with a speech track: ACM multimedia 2010 workshop on searching spontaneous conversational speech.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Analysis of gender normalization using MLP and VTLN features.
Proceedings of INTERSPEECH 2010, 2010

The 2010 CMU GALE speech-to-text system.
Proceedings of INTERSPEECH 2010, 2010

Emotion recognition using imperfect speech recognition.
Proceedings of INTERSPEECH 2010, 2010

Improvements to generalized discriminative feature transformation for speech recognition.
Proceedings of INTERSPEECH 2010, 2010

Late fusion of individual engines for improved recognition of negative emotion in speech - learning vs. democratic vote.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Getting closer: tailored human-computer speech dialog.
Universal Access in the Information Society, 2009

Fusion of Acoustic and Linguistic Features for Emotion Detection.
Proceedings of the 3rd IEEE International Conference on Semantic Computing (ICSC 2009), 2009

Usability-Evaluation multimodaler Schnittstellen: Ist das Ganze die Summe seiner Teile? [Usability evaluation of multimodal interfaces: Is the whole the sum of its parts?]
Proceedings of Mensch & Computer 2009: Grenzenlos frei!?, 2009

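Benutzerstudien zur Bewertung multimodaler, interaktiver Anzeigetafeln in unterschiedlichen Entwicklungsstufen [User studies for evaluating multimodal, interactive display boards at different stages of development].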
Workshop-Proceedings der Tagung Mensch & Computer 2009 [Workshop proceedings of the Mensch & Computer 2009 conference], 2009

Digital Signage mit Interaktiven Displays [Digital signage with interactive displays].
Workshop-Proceedings der Tagung Mensch & Computer 2009 [Workshop proceedings of the Mensch & Computer 2009 conference], 2009

Predicting the quality of multimodal systems based on judgments of single modalities.
Proceedings of INTERSPEECH 2009, 2009

Influence of training on direct and indirect measures for the evaluation of multimodal systems.
Proceedings of INTERSPEECH 2009, 2009

Emotion classification in children's speech using fusion of acoustic and linguistic features.
Proceedings of INTERSPEECH 2009, 2009

Detecting real life anger.
Proceedings of the IEEE International Conference on Acoustics, 2009

Usability Evaluation of Multimodal Interfaces: Is the Whole the Sum of Its Parts?
Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

Reliable Evaluation of Multimodal Dialogue Systems.
Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

2008
User perception of multi-modal interfaces for mobile applications.
Proceedings of INTERSPEECH 2008, 2008

Detecting trends in social bookmarking systems using a probabilistic generative model and smoothing.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Tailoring Taxonomies for Efficient Text Categorization and Expert Finding.
Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology, 2008

2007
Discriminative speaker adaptation using articulatory features.
Speech Communication, 2007

An intelligent knowledge sharing system for web communities.
Proceedings of the IEEE International Conference on Systems, 2007

The "Spree" Expert Finding System.
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007

On using Articulatory Features for Discriminative Speaker Adaptation.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications.
Proceedings of the IEEE International Conference on Acoustics, 2007

Spotting using Durational Entropy.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Articulatory features for "meeting" speech recognition.
Proceedings of INTERSPEECH 2006, 2006

2005
Articulatory features for conversational speech recognition.
PhD thesis, 2005

Automatically Transcribing Meetings using Distant Microphones.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Issues in meeting transcription - the ISL meeting transcription system.
Proceedings of INTERSPEECH 2004, 2004

The 2003 ISL rich transcription system for conversational telephony speech.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit.
Proceedings of the 26th DAGM Symposium on Pattern Recognition, 2004

2003
Integrating multilingual articulatory features into speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

The NESPOLE! voIP multilingual corpora in tourism and medical domains.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multilingual articulatory features.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Compensating for hyperarticulation by modeling articulatory properties.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

A flexible stream architecture for ASR using articulatory features.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Efficient language model lookahead through polymorphic linguistic context assignment.
Proceedings of the IEEE International Conference on Acoustics, 2002

A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System.
Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems@ACL 2002, 2002

2001
Advances in meeting recognition.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Speech recognition over netmeeting connections.
Proceedings of EUROSPEECH 2001 Scandinavia, 2001

The nespole! voIP dialogue database.
Proceedings of EUROSPEECH 2001 Scandinavia, 2001

Advances in automatic meeting record creation and access.
Proceedings of the IEEE International Conference on Acoustics, 2001

The ISL evaluation system for Verbmobil-II.
Proceedings of the IEEE International Conference on Acoustics, 2001

Speaker compensation with sine-log all-pass transforms.
Proceedings of the IEEE International Conference on Acoustics, 2001

2000
Generalized radial basis function networks for classification and novelty detection: self-organization of optimal Bayesian decision.
Neural Networks, 2000

Das View4You-System: End-to-End Evaluation [The View4You system: end-to-end evaluation].
Proceedings of KONVENS 2000 / Sprachkommunikation, 2000

Confidence measure based language identification.
Proceedings of the IEEE International Conference on Acoustics, 2000

1996
Indeterminateness in Qualitative and Quantitative Reasoning.
Proceedings of the Seventh International Workshop on Database and Expert Systems Applications, 1996
