Richard Dufour

Orcid: 0000-0003-1203-9108

According to our database, Richard Dufour authored at least 108 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions.
CoRR, 2024

Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems.
CoRR, 2024

How Important Is Tokenization in French Medical Masked Language Models?
CoRR, 2024

DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain.
CoRR, 2024

Language Model Adaptation to Specialized Domains through Selective Masking based on Genre and Topical Characteristics.
CoRR, 2024

BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains.
CoRR, 2024

2023
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks.
CoRR, 2023

Text revision in Scientific Writing Assistance: An Overview.
CoRR, 2023

HATS: An Open Data Set Integrating Human Perception Applied to the Evaluation of Automatic Speech Recognition Metrics.
Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

HATS : Un jeu de données intégrant la perception humaine appliquée à l'évaluation des métriques de transcription de la parole.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023, 2023

MORFITT : Un corpus multi-labels d'articles scientifiques français dans le domaine biomédical.
Proceedings of the Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques", 2023

DrBERT: Un modèle robuste pré-entraîné en français pour les domaines biomédical et clinique.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023, 2023

CASIMIR : un Corpus d'Articles Scientifiques Intégrant les ModIfications et Révisions des auteurs.
Proceedings of the Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques", 2023

Projet NaviTerm : navigation terminologique pour une montée en compétence rapide et personnalisée sur un domaine de recherche.
Proceedings of the Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques", 2023

Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Text revision in Scientific Writing Assistance: A Review.
Proceedings of the 13th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 45th European Conference on Information Retrieval (ECIR 2023), 2023

DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

The Role of Global and Local Context in Named Entity Recognition.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022
Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations.
CoRR, 2022

Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations.
Database J. Biol. Databases Curation, 2022

ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus.
Proceedings of the Text, Speech, and Dialogue - 25th International Conference, 2022

Mesures linguistiques automatiques pour l'évaluation des systèmes de Reconnaissance Automatique de la Parole (Automated linguistic measures for automatic speech recognition systems' evaluation).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

Remplacement de mentions pour l'adaptation d'un corpus de reconnaissance d'entités nommées à un domaine cible (Mention replacement for adapting a named entity recognition dataset to a target domain).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition.
Proceedings of the Interspeech 2022, 2022

Data Augmentation for Robust Character Detection in Fantasy Novels.
Proceedings of the Workshop on Computational Methods in the Humanities 2022, 2022

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain.
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis, 2022

2021
Graph Embeddings for Abusive Language Detection.
SN Comput. Sci., 2021

Influence of Speaker Pre-training on Character Voice Representation.
Proceedings of the Speech and Computer - 23rd International Conference, 2021

Assessing Speaker-Independent Character Information for Acted Voices.
Proceedings of the Speech and Computer - 23rd International Conference, 2021

2020
La voix actée : pratiques, enjeux, applications (Acted voice: practices, challenges, applications).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Apprentissage automatique de représentation de voix à l'aide d'une distillation de la connaissance pour le casting vocal (Learning voice representation using knowledge distillation for automatic voice casting).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Tuning Graph2vec with Node Labels for Abuse Detection in Online Conversations (extended abstract).
Proceedings of MARAMI 2020 - Modèles & Analyse des Réseaux : Approches Mathématiques & Informatiques, 2020

A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting.
Proceedings of the Interspeech 2020, 2020

Review of different robust x-vector extractors for speaker verification.
Proceedings of the 28th European Signal Processing Conference, 2020

Traitement Automatique du Langage : Études et apports aux frontières de l'interdisciplinarité.
2020

2019
Conversational Networks for Automatic Online Moderation.
IEEE Trans. Comput. Soc. Syst., 2019

Abusive Language Detection in Online Conversations by Combining Content- and Graph-Based Features.
Frontiers Big Data, 2019

Qualitative Evaluation of ASR Adaptation in a Lecture Context: Application to the PASTEL Corpus.
Proceedings of the Interspeech 2019, 2019

Similarity Metric Based on Siamese Neural Networks for Voice Casting.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
LIA@CLEF 2018: Mining Events Opinion Argumentation from Raw Unlabeled Twitter Data using Convolutional Neural Network.
Proceedings of the Working Notes of CLEF 2018, 2018

2017
Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Systèmes du LIA à DEFT'13.
CoRR, 2017

Exploring Temporal Analysis of Tweet Content from Cultural Events.
Proceedings of the Statistical Language and Speech Processing, 2017

Graph-Based Features for Automatic Online Abuse Detection.
Proceedings of the Statistical Language and Speech Processing, 2017

Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization.
Proceedings of the Interspeech 2017, 2017

Detection of abusive messages in an on-line community.
Proceedings of the COnférence en Recherche d'Informations et Applications, 2017

Impact of Content Features for Automatic Online Abuse Detection.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2017

2016
Automatic Text Summarization Approaches to Speed up Topic Model Learning Process.
Int. J. Comput. Linguistics Appl., 2016

Impact of Word Error Rate on theme identification task of highly imperfect human-human conversations.
Comput. Speech Lang., 2016

Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 1 : JEP, 2016

Un Corpus de Flux TV Annotés pour la Prédiction de Genres (A Genre Annotated Corpus of French Multi-channel TV Streams for Genre Prediction).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 1 : JEP, 2016

Un Sous-espace Thématique Latent pour la Compréhension du Langage Parlé (A Latent Topic-based Subspace for Spoken Language Understanding).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 1 : JEP, 2016

Quaternion Neural Networks for Spoken Language Understanding.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

A log-linear weighting approach in the Word2vec space for spoken language understanding.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Tracking dialog states using an Author-Topic based representation.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Parallel Long Short-Term Memory for multi-stream classification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Improving multi-stream classification by mapping sequence-embedding in a high dimensional space.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Spoken Language Understanding in a Latent Topic-Based Subspace.
Proceedings of the Interspeech 2016, 2016

Deep Stacked Autoencoders for Spoken Language Understanding.
Proceedings of the Interspeech 2016, 2016

Réseaux de neurones pour la représentation des contextes continus des mots.
Proceedings of the CORIA 2016 - Conférence en Recherche d'Informations et Applications, 2016

2015
Compact Multiview Representation of Documents Based on the Total Variability Space.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Initialisation de Réseaux de Neurones à l'aide d'un Espace Thématique.
Proceedings of the Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2015

Apport de l'information temporelle des contextes pour la représentation vectorielle continue des mots.
Proceedings of the Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2015

A comparison of normalization techniques applied to latent space representations for speech analytics.
Proceedings of the INTERSPEECH 2015, 2015

Identification de personnes dans des flux multimédia.
Proceedings of the CORIA 2015 - Conférence en Recherche d'Informations et Applications, 2015

Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2015

Topic-space based setup of a neural network for theme identification of highly imperfect transcriptions.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Integration of Word and Semantic Features for Theme Identification in Telephone Conversations.
Proceedings of the Natural Language Dialog Systems and Intelligent Assistants, 2015

2014
Characterizing and detecting spontaneous speech: Application to speaker role recognition.
Speech Commun., 2014

Feature selection using Principal Component Analysis for massive retweet detection.
Pattern Recognit. Lett., 2014

SuMACC Project's Corpus - A Topic-Based Query Extension Approach to Retrieve Multimedia Documents.
Proceedings of the Text, Speech and Dialogue - 17th International Conference, 2014

Author-topic based representation of call-center conversations.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Getting by with a Little Help from the Crowd: Practical Approaches to Social Image Labeling.
Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia, 2014

Characterizing and Predicting Bursty Events: The Buzz Case Study on Twitter.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

A LDA-based Topic Classification Approach from highly Imperfect Automatic Transcriptions.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

A topic-based approach for post-processing correction of automatic translations.
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2014, 2014

A Combined Thematic and Acoustic Approach for a Music Recommendation Service in TV Commercials.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Theme identification in human-human conversations with features from specific speaker type hidden spaces.
Proceedings of the INTERSPEECH 2014, 2014

I-vector based representation of highly imperfect automatic transcriptions.
Proceedings of the INTERSPEECH 2014, 2014

Factor analysis based semantic variability compensation for automatic conversation representation.
Proceedings of the INTERSPEECH 2014, 2014

Subspace Gaussian mixture models for dialogues classification.
Proceedings of the INTERSPEECH 2014, 2014


Improving dialogue classification using a topic space representation and a Gaussian classifier based on the decision rule.
Proceedings of the IEEE International Conference on Acoustics, 2014

An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

2013
LIA @ MediaEval 2013 MusiClef Task: A Combined Thematic and Acoustic Approach.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

LIA @ MediaEval 2013 Crowdsourcing Task: Metadata or not Metadata? That is a Fashion Question.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

LIA @ MediaEval 2013 Spoken Web Search Task: An I-Vector based Approach.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Person name spotting by combining acoustic matching and LDA topic models.
Proceedings of the INTERSPEECH 2013, 2013


Combining acoustic name spotting and continuous context models to improve spoken person name recognition in speech.
Proceedings of the INTERSPEECH 2013, 2013

Person name recognition in ASR outputs using continuous context models.
Proceedings of the IEEE International Conference on Acoustics, 2013


Event detection from image hosting services by slightly-supervised multi-span context models.
Proceedings of the 11th International Workshop on Content-Based Multimedia Indexing, 2013

2012
Combinaison d'approches pour la reconnaissance du rôle des locuteurs (Combination of approaches for speaker role recognition) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

Détection et caractérisation des régions d'erreurs dans des transcriptions de contenus multimédia : application à la recherche des noms de personnes (Error region detection and characterization in transcriptions of multimedia documents: application to person name search) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

Automatic transcription error recovery for Person Name Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Automatic error region detection and characterization in LVCSR transcriptions of TV news shows.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Investigation of Spontaneous Speech Characterization Applied to Speaker Role Recognition.
Proceedings of the INTERSPEECH 2011, 2011

2010
Transcription automatique de la parole spontanée (Automatic transcription of spontaneous speech).
PhD thesis, 2010

Automatic indexing of speech segments with spontaneity levels on large audio database.
Proceedings of the 2010 International Workshop on Searching Spontaneous Conversational Speech, 2010

A language-identification inspired method for spontaneous speech detection.
Proceedings of the INTERSPEECH 2010, 2010

Semi-supervised part-of-speech tagging in speech applications.
Proceedings of the INTERSPEECH 2010, 2010

Unsupervised model adaptation on targeted speech segments for LVCSR system combination.
Proceedings of the INTERSPEECH 2010, 2010

2009
Local and global models for spontaneous speech segment detection and characterization.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
From prepared speech to spontaneous speech recognition system: a comparative study applied to French language.
Proceedings of the CSTST 2008: Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology, 2008

Correcting ASR outputs: Specific solutions to specific errors in French.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

