Benoît Sagot

According to our database, Benoît Sagot authored at least 190 papers between 2004 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation.
CoRR, August, 2025

A French Version of the OLDI Seed Corpus.
CoRR, August, 2025

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance.
CoRR, April, 2025

BigO(Bench) - Can LLMs Generate Code with Controlled Time and Space Complexity?
CoRR, March, 2025

Explicit Learning and the LLM in Machine Translation.
CoRR, March, 2025

KréyoLID From Language Identification Towards Language Mining.
CoRR, March, 2025

Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation.
CoRR, March, 2025

Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression.
CoRR, March, 2025

In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Towards Zero-Shot Multimodal Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Diachronic Document Dataset for Semantic Layout Analysis.
CoRR, 2024

CamemBERT 2.0: A Smarter French Language Model Aged to Perfection.
CoRR, 2024

Molyé: A Corpus-based Approach to Language Contact in Colonial France.
CoRR, 2024

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck.
CoRR, 2024

SpiRit-LM: Interleaved Spoken and Written Language Model.
CoRR, 2024

PatentEval: Understanding Errors in Patent Generation.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Headless Language Models: Learning without Predicting with Contrastive Weight Tying.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree of Problems: Improving structured problem solving with compositionality.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Anisotropy Is Inherent to Self-Attention in Transformers.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Mieux comprendre les modèles de langue et les textes qu'ils produisent.
Proceedings of the COnférence en Recherche d'Informations et Applications, 2024

Making Sentence Embeddings Robust to User-Generated Content.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

On the Scaling Laws of Geographical Representation in Language Models.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

From Text to Source: Results in Detecting Large Language Model-Generated Content.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Generative Spoken Dialogue Language Modeling.
Trans. Assoc. Comput. Linguistics, 2023

SONAR: Sentence-Level Multimodal and Language-Agnostic Representations.
CoRR, 2023

Is Anisotropy Inherent to Transformers?
CoRR, 2023

A Simple Method for Unsupervised Bilingual Lexicon Induction for Data-Imbalanced, Closely Related Language Pairs.
CoRR, 2023

RoCS-MT: Robustness Challenge Set for Machine Translation.
Proceedings of the Eighth Conference on Machine Translation, 2023

Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023

Cross-lingual Strategies for Low-resource Language Modeling: A Study on Five Indic Dialects.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023

Towards a Robust Detection of Language Model-Generated Text: Is ChatGPT that easy to detect?
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023

Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generative Spoken Language Model based on continuous word-sized audio tokens.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Data-Efficient French Language Modeling with CamemBERTa.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.
Trans. Assoc. Comput. Linguistics, 2022

DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon.
Trans. Assoc. Comput. Linguistics, 2022

Are Discrete Units Necessary for Spoken Language Modeling?
IEEE J. Sel. Top. Signal Process., 2022

MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling.
CoRR, 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
CoRR, 2022

MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification.
CoRR, 2022

Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation?
Proceedings of the Seventh Conference on Machine Translation, 2022

Quand être absent de mBERT n'est que le commencement : Gérer de nouvelles langues à l'aide de modèles de langues multilingues (When Being Unseen from mBERT is just the Beginning : Handling New Languages With Multilingual Language Models).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

Le projet FREEM : ressources, outils et enjeux pour l'étude du français d'Ancien Régime (The FREEM project: Resources, tools and challenges for the study of Ancien Régime French).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

BERTrade: Using Contextual Embeddings to Parse Old French.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France's Court of Cassation Rulings.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Automatic Normalisation of Early Modern French.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Towards a Cleaner Document-Oriented Multilingual Crawled Corpus.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Probing Multilingual Cognate Prediction Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Mapping Urban Air Quality from Mobile Sensors Using Spatio-Temporal Geostatistics.
Sensors, 2021

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP.
CoRR, 2021

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
CoRR, 2021

Rethinking Automatic Evaluation in Sentence Simplification.
CoRR, 2021

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios?
Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021

Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Multilingual Unsupervised Sentence Simplification.
CoRR, 2020

Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi.
CoRR, 2020

Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Establishing a New State-of-the-Art for French Named Entity Recognition.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Controllable Sentence Simplification.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

OFrLex: A Computational Morphological and Syntactic Lexicon for Old French.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Evaluating the Reliability of Acoustic Speech Embeddings.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

CamemBERT: a Tasty French Language Model.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Modeling German Verb Argument Structures: LSTMs vs. Humans.
CoRR, 2019

Reference-less Quality Estimation of Text Simplification Systems.
CoRR, 2019

Développement d'un lexique morphologique et syntaxique de l'ancien français (Development of a morphological and syntactic lexicon of Old French).
Proceedings of the Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts, 2019

Enhancing BERT for Lexical Normalization.
Proceedings of the 5th Workshop on Noisy User-generated Text, 2019

What Does BERT Learn about the Structure of Language?
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

A multilingual collection of CoNLL-U-compatible morphological lexicons.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

ELMoLex: Connecting ELMo and Lexicon Features for Dependency Parsing.
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Brussels, Belgium, October 31, 2018

Informatiser le lexique - Modélisation, développement et exploitation de lexiques morphologiques, syntaxiques et sémantiques. (Computerising the lexicon - Modelling, development and use of morphological, syntactic and semantic lexicons).
2018

2017
Inferring Inflection Classes with Description Length.
J. Lang. Model., 2017

Construction automatique d'une base de données étymologiques à partir du wiktionary (Automatic construction of an etymological database using Wiktionary).
Proceedings of the Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles, 2017

Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin.
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, 2017

Improving neural tagging with lexical information.
Proceedings of the 15th International Conference on Parsing Technologies, 2017

The ParisNLP entry at the ConLL UD Shared Task 2017: A Tale of a #ParsingTragedy.
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 2017

Annotating omission in statement pairs.
Proceedings of the 11th Linguistic Annotation Workshop, 2017

2016
External Lexical Information for Multilingual Part-of-Speech Tagging.
CoRR, 2016

Étiquetage multilingue en parties du discours avec MElt (Multilingual part-of-speech tagging with MElt).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 2 : TALN (Posters), 2016

From Noisy Questions to Minecraft Texts: Annotation Challenges in Extreme Syntax Scenario.
Proceedings of the 2nd Workshop on Noisy User-generated Text, 2016

2015
Constructing a poor man's wordnet in a resource-rich world.
Lang. Resour. Evaluation, 2015

2014
Data-driven synset induction and disambiguation for wordnet development.
Lang. Resour. Evaluation, 2014

The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres.
J. Lang. Technol. Comput. Linguistics, 2014

Named Entity Recognition and Correction in OCRized Corpora (Détection et correction automatique d'entités nommées dans des corpus OCRisés) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

Sub-categorization in 'pour' and lexical syntax (Sous-catégorisation en pour et syntaxe lexicale) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

Analogy-based Text Normalization : the case of unknowns words (Normalisation de textes par analogie: le cas des mots inconnus) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

DeLex, a freely-avaible, large-scale and linguistically grounded morphological lexicon for German.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Developing a French FrameNet: Methodology and First results.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Automated Error Detection in Digitized Cultural Heritage Documents.
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, 2014

2013
Dynamic extension of a French morphological lexicon based a text stream (Extension dynamique de lexiques morphologiques pour le français à partir d'un flux textuel) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2013

Implementing a Formal Model of Inflectional Morphology.
Proceedings of the Systems and Frameworks for Computational Morphology, 2013

Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Can MDL Improve Unsupervised Chinese Word Segmentation?
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing, 2013

2012
Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging.
Lang. Resour. Evaluation, 2012

Annotation référentielle du Corpus Arboré de Paris 7 en entités nommées (Referential named entity annotation of the Paris 7 French TreeBank) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

TCOF-POS : un corpus libre de français parlé annoté en morphosyntaxe (TCOF-POS : A Freely Available POS-Tagged Corpus of Spoken French) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

Population of a Knowledge Base for News Metadata from Unstructured Text and Web Data.
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, 2012

Evaluating and improving syntactic lexica by plugging them within a parser.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Aleda, a free large-scale entity database for French.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Cleaning noisy wordnets.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Wordnet extension made simple: A multilingual lexicon-based approach using wiki resources.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Boosting the Coverage of a Semantic Lexicon by Automatically Extracted Event Nominalizations.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Applying cross-lingual WSD to wordnet development.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

The French Social Media Bank: a Treebank of Noisy User Generated Content.
Proceedings of the COLING 2012, 2012

Unsupervized Word Segmentation: the Case for Mandarin Chinese.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

Statistical Parsing of Spanish and Data Driven Lemmatization.
Proceedings of the Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, 2012

2011
Modeling and implementing non canonical morphological phenomena.
Trait. Autom. des Langues, 2011

Évaluation de lexiques syntaxiques par leur intégartion dans l'analyseur syntaxiques FRMG
CoRR, 2011

Construction d'un lexique des adjectifs dénominaux (Construction of a lexicon of denominal adjectives).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011

Développement de ressources pour le persan : PerLex 2, nouveau lexique morphologique et MEltfa, étiqueteur morphosyntaxique (Development of resources for Persian: PerLex 2, a new morphological lexicon and MEltfa, a morphosyntactic tagger).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011

Un turc mécanique pour les ressources linguistiques : critique de la myriadisation du travail parcellisé (Mechanical Turk for linguistic resources: review of the crowdsourcing of parceled work).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2011

Segmentation et induction de lexique non-supervisées du mandarin (Unsupervised segmentation and induction of mandarin lexicon).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2011

Coopération de méthodes statistiques et symboliques pour l'adaptation non-supervisée d'un système d'étiquetage en entités nommées (Statistical and symbolic methods cooperation for the unsupervised adaptation of a named entity recognition system).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011

Non-canonical Inflection: Data, Formalisation and Complexity Measures.
Proceedings of the Systems and Frameworks for Computational Morphology, 2011

Classification-Based Extension of Wordnets from Heterogeneous Resources.
Proceedings of the Human Language Technology Challenges for Computer Science and Linguistics, 2011

Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use.
Proceedings of the Human Language Technology Challenges for Computer Science and Linguistics, 2011

Data Driven Lemmatization and Parsing of Italian.
Proceedings of the Evaluation of Natural Language and Speech Tools for Italian, 2011

2010
Détection et résolution d'entités nommées dans des dépêches d'agence.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2010

Développement de ressources pour le persan: lexique morphologique et chaîne de traitements de surface.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2010

Exploitation d'une ressource lexicale pour la construction d'un étiqueteur morpho-syntaxique état-de-l'art du français.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2010

Ponctuations fortes abusives.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2010

Traitement des inconnus : une approche systématique de l'incomplétude lexicale.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2010

Control Verb, Argument Cluster Coordination and Multi Component TAG.
Proceedings of the 10th International Workshop on Tree Adjoining Grammar and Related Frameworks, 2010

A Morphological Lexicon for the Persian Language.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

A Lexicon of French Quotation Verbs for Automatic Quotation Extraction.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

The Lefff, a Freely Available and Large-coverage Morphological and Syntactic Lexicon for French.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Influence of Pre-Annotation on POS-Tagged Corpus Development.
Proceedings of the Fourth Linguistic Annotation Workshop, 2010

Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two.
Proceedings of the ACL 2010, 2010

Are Very Large Context-Free Grammars Tractable?
Proceedings of the Trends in Parsing Technology, 2010

2009
Producción eficiente de recursos lingüísticos: proyecto Victoria.
Proces. del Leng. Natural, 2009

Construcción y extensión de un léxico morfológico y sintáctico para el español: el Leffe.
Proces. del Leng. Natural, 2009

Intégrer les tables du Lexique-Grammaire à un analyseur syntaxique robuste à grande échelle.
Proceedings of the Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2009

Trouver et confondre les coupables : un processus sophistiqué de correction de lexique.
Proceedings of the Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2009

Towards Efficient Production of Linguistic Resources: the Victoria Project.
Proceedings of the Recent Advances in Natural Language Processing, 2009

A Morphological and Syntactic Wide-coverage Lexicon for Spanish: The Leffe.
Proceedings of the Recent Advances in Natural Language Processing, 2009

Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009

Building a morphological and syntactic lexicon by merging various linguistic resources.
Proceedings of the 17th Nordic Conference of Computational Linguistics, 2009

MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note).
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Using Lexicon-Grammar Tables for French Verbs in a Large-Coverage Parser.
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2009

Extracting and Visualizing Quotations from News Wires.
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2009

Parsing Directed Acyclic Graphs with Range Concatenation Grammars.
Proceedings of the 11th International Workshop on Parsing Technologies (IWPT-2009), 2009

Constructing parse forests that include exactly the n-best PCFG trees.
Proceedings of the 11th International Workshop on Parsing Technologies (IWPT-2009), 2009

Multi-Component Tree Insertion Grammars.
Proceedings of the Formal Grammar - 14th International Conference, 2009

2008
Error Mining on Syntactic Parser Output.
Trait. Autom. des Langues, 2008

SxPipe 2: an architecture for surface preprocessing of raw corpora.
Trait. Autom. des Langues, 2008

Extensión y corrección semi-automática de léxicos morfo-sintácticos.
Proces. del Leng. Natural, 2008

Combining Multiple Resources to Build Reliable Wordnets.
Proceedings of the Text, Speech and Dialogue, 11th International Conference, 2008

Construction d'un wordnet libre du français à partir de ressources multilingues.
Proceedings of the Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2008

Computer Aided Correction and Extension of a Syntactic Wide-Coverage Lexicon.
Proceedings of the COLING 2008, 2008

2007
Comparaison du Lexique-Grammaire des verbes pleins et de DICOVALENCE : vers une intégration dans le Lefff.
Proceedings of the Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2007

Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish.
Proceedings of the Human Language Technology. Challenges of the Information Society, 2007

Mining Parsing Results for Lexical Correction: Toward a Complete Correction Process of Wide-Coverage Lexicons.
Proceedings of the Human Language Technology. Challenges of the Information Society, 2007

Are Very Large Context-Free Grammars Tractable?
Proceedings of the Tenth International Conference on Parsing Technologies, 2007

2006
Modélisation et analyse des coordinations elliptiques par l'exploitation dynamique des forêts de dérivation.
Proceedings of the Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters, 2006

Trouver le coupable : Fouille d'erreurs sur des sorties d'analyseurs syntaxiques.
Proceedings of the Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2006

Modeling and Analysis of Elliptic Coordination by Dynamic Exploitation of Derivation Forests in LTAG Parsing.
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms, 2006

The Lefff 2 syntactic lexicon for French: architecture, acquisition, use.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Deep non-probabilistic parsing of large corpora.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Error Mining in Parsing Results.
Proceedings of the ACL 2006, 2006

2005
Automatic Acquisition of a Slovak Lexicon from a Raw Corpus.
Proceedings of the Text, Speech and Dialogue, 8th International Conference, 2005

Les Méta-RCG: description et mise en oeuvre.
Proceedings of the Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2005

Un analyseur LFG efficace pour le français : SXLFG.
Proceedings of the Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2005

Chaînes de traitement syntaxique.
Proceedings of the Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2005

Linguistic Facts as Predicates over Ranges of the Sentence.
Proceedings of the Logical Aspects of Computational Linguistics, 2005

Efficient and Robust LFG Parsing: SxLFG.
Proceedings of the Ninth International Workshop on Parsing Technology, 2005

2004
Coupling Grammar and Knowledge Base: Range Concatenation Grammars and Description Logics.
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004

Les Grammaires à Concaténation d'Intervalles (RCG) comme formalisme grammatical pour la linguistique.
Proceedings of the Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2004

Morphology Based Automatic Acquisition of Large-coverage Lexica.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

