Adrien Barbaresi

ORCID: 0000-0002-8079-8694

According to our database, Adrien Barbaresi authored at least 25 papers between 2011 and 2021.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2021
Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction.
Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
htmldate: A Python package to extract publication dates from web pages.
J. Open Source Softw., 2020

Bien choisir son outil d'extraction de contenu à partir du Web (Choosing the appropriate tool for Web Content Extraction).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Que recèlent les données textuelles issues du web ? (What do text data from the Web have to hide?).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Out-of-the-Box and into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools.
Proceedings of the 12th Web as Corpus Workshop, 2020

2019
Generic Web Content Extraction with Open-Source Software.
Proceedings of the 15th Conference on Natural Language Processing, 2019

2018
Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

A database of German definitory contexts from selected web sources.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

A corpus of German political speeches from the 21st century.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Borderlands of text mapping: Experiments on Fontane's Brandenburg.
Proceedings of the GI-Workshop: Im Spannungsfeld zwischen Tool-Building und Forschung auf Augenhöhe, 2018

2017
Discriminating between Similar Languages using Weighted Subword Features.
Proceedings of the Fourth Workshop on NLP for Similar Languages, 2017

Data-Driven Identification of German Phrasal Compounds.
Proceedings of the Text, Speech, and Dialogue - 20th International Conference, 2017

Towards a toolbox to map historical text collections.
Proceedings of the 11th Workshop on Geographic Information Retrieval, 2017

Toponyms as Entry Points into a Digital Edition: Mapping The Torch (1899-1936).
Proceedings of the 12th Annual International Conference of the Alliance of Digital Humanities Organizations, 2017

2016
An Unsupervised Morphological Criterion for Discriminating Similar Languages.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Bootstrapped OCR error detection for a less-resourced language variant.
Proceedings of the 13th Conference on Natural Language Processing, 2016

APIs in Digital Humanities: The Infrastructural Turn.
Proceedings of the 11th Annual International Conference of the Alliance of Digital Humanities Organizations, 2016

Extraction and Visualization of Toponyms in Diachronic Text Corpora.
Proceedings of the 11th Annual International Conference of the Alliance of Digital Humanities Organizations, 2016

Visualisierung von Ortsnamen im Deutschen Textarchiv.
Proceedings of the 3. Tagung des Verbands Digital Humanities im deutschsprachigen Raum, 2016

Efficient construction of metadata-enhanced web corpora.
Proceedings of the 10th Web as Corpus Workshop, 2016

2015
Ad hoc and general-purpose corpus construction from web sources. (Construction de corpus généraux et spécialisés à partir du Web).
PhD thesis, 2015

2014
Focused Web Corpus Crawling.
Proceedings of the 9th Web as Corpus Workshop, 2014

Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources.
Proceedings of the 9th Web as Corpus Workshop, 2014

2013
Crawling microblogging services to gather language-classified URLs. Workflow and case study.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2011
La complexité linguistique Méthode d'analyse.
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (articles courts), 2011
