Aitor Soroa

ORCID: 0000-0001-8573-2654

According to our database, Aitor Soroa authored at least 90 papers between 1996 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Latxa: An Open Language Model and Evaluation Suite for Basque.
CoRR, 2024

Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset.
CoRR, 2024

2023

Image captioning for effective use of language models in knowledge-based visual question answering.
Expert Syst. Appl., 2023

Do Multilingual Language Models Think Better in English?
CoRR, 2023

Scaling Laws for BERT in Low-Resource Settings.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
CoRR, 2022

Noisy Channel for Automatic Text Simplification.
CoRR, 2022

Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources.
CoRR, 2022


BasqueGLUE: A Natural Language Understanding Benchmark for Basque.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

KIDE4Assistant: an Ontology-Driven Dialogue System Adaptation for Assistance in Maintenance Procedures.
Proceedings of the 12th International Workshop on Formal Ontologies meet Industry (FOMI 2022) Co-located with workshops about the Industrial Ontology Foundry (IOF) and the European project OntoCommons (EU H2020 project), 2022

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Does Corpus Quality Really Matter for Low-Resource Languages?
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Principled Paraphrase Generation with Parallel Corpora.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Inferring spatial relations from textual descriptions of images.
Pattern Recognit., 2021

Towards zero-shot cross-lingual named entity disambiguation.
Expert Syst. Appl., 2021

Linguistic Capabilities for a Checklist-based evaluation in Automatic Text Simplification.
Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021) co-located with the 37th Conference of the Spanish Society for Natural Language Processing (SEPLN2021), 2021

A Syntax-Aware Edit-based System for Text Simplification.
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021

Ontology Population Reusing Resources for Dialogue Intent Detection: Generic and Multilingual Approach.
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021

TODO: A Core Ontology for Task-Oriented Dialogue Systems in Industry 4.0.
Proceedings of the Further with Knowledge Graphs, 2021

Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Give your Text Representation Models some Love: the Case for Basque.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Automatic Evaluation vs. User Preference in Neural Textual QuestionAnswering over COVID-19 Scientific Literature.
Proceedings of the 1st Workshop on NLP for COVID-19@EMNLP 2020, Online, December 2020, 2020

Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Evaluating Multimodal Representations on Visual Semantic Textual Similarity.
Proceedings of the ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, 2020

Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

DoQA - Accessing Domain-Specific FAQs via Conversational QA.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Analyzing the Limitations of Cross-lingual Word Embedding Mappings.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Bilingual embeddings with random walks over multilingual wordnets.
Knowl. Based Syst., 2018

Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset.
CoRR, 2018

The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD.
CoRR, 2018

Learning Text Representations for 500K Classification Tasks on Named Entity Disambiguation.
Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018

2017
A scalable architecture for data-intensive natural language processing.
Nat. Lang. Eng., 2017

2016
Building event-centric knowledge graphs from news.
J. Web Semant., 2016

Two Architectures for Parallel Processing of Huge Amounts of Text.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Exploring Comparative Evaluation of Semantic Enrichment Tools for Cultural Heritage Metadata.
Proceedings of the Research and Advanced Technology for Digital Libraries, 2016

Alleviating Poor Context with Background Knowledge for Named Entity Disambiguation.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Single or Multiple? Combining Word Representations Independently Learned from Text and WordNet.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Big data for Natural Language Processing: A streaming approach.
Knowl. Based Syst., 2015

Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation.
CoRR, 2015

Combining Mention Context and Hyperlinks from Wikipedia for Named Entity Disambiguation.
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, 2015

Random Walks and Neural Network Language Models on Knowledge Bases.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Improving distant supervision using inference learning.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
NewsReader project.
Proces. del Leng. Natural, 2014

Improving search over Electronic Health Records using UMLS-based query expansion through random walks.
J. Biomed. Informatics, 2014

Evaluating hierarchical organisation structures for exploring digital libraries.
Inf. Retr., 2014

Random Walks for Knowledge-Based Word Sense Disambiguation.
Comput. Linguistics, 2014

UBC entity recognition and disambiguation at ERD 2014.
Proceedings of the ERD'14, 2014

A stream computing approach towards scalable NLP.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

"One Entity per Discourse" and "One Entity per Collocation" Improve Named-Entity Disambiguation.
Proceedings of the COLING 2014, 2014

Exploring the use of word embeddings and random walks on Wikipedia for the CogAlex shared task.
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon, 2014

2013
KYOTO: A Knowledge-Rich Approach to the Interoperable Mining of Events from Text.
Proceedings of the New Trends of Research in Ontologies and Lexical Resources, 2013

UBC Entity Linking at TAC-KBP 2013: random forests for high accuracy.
Proceedings of the Sixth Text Analysis Conference, 2013

Information seeking in digital cultural heritage with PATHS.
Proceedings of the 36th International ACM SIGIR conference on research and development in Information Retrieval, 2013

PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enrichment.
Proceedings of the Research and Advanced Technology for Digital Libraries, 2013

PATHS: A System for Accessing Cultural Heritage Collections.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Exploiting domain information for Word Sense Disambiguation of medical documents.
J. Am. Medical Informatics Assoc., 2012

UKP-UBC Entity Linking at TAC-KBP.
Proceedings of the Fifth Text Analysis Conference, 2012

Matching Cultural Heritage items to Wikipedia.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Enabling the Discovery of Digital Cultural Heritage Objects through Wikipedia.
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, 2012

PATHS - Exploring Digital Cultural Heritage Spaces.
Proceedings of the Theory and Practice of Digital Libraries, 2012

Comparing Taxonomies for Organising Collections of Documents.
Proceedings of the COLING 2012, 2012

2011
Two birds with one stone: learning semantic models for text categorization and word sense disambiguation.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010
Graph-based Word Sense Disambiguation of biomedical documents.
Bioinform., 2010

Kyoto: An Integrated System for Specific Domain WSD.
Proceedings of the 5th International Workshop on Semantic Evaluation, 2010

Exploring Knowledge Bases for Similarity.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

2009
Dealing With Complex Linguistic Annotations Within a Language Processing Framework.
IEEE Trans. Speech Audio Process., 2009

KYOTO Project.
Proces. del Leng. Natural, 2009

WikiWalk: Random walks on Wikipedia for Semantic Relatedness.
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, 2009

A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Knowledge-Based WSD and Specific Domains: Performing Better than Generic Supervised WSD.
Proceedings of the IJCAI 2009, 2009

Personalizing PageRank for Word Sense Disambiguation.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

2008
ELHISA: An architecture for the integration of heterogeneous lexical information.
Nat. Lang. Eng., 2008

Spelling Correction: from Two-Level Morphology to Open Source.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

2007
Specification of a General Linguistic Annotation Framework and its Use in a Real Context.
Proces. del Leng. Natural, 2007

UBC-AS: A Graph Based Unsupervised System for Induction and Classification.
Proceedings of the 4th International Workshop on Semantic Evaluations, 2007

SemEval-2007 Task 02: Evaluating Word Sense Induction and Discrimination Systems.
Proceedings of the 4th International Workshop on Semantic Evaluations, 2007

2006
Structure, Annotation and Tools in the Basque ZT Corpus.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Two graph-based algorithms for state-of-the-art WSD.
Proceedings of the EMNLP 2006, 2006

2005
Una arquitectura de integración de recursos léxicos de naturaleza heterogénea. Una aportación desde la perspectiva de la integración de datos.
Proces. del Leng. Natural, 2005

2002
A Class Library for the Integration of NLP Tools: Definition and implementation of an Abstract Data Type Collection for the manipulation of SGML documents in a context of stand-off linguistic annotation.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

2000
A Methodology for Building Translator-oriented Dictionary Systems.
Mach. Transl., 2000

A Proposal for the Integration of NLP Tools using SGML-Tagged Documents.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

1999
MLDS: A translator-oriented MultiLingual dictionary system.
Nat. Lang. Eng., 1999

1996
Constructing an intelligent dictionary help system.
Nat. Lang. Eng., 1996

