Orevaoghene Ahia

According to our database, Orevaoghene Ahia authored at least 31 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
FLEXITOKENS: Flexible Tokenization for Evolving Language Models.
CoRR, July, 2025

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations.
CoRR, June, 2025

BLAB: Brutally Long Audio Bench.
CoRR, May, 2025

2024
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization.
CoRR, 2024

DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages.
CoRR, 2024

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Teaching LLMs to Abstain across Languages via Multilingual Feedback.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

DIALECTBENCH: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages.
CoRR, 2023

LEXPLAIN: Improving Model Explanations via Lexicon Supervision.
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics, 2023

Better Quality Pre-training Data and T5 Models for African Languages.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.
Trans. Assoc. Comput. Linguistics, 2022

Ìtàkúròso: Exploiting Cross-Lingual Transferability for Natural Language Generation of Dialogues in Low-Resource, African Languages.
CoRR, 2022

What a Creole Wants, What a Creole Needs.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Intriguing Properties of Compression on Multilingual Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
2021
MasakhaNER: Named Entity Recognition for African Languages.
Trans. Assoc. Comput. Linguistics, 2021

The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages.
CoRR, 2020

Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin.
Proceedings of the 1st AfricaNLP Workshop, 2020

2019
PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English.
CoRR, 2019