Orevaoghene Ahia

According to our database, Orevaoghene Ahia authored at least 35 papers between 2019 and 2026.

Bibliography

2026
IDIOLEX: Unified and Continuous Representations for Idiolectal and Stylistic Variation.
CoRR, April, 2026

Frame-Level Internal Tool Use for Temporal Grounding in Audio LMs.
CoRR, February, 2026

BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning.
CoRR, February, 2026

2025
Cognitive Foundations for Reasoning and Their Manifestation in LLMs.
CoRR, November, 2025

FLEXITOKENS: Flexible Tokenization for Evolving Language Models.
CoRR, July, 2025

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations.
CoRR, June, 2025

BLAB: Brutally Long Audio Bench.
CoRR, May, 2025

2024
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization.
CoRR, 2024

DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages.
CoRR, 2024

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Teaching LLMs to Abstain across Languages via Multilingual Feedback.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

DIALECTBENCH: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages.
CoRR, 2023

LEXPLAIN: Improving Model Explanations via Lexicon Supervision.
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics, 2023

Better Quality Pre-training Data and T5 Models for African Languages.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.
Trans. Assoc. Comput. Linguistics, 2022

Ìtàkúròso: Exploiting Cross-Lingual Transferability for Natural Language Generation of Dialogues in Low-Resource, African Languages.
CoRR, 2022

What a Creole Wants, What a Creole Needs.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Intriguing Properties of Compression on Multilingual Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
MasakhaNER: Named Entity Recognition for African Languages.
Trans. Assoc. Comput. Linguistics, 2021

The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages.
CoRR, 2020

Towards Supervised and Unsupervised Neural Machine Translation Baselines for Nigerian Pidgin.
Proceedings of the 1st AfricaNLP Workshop, 2020

2019
PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English.
CoRR, 2019

