Marzieh Fadaee

Orcid: 0000-0002-4447-1213

According to our database, Marzieh Fadaee authored at least 48 papers between 2013 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

The Art of Asking: Multilingual Prompt Optimization for Synthetic Data.

,

Viraat Aryabumi

,

,

,

,

CoRR, October, 2025

Making, not Taking, the Best of N.

,

,

,

CoRR, October, 2025

Verification Limits Code LLM Training.

,

Elena Tommasone

,

,

,

Matthias Gallé

,

CoRR, September, 2025

NeoBabel: A Multilingual Open Tower for Visual Generation.

Mohammad Mahdi Derakhshani

,

Dheeraj Varghese

,

,

Cees G. M. Snoek

CoRR, July, 2025

One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers.

,

Alejandro Salamanca

,

Felipe Cruz-Salinas

,

,

,

,

,

,

CoRR, June, 2025

The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It.

,

,

,

Stephen H. Bach

,

CoRR, May, 2025

The Multilingual Divide and Its Impact on Global AI Safety.

,

,

Alice Schoenauer Sebag

,

Kelly Marchisio

,

,

,

Samuel Cahyawijaya

,

Shivalika Singh

,

Seraphina Goldfarb-Tarrant

,

Viraat Aryabumi

,

,

,

,

Matthias Gallé

,

,

CoRR, May, 2025

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects.

CoRR, May, 2025

Aya Vision: Advancing the Frontier of Multilingual Multimodality.

CoRR, May, 2025

The Leaderboard Illusion.

Shivalika Singh

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, April, 2025

A Post-trainer's Guide to Multilingual Training Data: Uncovering Cross-lingual Transfer Dynamics.

Luísa Shimabucoro

,

,

,

Sebastian Ruder

CoRR, April, 2025

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation.

,

Eleftheria Briakou

,

,

,

CoRR, April, 2025

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation.

Israfel Salazar

,

Manuel Fernández Burda

,

Shayekh Bin Islam

,

Arshia Soltani Moakhar

,

Shivalika Singh

,

Fabian Farestam

,

Angelika Romanou

,

,

,

,

Dominik Krzeminski

,

Jekaterina Novikova

,

Luísa Shimabucoro

,

Joseph Marvin Imperial

,

Rishabh Maheshwary

,

,

Alfonso Amayuelas

,

,

,

,

Nicholas Popovic

,

,

Azmine Toushik Wasi

,

Ram Mohan Rao Kadiyala

,

,

Maksim Kostritsya

,

Bardia Soltani Moakhar

,

Gabriel da Costa Merlin

,

Otávio Ferracioli Coletti

,

Maral Jabbarishiviari

,

MohammadAmin farahani fard

,

Silvia Fernandez

,

María Grandury

,

Dmitry Abulkhanov

,

,

André Guarnier de Mitri

,

Leticia Bossatto Marchezi

,

Setayesh Heydari

,

Johan S. Obando-Ceron

,

,

,

Desmond Elliott

,

,

,

CoRR, April, 2025

Command A: An Enterprise-Ready Large Language Model.

,

,

,

,

,

,

Sophia Althammer

,

Arkady Arkhangorodsky

,

Viraat Aryabumi

,

Dennis Aumiller

,

Raphaël Avalos

,

,

,

,

Alexandre Barbet

,

,

Björn Bebensee

,

,

Walter Beller-Morales

,

Alexandre Bérard

,

Andrew Berneshawi

,

,

,

,

,

,

,

Samuel Cahyawijaya

,

,

Jon Ander Campos

,

,

,

Roman Castagné

,

Julián Cendrero

,

Leila Chan Currie

,

,

,

Giannis Chatziveroglou

,

,

,

Alexis Chevalier

,

,

,

,

,

,

,

,

,

Lucas Crawhall-Stein

,

,

Felipe Cruz-Salinas

,

,

,

Hugo Dalla-Torre

,

,

William Darling

,

Omar Darwiche Domingues

,

,

Antoine Debugne

,

,

,

,

Rishit Dholakia

,

,

,

,

Abdullah Elkady

,

Sarah Elsharkawy

,

,

,

,

,

,

Yannis Flet-Berliac

,

,

Matthias Gallé

,

Wojciech Galuba

,

,

,

Mohammad Gheshlaghi Azar

,

Ellen Gilsenan-McMahon

,

Seraphina Goldfarb-Tarrant

,

,

,

Victor Machado Gonzaga

,

Nithya Govindarajan

,

Manoj Govindassamy

,

Nathan Grinsztajn

,

Nikolas Gritsch

,

,

,

,

,

,

,

Sebastian Hofstätter

,

CoRR, April, 2025

Towards Best Practices for Open Datasets for LLM Training.

CoRR, January, 2025

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge.

Angelika Romanou

,

,

,

,

Sree Harsha Nelaturu

,

Shivalika Singh

,

Rishabh Maheshwary

,

,

Mohamed A. Haggag

,

,

,

,

Antoine Bosselut

,

,

Alfonso Amayuelas

,

Azril Hafizi Amirudin

,

Viraat Aryabumi

,

,

,

,

,

Aditya Kumar Dalmia

,

,

,

Daniil Dzenhaliou

,

Daniel Fernando Erazo Florez

,

Fabian Farestam

,

Joseph Marvin Imperial

,

Shayekh Bin Islam

,

,

Maral Jabbarishiviari

,

Börje F. Karlsson

,

,

Christopher Klamm

,

,

Dominik Krzeminski

,

Gabriel Adriano de Melo

,

Syrielle Montariol

,

,

,

Jekaterina Novikova

,

Johan Samir Obando-Ceron

,

,

,

,

,

Selvan Sunitha Ravi

,

,

Roshan Santhosh

,

,

Marjana Prifti Skenduli

,

Arshia Soltani Moakhar

,

Bardia Soltani Moakhar

,

,

Ayush Kumar Tarun

,

Azmine Toushik Wasi

,

Thenuka Ovin Weerasinghe

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

To Code or Not To Code? Exploring Impact of Code in Pre-training.

Viraat Aryabumi

,

,

,

,

,

,

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions.

Nathanaël Carraz Rakotonirina

,

,

Jon Ander Campos

,

,

Alberto Testoni

,

,

Sandro Pezzelle

,

Marco Del Tredici

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

M-RewardBench: Evaluating Reward Models in Multilingual Settings.

,

Lester James Validad Miranda

,

Shayekh Bin Islam

,

Rishabh Maheshwary

,

,

Gusti Triandi Winata

,

,

Sebastian Ruder

,

,

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier.

CoRR, 2024

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.

CoRR, 2024

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning.

,

,

Seraphina Goldfarb-Tarrant

,

,

,

CoRR, 2024

Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement.

,

,

,

CoRR, 2024

LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives.

Luísa Shimabucoro

,

Sebastian Ruder

,

,

,

CoRR, 2024

Aya 23: Open Weight Releases to Further Multilingual Progress.

CoRR, 2024

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning.

CoRR, 2024

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation.

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives.

Luísa Shimabucoro

,

Sebastian Ruder

,

,

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm.

,

,

,

Seraphina Goldfarb-Tarrant

,

,

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model.

,

Viraat Aryabumi

,

,

,

,

Gbemileke Onilude

,

,

Shivalika Singh

,

,

,

,

,

,

Niklas Muennighoff

,

,

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.

,

,

Matthias Gallé

,

,

,

Olivier Pietquin

,

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation.

,

,

,

,

CoRR, 2023

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale.

,

,

,

,

,

CoRR, 2023

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval.

,

Luiz Henrique Bonifacio

,

Hugo Queiroz Abonizio

,

,

Roberto A. Lotufo

,

,

Rodrigo Nogueira

CoRR, 2023

2022

In Defense of Cross-Encoders for Zero-Shot Retrieval.

,

Luiz Henrique Bonifacio

,

,

Hugo Queiroz Abonizio

,

,

Roberto A. Lotufo

,

Rodrigo Nogueira

CoRR, 2022

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval.

Guilherme Moraes Rosa

,

Luiz Henrique Bonifacio

,

,

Hugo Queiroz Abonizio

,

,

Roberto A. Lotufo

,

Rodrigo Nogueira

CoRR, 2022

InPars: Data Augmentation for Information Retrieval using Large Language Models.

Luiz Henrique Bonifacio

,

Hugo Queiroz Abonizio

,

,

Rodrigo Nogueira

CoRR, 2022

InPars: Unsupervised Dataset Generation for Information Retrieval.

Luiz Henrique Bonifacio

,

Hugo Queiroz Abonizio

,

,

Rodrigo Nogueira

Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

2021

Understanding and Enhancing the Use of Context for Machine Translation.

CoRR, 2021

2020

A New Neural Search and Insights Platform for Navigating and Organizing AI Research.

,

Olga Gureenkova

,

Fernando Rejon Barrera

,

Carsten Schnober

,

Wouter Weerkamp

,

Proceedings of the First Workshop on Scholarly Document Processing, 2020

The Unreasonable Volatility of Neural Machine Translation Models.

,

Proceedings of the Fourth Workshop on Neural Generation and Translation, 2020

2018

Examining the Tip of the Iceberg: A Data Set for Idiom Translation.

,

Arianna Bisazza

,

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation.

,

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

2017

Data Augmentation for Low-Resource Neural Machine Translation.

,

Arianna Bisazza

,

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Learning Topic-Sensitive Word Representations.

,

Arianna Bisazza

,

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2013

Automatic WordNet Construction Using Markov Chain Monte Carlo.

,

Hamidreza Ghader

,

,

Polibits, 2013
