Marzieh Fadaee

Orcid: 0000-0002-4447-1213

According to our database, Marzieh Fadaee authored at least 45 papers between 2013 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
NeoBabel: A Multilingual Open Tower for Visual Generation.
CoRR, July, 2025

One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers.
CoRR, June, 2025

The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It.
CoRR, May, 2025

The Multilingual Divide and Its Impact on Global AI Safety.
CoRR, May, 2025

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects.
CoRR, May, 2025

Aya Vision: Advancing the Frontier of Multilingual Multimodality.
CoRR, May, 2025

The Leaderboard Illusion.
CoRR, April, 2025

A Post-trainer's Guide to Multilingual Training Data: Uncovering Cross-lingual Transfer Dynamics.
CoRR, April, 2025

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation.
CoRR, April, 2025

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation.
CoRR, April, 2025

Command A: An Enterprise-Ready Large Language Model.
CoRR, April, 2025

Towards Best Practices for Open Datasets for LLM Training.
CoRR, January, 2025


To Code or Not To Code? Exploring Impact of Code in Pre-training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

M-RewardBench: Evaluating Reward Models in Multilingual Settings.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier.
CoRR, 2024

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.
CoRR, 2024

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning.
CoRR, 2024

Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement.
CoRR, 2024

LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives.
CoRR, 2024

Aya 23: Open Weight Releases to Further Multilingual Progress.
CoRR, 2024

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning.
CoRR, 2024

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024


Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation.
CoRR, 2023

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale.
CoRR, 2023

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval.
CoRR, 2023

2022
In Defense of Cross-Encoders for Zero-Shot Retrieval.
CoRR, 2022

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval.
CoRR, 2022

InPars: Data Augmentation for Information Retrieval using Large Language Models.
CoRR, 2022

InPars: Unsupervised Dataset Generation for Information Retrieval.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

2021
Understanding and Enhancing the Use of Context for Machine Translation.
CoRR, 2021

2020
A New Neural Search and Insights Platform for Navigating and Organizing AI Research.
Proceedings of the First Workshop on Scholarly Document Processing, 2020

The Unreasonable Volatility of Neural Machine Translation Models.
Proceedings of the Fourth Workshop on Neural Generation and Translation, 2020

2018
Examining the Tip of the Iceberg: A Data Set for Idiom Translation.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

2017
Data Augmentation for Low-Resource Neural Machine Translation.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Learning Topic-Sensitive Word Representations.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2013
Automatic WordNet Construction Using Markov Chain Monte Carlo.
Polibits, 2013

