Benjamin Muller

Orcid: 0000-0002-3894-7887

Affiliations:

Meta, NYC, USA

According to our database, Benjamin Muller authored at least 25 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Links

Online presence:

on benjamin-mlr.github.io
on scholar.google.com

Bibliography

2025

Latent Speech-Text Transformer.

[…], […], […], Benjamin Muller, Jesús Villalba, […], Luke Zettlemoyer, […], […], Srinivasan Iyer, […]

CoRR, October, 2025

The appification of borders: Data, migration and digitalization.

[…], Philippe M. Frowd, Benjamin Muller

Big Data Soc., 2025

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models.

Lucas Bandarkar, Benjamin Muller, […], […], […], […], […]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Byte Latent Transformer: Patches Scale Better Than Tokens.

Artidoro Pagnoni, Ramakanth Pasunuru, Pedro Rodríguez, […], Benjamin Muller, […], […], […], Jason E. Weston, Luke Zettlemoyer, […], […], […], […]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

SpiRit-LM: Interleaved Spoken and Written Language Model.

[…], Benjamin Muller, […], Marta R. Costa-jussà, […], […], Paul-Ambroise Duquenne, […], Ruslan Mavlyutov, […], Gabriel Synnaeve, […], […], Emmanuel Dupoux

CoRR, 2024

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants.

Lucas Bandarkar, […], Benjamin Muller, […], Satya Narayan Shukla, […], […], Abhinandan Krishnan, Luke Zettlemoyer, […]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning.

CoRR, 2023

The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages.

Benjamin Muller, Belen Alastruey, Prangthip Hansanti, […], Christophe Ropers, Eric Michael Smith, […], Luke Zettlemoyer, […], Marta R. Costa-jussà

Proceedings of the Eighth Conference on Machine Translation, 2023

Evaluating and Modeling Attribution for Cross-Lingual Question Answering.

Benjamin Muller, […], Jonathan H. Clark, Tom Kwiatkowski, Sebastian Ruder, […], […], Jonathan Herzig, […]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages.

[…], Gerson Vizcarra, Tasmiah Tahsin Mayeesha, Benjamin Muller

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022

How Can We Make Language Models Better at Handling the Diversity and Variability of Natural Languages? (Comment rendre les modèles de langue meilleurs face à la grande diversité et variabilité des langues ?).

Benjamin Muller

PhD thesis, 2022

Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation?

[…], […], Benjamin Muller, […], […], […]

Proceedings of the Seventh Conference on Machine Translation, 2022

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer.

Benjamin Muller, Deepanshu Gupta, Jean-Philippe Fauconnier, Siddharth Patwardhan, […], […]

Proceedings of the Transfer Learning for Natural Language Processing Workshop, 2022

Quand être absent de mBERT n'est que le commencement : Gérer de nouvelles langues à l'aide de modèles de langues multilingues (When Being Unseen from mBERT is just the Beginning : Handling New Languages With Multilingual Language Models).

Benjamin Muller, Antonios Anastasopoulos, […], […]

Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

Cross-Lingual Open-Domain Question Answering with Answer Sentence Generation.

Benjamin Muller, […], Rik Koncel-Kedziorski, […], Alessandro Moschitti

Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

2021

Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering.

Benjamin Muller, […], Rik Koncel-Kedziorski, […], Alessandro Moschitti

CoRR, 2021

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models.

Benjamin Muller, Antonios Anastasopoulos, […], […]

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT.

Benjamin Muller, […], […], […]

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi.

Benjamin Muller, […], […]

CoRR, 2020

Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity).

[…], Benjamin Muller, Pedro Javier Ortiz Suárez, […], […], Éric Villemonte de la Clergerie, […], […]

Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

Establishing a New State-of-the-Art for French Named Entity Recognition.

Pedro Javier Ortiz Suárez, […], Benjamin Muller, […], […]

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell.

[…], […], […], Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, […], Abhishek Srivastava

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

CamemBERT: a Tasty French Language Model.

[…], Benjamin Muller, Pedro Javier Ortiz Suárez, […], […], Éric de la Clergerie, […], […]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Enhancing BERT for Lexical Normalization.

Benjamin Muller, […], […]

Proceedings of the 5th Workshop on Noisy User-generated Text, 2019

2018

ELMoLex: Connecting ELMo and Lexicon Features for Dependency Parsing.

[…], Benjamin Muller, […], […], Éric Villemonte de la Clergerie, […], […]

Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Brussels, Belgium, October 31, 2018
