Loubna Ben Allal

According to our database, Loubna Ben Allal authored at least 11 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text.
CoRR, June, 2025

SmolVLM: Redefining small and efficient multimodal models.
CoRR, April, 2025

SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model.
CoRR, February, 2025

2024
StarCoder 2 and The Stack v2: The Next Generation.
CoRR, 2024

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
StarCoder: may the source be with you!
Trans. Mach. Learn. Res., 2023

The Stack: 3 TB of permissively licensed source code.
Trans. Mach. Learn. Res., 2023

The BigCode Project Governance Card.
CoRR, 2023

SantaCoder: don't reach for the stars!
CoRR, 2023

2022

