Yacine Jernite

Orcid: 0000-0002-8053-6862

According to our database, Yacine Jernite authored at least 49 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
On the Societal Impact of Open Foundation Models.
CoRR, 2024

On the Standardization of Behavioral Use Clauses and Their Adoption for Responsible Licensing of AI.
CoRR, 2024

2023
The BigCode Project Governance Card.
CoRR, 2023

Power Hungry Processing: Watts Driving the Cost of AI Deployment?
CoRR, 2023

Evaluating the Social Impact of Generative AI Systems in Systems and Society.
CoRR, 2023

StarCoder: may the source be with you!
CoRR, 2023

Stable Bias: Analyzing Societal Representations in Diffusion Models.
CoRR, 2023

Towards Openness Beyond Open Access: User Journeys through 3 Open AI Collaboratives.
CoRR, 2023

SantaCoder: don't reach for the stars!
CoRR, 2023

Stable Bias: Evaluating Societal Representations in Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Stronger Together: on the Articulation of Ethical Charters, Legal Tools, and Technical Documentation in ML.
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

Can Licensing Mitigate the Negative Implications of Commercial Web Scraping?
Proceedings of the Computer Supported Cooperative Work and Social Computing, 2023

AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages.
Proceedings of the 4th Workshop on African Natural Language Processing, 2023

The ROOTS Search Tool: Data Transparency for LLMs.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.
Trans. Assoc. Comput. Linguistics, 2022

Measuring Data.
CoRR, 2022

BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model.
CoRR, 2022

The Stack: 3 TB of permissively licensed source code.
CoRR, 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
CoRR, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code.
CoRR, 2022

Data Governance in the Age of Large-Scale Data-Driven Language Technology.
CoRR, 2022

Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources.
CoRR, 2022


Data Governance in the Age of Large-Scale Data-Driven Language Technology.
Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

2021
Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards.
CoRR, 2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics.
CoRR, 2021

Distributed Deep Learning In Open Collaborations.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Training Transformers Together.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

KILT: a Benchmark for Knowledge Intensive Language Tasks.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021


2020
KILT: a Benchmark for Knowledge Intensive Language Tasks.
CoRR, 2020

Transformers: State-of-the-Art Natural Language Processing.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020

CraftAssist Instruction Parsing: Semantic Parsing for a Voxel-World Assistant.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Improving documentation of presenting problems in the emergency department using a domain-specific ontology and machine learning-driven user interfaces.
Int. J. Medical Informatics, 2019

Improving Conditioning in Context-Aware Sequence to Sequence Models.
CoRR, 2019

Unsupervised Text Summarization via Mixed Model Back-Translation.
CoRR, 2019

Why Build an Assistant in Minecraft?
CoRR, 2019

CraftAssist: A Framework for Dialogue-enabled Interactive Agents.
CoRR, 2019

CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant.
CoRR, 2019

ELI5: Long Form Question Answering.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Learning Representations of Text through Language and Discourse Modeling: From Characters to Sentences.
PhD thesis, 2018

2017
Grounded Recurrent Neural Networks.
CoRR, 2017

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning.
CoRR, 2017

Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Variable Computation in Recurrent Neural Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Simultaneous Learning of Trees and Representations for Extreme Classification, with Application to Language Modeling.
CoRR, 2016

Character-Aware Neural Language Models.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
A Fast Variational Approach for Learning Markov Random Field Language Models.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2013
Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

