Michael Kuchnik

ORCID: 0000-0002-0805-1828

According to our database, Michael Kuchnik authored at least 20 papers between 2018 and 2026.

Bibliography

2026
AIRA_2: Overcoming Bottlenecks in AI Research Agents.
CoRR, March, 2026

2025
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training.
CoRR, October, 2025

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead.
CoRR, October, 2025

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench.
CoRR, July, 2025

Revisiting Reliability in Large-Scale Machine Learning Research Clusters.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024
A Standardized Machine-readable Dataset Documentation Format for Responsible AI.
CoRR, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons.
CoRR, 2024

Croissant: A Metadata Format for ML-Ready Datasets.
CoRR, 2024

2023
Beyond Model Efficiency: Data Optimizations for Machine Learning Systems.
PhD thesis, 2023

Validating Large Language Models with ReLM.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

2022
Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021
Progressive Compressed Records: Taking a Byte out of Deep Learning Data.
Proc. VLDB Endow., 2021

2020
File Systems Unfit as Distributed Storage Back Ends: Lessons from 10 Years of Ceph Evolution.
;login: Usenix Mag., 2020

The Case for Custom Storage Backends in Distributed Storage Systems.
ACM Trans. Storage, 2020

2019
File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Efficient Augmentation via Data Subsampling.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
The Atlas Cluster Trace Repository.
;login: Usenix Mag., 2018
