Torsten Scholak

According to our database, Torsten Scholak authored at least 12 papers between 2021 and 2025.

Bibliography

2025
Apriel-Nemotron-15B-Thinker.
CoRR, August, 2025

Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training.
CoRR, July, 2025

Unifying Autoregressive and Diffusion-Based Sequence Generation.
CoRR, April, 2025

2024
TapeAgents: a Holistic Framework for Agent Development and Optimization.
CoRR, 2024

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks.
CoRR, 2024

Perplexed: Understanding When Large Language Models are Confused.
CoRR, 2024

StarCoder 2 and The Stack v2: The Next Generation.
CoRR, 2024

2023
RepoFusion: Training Code Models to Understand Your Repository.
CoRR, 2023

2022
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Towards Neural Functional Program Evaluation.
CoRR, 2021

DuoRAT: Towards Simpler Text-to-SQL Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

