David Samuel

Orcid: 0000-0003-2866-1022

According to our database, David Samuel authored at least 30 papers between 1990 and 2025.

Bibliography

2025
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies.
CoRR, March, 2025

Multi-label Scandinavian Language Identification (SLIDE).
CoRR, February, 2025

Small Languages, Big Models: A Study of Continual Training on Languages of Norway.
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies, 2025

The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective.
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies, 2025

NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics, 2025


2024
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective.
CoRR, 2024

Small Languages, Big Models: A Study of Continual Training on Languages of Norway.
CoRR, 2024

GPT or BERT: why not both?
CoRR, 2024

It's Difficult to be Neutral - Human and LLM-based Sentiment Annotation of Patient Comments.
CoRR, 2024

BERTs are Generative In-Context Learners.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

More room for language: Investigating the effect of retrieval on language models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

2023
Not all layers are equally as important: Every Layer Counts BERT.
CoRR, 2023

Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings.
CoRR, 2023

NorBench - A Benchmark for Norwegian Language Models.
Proceedings of the 24th Nordic Conference on Computational Linguistics, 2023

NoCoLA: The Norwegian Corpus of Linguistic Acceptability.
Proceedings of the 24th Nordic Conference on Computational Linguistics, 2023

BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer.
Proceedings of the 24th Nordic Conference on Computational Linguistics, 2023

Trained on 100 million words and still in shape: BERT meets British National Corpus.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Tokenization with Factorized Subword Encoding.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction.
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, 2022

EventGraph: Event Extraction as Semantic Graph Parsing.
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, 2022

Direct parsing to sentiment graphs.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022

2021
RobeCzech: Czech RoBERTa, a Monolingual Contextualized Language Representation Model.
Proceedings of the Text, Speech, and Dialogue - 24th International Conference, 2021

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5.
Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021

2020
A2Cloud-RF: A random forest based statistical framework to guide resource selection for high-performance scientific computing on the cloud.
Concurr. Comput. Pract. Exp., 2020

A2Cloud-cc: A Machine Learning Council to Guide Cloud Resource Selection for Scientific Applications.
Proceedings of the 19th IEEE International Symposium on Network Computing and Applications, 2020

Meta-Learning Extractors for Music Source Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

ÚFAL at MRP 2020: Permutation-invariant Semantic Parsing in PERIN.
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, 2020

2019
Composing Multi-Instrumental Music with Recurrent Neural Networks.
Proceedings of the International Joint Conference on Neural Networks, 2019

1990
Computing the external geodesic diameter of a simple polygon.
Computing, 1990

