Anya Belz

CoRR, May, 2026

Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation.

CoRR, May, 2026

Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference.

CoRR, May, 2026

Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning.

Yufang Hou

CoRR, January, 2026

2025

What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?

CoRR, September, 2025

The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems.

CoRR, September, 2025

Enhancing Study-Level Inference from Clinical Trial Papers via RL-based Numeric Reasoning.

CoRR, May, 2025

QRA++: Quantified Reproducibility Assessment for Common Types of Results in Natural Language Processing.

CoRR, May, 2025

Combler les lacunes de Wikipédia : tirer parti de la génération de texte pour améliorer la couverture encyclopédique des groupes sous-représentés.

Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles, 2025

Scaling Up Data-to-Text Generation to Longer Sequences: A New Dataset and Benchmark Results for Generation from Large Triple Sets.

Chinonso Cynthia Osuji

Ornait O'Connell

Thiago Castro Ferreira

Brian Davis

Proceedings of the 18th International Natural Language Generation Conference, 2025

Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics.

Proceedings of the 18th International Natural Language Generation Conference, 2025

Evolving Stances on Reproducibility: A Longitudinal Study of NLP and ML Researchers' Views and Experience of Reproducibility.

Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies.

Joao H. Bettencourt-Silva

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Standard Quality Criteria Derived from Current NLP Evaluations for Guiding Evaluation Design and Grounding Comparability and AI Compliance Assessments.

Findings of the Association for Computational Linguistics, 2025

2024

HEDS 3.0: The Human Evaluation Data Sheet Version 3.0.

CoRR, 2024

Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques.

CoRR, 2024

Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods.

CoRR, 2024

Common Flaws in Running Human Evaluation Experiments in NLP.

Comput. Linguistics, 2024

(Mostly) Automatic Experiment Execution for Human Evaluations of NLP Systems.

Proceedings of the 17th International Natural Language Generation Conference, 2024

Filling Gaps in Wikipedia: Leveraging Data-to-Text Generation to Improve Encyclopedic Coverage of Underrepresented Groups.

Proceedings of the 17th International Natural Language Generation Conference, 2024

Differences in Semantic Errors Made by Different Types of Data-to-text Systems.

Rudali Huidrom

Proceedings of the 17th International Natural Language Generation Conference, 2024

QCET: An Interactive Taxonomy of Quality Criteria for Comparable and Repeatable Evaluation of NLP Systems.

Proceedings of the 17th International Natural Language Generation Conference, 2024

Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods.

Mohammed Mohammed

Findings of the Association for Computational Linguistics: EACL 2024, 2024

High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models.

Findings of the Association for Computational Linguistics: EACL 2024, 2024

Beyond Abstracts: A New Dataset, Prompt Design Strategy and Method for Biomedical Synthesis Generation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 2024

2023

Data-to-text Generation for Severely Under-Resourced Languages with GPT-3.5: A Bit of Help Needed from Google Translate.

CoRR, 2023

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP.

CoRR, 2023

PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques.

CoRR, 2023

How to Control Sentiment in Text Generation: A Survey of the State-of-the-Art in Sentiment-Control Techniques.

Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, 2023

Towards a Consensus Taxonomy for Annotating Errors in Automatically Generated Text.

Rudali Huidrom

Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, 2023

Mod-D2T: A Multi-layer Dataset for Modular Data-to-Text Generation.

Proceedings of the 16th International Natural Language Generation Conference, 2023

Exploring Variation of Results from Different Experimental Conditions.

Findings of the Association for Computational Linguistics: ACL 2023, 2023

Non-Repeatable Experiments and Non-Reproducible Results: The Reproducibility Crisis in Human Evaluation in NLP.

Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

A Metrological Perspective on Reproducibility in NLP.

Alex Papadopoulos-Korfiatis

Comput. Linguistics, 2022

User-Driven Research of Medical Note Generation Software.

Tom Knoll

Francesco Moramarco

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation.

Aleksandar Savkov

Francesco Moramarco

Alex Papadopoulos-Korfiatis

Mark Perera

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: EMNLP 2022 - Industry Track, Abu Dhabi, UAE, December 7, 2022

Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation.

Francesco Moramarco

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Quantified Reproducibility Assessment of NLP Results.

Maja Popovic

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Quantifying Reproducibility in NLP and ML.

CoRR, 2021

The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP.

Anastasia Shimorina

CoRR, 2021

A Reproduction Study of an Annotation-based Human Evaluation of MT Outputs.

Maja Popovic

Proceedings of the 14th International Conference on Natural Language Generation, 2021

Another PASS: A Reproduction Study of the Human Evaluation of a Football Report Generation System.

Thiago Castro Ferreira

Brian Davis

Proceedings of the 14th International Conference on Natural Language Generation, 2021

The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results.

Proceedings of the 14th International Conference on Natural Language Generation, 2021

A Systematic Review of Reproducibility Research in Natural Language Processing.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions.

David M. Howcroft

Miruna-Adriana Clinciu

Proceedings of the 13th International Conference on Natural Language Generation, 2020

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing.

David M. Howcroft

Proceedings of the 13th International Conference on Natural Language Generation, 2020

ReproGen: Proposal for a Shared Task on Reproducibility of Human Evaluations in NLG.

Proceedings of the 13th International Conference on Natural Language Generation, 2020

2019

Fully Automatic Journalism: We Need to Talk About Nonfake News Generation.

Proceedings of the 2019 Truth and Trust Online Conference (TTO 2019), 2019

The Second Multilingual Surface Realisation Shared Task (SR'19): Overview and Evaluation Results.

Proceedings of the 2nd Workshop on Multilingual Surface Realisation, 2019

Conceptualisation and Annotation of Drug Nonadherence Information for Knowledge Extraction from Patient-Generated Texts.

Proceedings of the 5th Workshop on Noisy User-generated Text, 2019

2018

From image to language and back again.

Tamara L. Berg

Licheng Yu

Nat. Lang. Eng., 2018

Underspecified Universal Dependency Structures as Inputs for Multilingual Surface Realisation.

Proceedings of the 11th International Conference on Natural Language Generation, 2018

Adding the Third Dimension to Spatial Relation Detection in 2D Images.

Brandon Birmingham

Proceedings of the 11th International Conference on Natural Language Generation, 2018

SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects.

Proceedings of the 11th International Conference on Natural Language Generation, 2018

2017

Learning to Generate Descriptions of Visual Data Anchored in Spatial Relations.

IEEE Comput. Intell. Mag., 2017

Shared Task Proposal: Multilingual Surface Realization Using Universal Dependency Trees.

Proceedings of the 10th International Conference on Natural Language Generation, 2017

2016

Effect of Data Annotation, Feature Selection and Model Choice on Spatial Description Generation in French.

Proceedings of INLG 2016, 2016

Analysis of Twitter Data for Postmarketing Surveillance in Pharmacovigilance.

Proceedings of the 2nd Workshop on Noisy User-generated Text, 2016

Exploring Different Preposition Sets, Models and Feature Sets in Automatic Generation of Spatial Image Descriptions.

Brandon Birmingham

Proceedings of the 5th Workshop on Vision and Language, 2016

2015

Generating Descriptions of Spatial Relations between Objects in Images.

Proceedings of ENLG 2015, 2015

Describing Spatial Relationships between Objects in Images in English and French.

Proceedings of the Fourth Workshop on Vision and Language, 2015

2014

A Comparative Evaluation Methodology for NLG in Interactive Systems.

Helen F. Hastie

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

The Last 10 Metres: Using Visual Analysis and Verbal Communication in Guiding Visually Impaired Smartphone Users to Entrances.

Anil A. Bharath

Proceedings of the Third Workshop on Vision and Language, 2014

Comparative evaluation and shared tasks for NLG in interactive systems.

Helen F. Hastie

Natural Language Generation in Interactive Systems, 2014

2012

LG-Eval: A Toolkit for Creating Online Language Evaluation Experiments.

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

A Repository of Data and Evaluation Resources for Natural Language Generation.

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

The Surface Realisation Task: Recent Developments and Future Plans.

INLG 2012 - Proceedings of the Seventh International Natural Language Generation Conference, 30 May 2012, 2012

2011

The First Surface Realisation Shared Task: Overview and Evaluation Results.

Proceedings of ENLG 2011, 2011

Generation Challenges 2011 Preface.

Proceedings of ENLG 2011, 2011

Discrete vs. Continuous Rating Scales for Language Evaluation in NLP.

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

Unsupervised Alignment of Comparable Data and Text Resources.

Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, 2011

2010

A Game-based Approach to Transcribing Images of Text.

Khalil Dahab

Proceedings of the International Conference on Language Resources and Evaluation, 2010

Finding Common Ground: Towards a Surface Realisation Shared Task.

Proceedings of INLG 2010, 2010

The GREC Challenges 2010: Overview and Evaluation Results.

Proceedings of INLG 2010, 2010

Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation.

Proceedings of INLG 2010, 2010

Comparing Rating Scales and Preference Judgements in Language Evaluation.

Proceedings of INLG 2010, 2010

Generation Challenges 2010 Preface.

Alexander Koller

Proceedings of INLG 2010, 2010

Introducing Shared Tasks to NLG: The TUNA Shared Task Evaluation Challenges.

Empirical Methods in Natural Language Generation: Data-oriented Methods and Empirical Evaluation, 2010

Generating Referring Expressions in Context: The GREC Task Evaluation Challenges.

Empirical Methods in Natural Language Generation: Data-oriented Methods and Empirical Evaluation, 2010

Assessing the Trade-Off between System Building Cost and Output Quality in Data-to-Text Generation.

Empirical Methods in Natural Language Generation: Data-oriented Methods and Empirical Evaluation, 2010

2009

An Investigation into the Validity of Some Metrics for Automatically Evaluating Natural Language Generation Systems.

Comput. Linguistics, 2009

That's Nice ... What Can You Do With It?

Comput. Linguistics, 2009

The TUNA-REG Challenge 2009: Overview and Evaluation Results.

Proceedings of ENLG 2009, 2009

System Building Cost vs. Output Quality in Data-to-Text Generation.

Proceedings of ENLG 2009, 2009

Generation Challenges 2009: Preface.

Proceedings of ENLG 2009, 2009

2008

Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models.

Nat. Lang. Eng., 2008

The TUNA Challenge 2008: Overview and Evaluation Results.

Proceedings of INLG 2008, 2008

Attribute Selection for Referring Expression Generation: New Algorithms and Evaluation Methods.

Proceedings of INLG 2008, 2008

The GREC Challenge 2008: Overview and Evaluation Results.

Proceedings of INLG 2008, 2008

REG Challenge Preface.

Proceedings of INLG 2008, 2008

Intrinsic vs. Extrinsic Evaluation Measures for Referring Expression Generation.

Proceedings of ACL 2008, 2008

2007

Probabilistic Generation of Weather Forecast Texts.

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Modelling control in generation.

Proceedings of the Eleventh European Workshop on Natural Language Generation, 2007

Generation of repeated references to discourse entities.

Sebastian Varges

Proceedings of the Eleventh European Workshop on Natural Language Generation, 2007

2006

GENEVAL: A Proposal for Shared-task Evaluation in NLG.

Proceedings of INLG 2006, 2006

Shared-Task Evaluations in HLT: Lessons for NLG.

Adam Kilgarriff

Proceedings of INLG 2006, 2006

Introduction to the INLG'06 Special Session on Sharing Data and Comparative Evaluation.

Robert Dale

Proceedings of INLG 2006, 2006

Comparing Automatic and Human Evaluation of NLG Systems.

Proceedings of EACL 2006, 2006

2005

Statistical Generation: Three Methods Compared and Evaluated.

Proceedings of the Tenth European Workshop on Natural Language Generation, 2005

2002

PILLS: Multilingual generation of medical information documents with overlapping content.

Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

PCFG Learning by Nonterminal Partition Search.

Grammatical Inference: Algorithms and Applications, 2002

Learning Grammars for Different Parsing Tasks by Partition Search.

Proceedings of the 19th International Conference on Computational Linguistics, 2002

2001

Multi-Syllable Phonotactic Modelling.

CoRR, 2001

Learning Computational Grammars.

James Alistair Hammerton

Rob Koeling

Stasinos Konstantopoulos

Miles Osborne

Franck Thollard

Erik F. Tjong Kim Sang

Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning, 2001

2000

Computational learning of finite-state models for natural language processing.

PhD thesis, 2000

1998

An Approach to the Automatic Acquisition of Phonotactic Constraints.

Proceedings of the Workshop on Computation of Phonological Constraints, 1998

A Few English Words Can Help Improve Your Russian.

Proceedings of the 13th European Conference on Artificial Intelligence, 1998

Discovering Phonotactic Finite-State Automata by Generic Search.
