Samuel R. Bowman

Affiliations:
  • New York University, Department of Linguistics, USA
  • Stanford University, Department of Linguistics (former)


According to our database, Samuel R. Bowman authored at least 114 papers between 2010 and 2024.

Collaborative distances (shortest-path lengths in the co-authorship graph; see the sketch below):
  • Dijkstra number of four.
  • Erdős number of four.
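A collaborative distance such as an Erdős number is the length of the shortest co-authorship path between two authors: 1 for a direct co-author, 2 for a co-author of a co-author, and so on. A minimal sketch of how such a distance can be computed with breadth-first search (the graph below is a toy example, not our database):

from collections import deque

def collaboration_distance(coauthors, source, target):
    """Shortest co-authorship path length between two authors,
    found by breadth-first search over the co-authorship graph.
    `coauthors` maps each author to the set of their direct
    co-authors. Returns None if the authors are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for coauthor in coauthors.get(author, ()):
            if coauthor == target:
                return dist + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None

# Toy chain A-B-C-D: A's "D number" is 3.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(collaboration_distance(graph, "A", "D"))  # -> 3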

Timeline

[Interactive publication timeline omitted; entries are categorized as Book, In proceedings, Article, PhD thesis, Dataset, or Other.]

Bibliography

2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought.
CoRR, 2024

Debating with More Persuasive LLMs Leads to More Truthful Answers.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
CoRR, 2023

Debate Helps Supervise Unreliable Experts.
CoRR, 2023

Towards Understanding Sycophancy in Language Models.
CoRR, 2023

Studying Large Language Model Generalization with Influence Functions.
CoRR, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.
CoRR, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.
CoRR, 2023

Inverse Scaling: When Bigger Isn't Better.
CoRR, 2023

Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs.
CoRR, 2023

Eight Things to Know about Large Language Models.
CoRR, 2023

Improving Code Generation by Training with Natural Language Feedback.
CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Pretraining Language Models with Human Preferences.
Proceedings of the International Conference on Machine Learning, 2023

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

(QA)²: Question Answering with Questionable Assumptions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Instruction Induction: From Few Examples to Natural Language Task Descriptions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
(QA)²: Question Answering with Questionable Assumptions.
CoRR, 2022

Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions.
CoRR, 2022

What Artificial Neural Networks Can Tell Us About Human Language Acquisition.
CoRR, 2022

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions.
CoRR, 2022

QuALITY: Question Answering with Long Input Texts, Yes!
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

SocioProbe: What, When, and Where Language Models Learn about Sociodemographics.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

What Makes Reading Comprehension Questions Difficult?
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

BBQ: A hand-built bias benchmark for question answering.
Findings of the Association for Computational Linguistics: ACL 2022, 2022

The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair.
CoRR, 2021

Learning with Noisy Labels by Targeted Relabeling.
CoRR, 2021

When Combating Hype, Proceed with Caution.
CoRR, 2021

What Will it Take to Fix Benchmarking in Natural Language Understanding?
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Does Putting a Linguist in the Loop Improve NLU Data Collection?
Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

NOPE: A Corpus of Naturally-Occurring Presuppositions in English.
Proceedings of the 25th Conference on Computational Natural Language Learning, 2021

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers.
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2021

When Do You Need Billions of Words of Pretraining Data?
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Comparing Test Sets with Item Response Theory.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Erratum: "BLiMP: The Benchmark of Linguistic Minimal Pairs for English".
Trans. Assoc. Comput. Linguistics, 2020

BLiMP: The Benchmark of Linguistic Minimal Pairs for English.
Trans. Assoc. Comput. Linguistics, 2020

When Do You Need Billions of Words of Pretraining Data?
CoRR, 2020

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually).
CoRR, 2020

Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
CoRR, 2020

Collecting Entailment Data for Pretraining: New Protocols and Negative Results.
CoRR, 2020

Self-Training for Unsupervised Parsing with PRPN.
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies, 2020

Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually).
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Precise Task Formalization Matters in Winograd Schema Evaluations.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

New Protocols and Negative Results for Textual Entailment Data Collection.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Can neural networks acquire a structural bias from raw linguistic data?
Proceedings of the 42nd Annual Meeting of the Cognitive Science Society, 2020

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data.
Proceedings of the First Workshop on Insights from Negative Results in NLP, 2020

Learning to Learn Morphological Inflection for Resource-Poor Languages.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Neural Network Acceptability Judgments.
Trans. Assoc. Comput. Linguistics, 2019

Do Attention Heads in BERT Track Syntactic Dependencies?
CoRR, 2019

Inducing Constituency Trees through Neural Machine Translation.
CoRR, 2019

Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments.
CoRR, 2019

Probing What Different NLP Tasks Teach Machines about Function Word Comprehension.
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics, 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Can Unconditional Language Models Recover Arbitrary Sentences?
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On Measuring Social Biases in Sentence Encoders.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Deep Learning for Natural Language Inference.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Identifying and Reducing Gender Bias in Word-Level Language Models.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

What do you learn from context? Probing for sentence structure in contextualized word representations.
Proceedings of the 7th International Conference on Learning Representations, 2019

Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Neural Unsupervised Parsing Beyond English.
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP, 2019

2018
Do latent tree learning models identify meaningful structure in sentences?
Trans. Assoc. Comput. Linguistics, 2018

Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling.
CoRR, 2018

Verb Argument Structure Alternations in Word and Sentence Embeddings.
CoRR, 2018

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks.
CoRR, 2018

Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis.
CoRR, 2018

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

ListOps: A Diagnostic Dataset for Latent Tree Learning.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018

Training a Ranking Function for Open-Domain Question Answering.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018

Annotation Artifacts in Natural Language Inference Data.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Stable and Effective Trainable Greedy Decoding for Sequence to Sequence Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis.
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding.
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018

Grammar Induction with Neural Language Models: An Unusual Replication.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

XNLI: Evaluating Cross-lingual Sentence Representations.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

A Stable and Effective Learning Strategy for Trainable Greedy Decoding.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

The Lifted Matrix-Space Model for Semantic Composition.
Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018

Ruminating Reader: Reasoning with Gated Multi-hop Attention.
Proceedings of the Workshop on Machine Reading for Question Answering@ACL 2018, 2018

2017
The Lifted Matrix-Space Model for Semantic Composition.
CoRR, 2017

Learning to parse from a semantic objective: It works. Is it syntax?
CoRR, 2017

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning.
CoRR, 2017

Sequential Attention.
CoRR, 2017

The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations.
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, 2017

Sequential Attention: A Context-Aware Alignment Function for Machine Reading.
Proceedings of the 2nd Workshop on Representation Learning for NLP, 2017

2016
Generating Sentences from a Continuous Space.
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016

A Fast Unified Model for Parsing and Sentence Understanding.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
Tree-Structured Composition in Neural Networks without Tree-Structured Architectures.
Proceedings of the NIPS Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches co-located with the 29th Annual Conference on Neural Information Processing Systems (NIPS 2015), 2015

A large annotated corpus for learning natural language inference.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Recursive Neural Networks Can Learn Logical Semantics.
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015

Learning Distributed Word Representations for Natural Logic Reasoning.
Proceedings of the 2015 AAAI Spring Symposia, 2015

2014
Recursive Neural Networks for Learning Logical Semantics.
CoRR, 2014

Can recursive neural tensor networks learn logical reasoning?
Proceedings of the 2nd International Conference on Learning Representations, 2014

A Gold Standard Dependency Corpus for English.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

2013
More Constructions, More Genres: Extending Stanford Dependencies.
Proceedings of the Second International Conference on Dependency Linguistics, 2013

2012
Automatic Animacy Classification.
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2012

2011
Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011

2010
Modeling pronunciation variation with context-dependent articulatory feature decision trees.
Proceedings of INTERSPEECH, 2010

