Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2024

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation.

Shiqi Chen

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Symbolic Variables in Distributed Networks that Count.

Proceedings of the 46th Annual Meeting of the Cognitive Science Society, 2024

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations.

Proceedings of the Conference on Causal Learning and Reasoning, 2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation.

Zhengxuan Wu

Christopher D. Manning

Christopher Potts

Trans. Assoc. Comput. Linguistics, 2023

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca.

CoRR, 2023

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Causal Proxy Models for Concept-based Model Explanations.

Proceedings of the International Conference on Machine Learning, 2023

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions.

Zexuan Zhong

Zhengxuan Wu

Christopher D. Manning

Christopher Potts

Danqi Chen

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies.

Zhengxuan Wu

Alex Tamkin

Isabel Papadimitriou

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Rigorously Assessing Natural Language Explanations of Neurons.

Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, 2023

Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training.

Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Oolong: Investigating What Makes Crosslingual Transfer Hard with Controlled Studies.

Zhengxuan Wu

Isabel Papadimitriou

Alex Tamkin

CoRR, 2022

Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models.

Zhengxuan Wu

Nelson F. Liu

Christopher Potts

Proceedings of the 7th Workshop on Representation Learning for NLP, 2022

ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Causal Distillation for Language Models.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Inducing Causal Structure for Interpretable Neural Networks.

Proceedings of the International Conference on Machine Learning, 2022

2021

Modeling Emotion in Complex Stories: The Stanford Emotional Narratives Dataset.

IEEE Trans. Affect. Comput., 2021

Attention uncovers task-relevant semantics in emotional narrative understanding.

Thanh-Son Nguyen

Zhengxuan Wu

Desmond C. Ong

Knowl. Based Syst., 2021

On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification.

Zhengxuan Wu

Desmond C. Ong

CoRR, 2021

ReaSCAN: Compositional Reasoning in Language Grounding.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Dynabench: Rethinking Benchmarking in NLP.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Not Now, Ask Later: Users Weaken Their Behavior Change Regimen Over Time, But Expect To Re-Strengthen It Imminently.

Geza Kovacs

Zhengxuan Wu

Michael S. Bernstein

Proceedings of CHI '21: CHI Conference on Human Factors in Computing Systems, 2021

DynaSent: A Dynamic Benchmark for Sentiment Analysis.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis.

Zhengxuan Wu

Desmond C. Ong

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Structured Self-Attention Weights Encode Semantics in Sentiment Analysis.

Zhengxuan Wu

Thanh-Son Nguyen