Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions.

Hongru Wang

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

How Likely Do LLMs with CoT Mimic Human Reasoning?

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Unlocking Recursive Thinking of LLMs: Alignment via Refinement.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

LongSafety: Evaluating Long-Context Safety of Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Training Language Model to Critique for Better Refinement.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

A Survey on Evaluation of Large Language Models.

ACM Trans. Intell. Syst. Technol., June, 2024

Long²RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall.

CoRR, 2024

Nash CoT: Multi-Path Inference with Preference Equilibrium.

CoRR, 2024

NovelQA: A Benchmark for Long-Range Novel Question Answering.

CoRR, 2024

Knowledge Conflicts for LLMs: A Survey.

CoRR, 2024

LLMs with Chain-of-Thought Are Non-Causal Reasoners.

CoRR, 2024

SQL-CRAFT: Text-to-SQL through Interactive Refinement and Enhanced Reasoning.

CoRR, 2024

Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions.

CoRR, 2024

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Nash CoT: Multi-Path Inference with Preference Equilibrium.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Knowledge Conflicts for LLMs: A Survey.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

LONG²RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity.

CoRR, 2023

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization.

CoRR, 2023

Evaluating Open Question Answering Evaluation.

CoRR, 2023

Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base.

Proceedings of the Natural Language Processing and Chinese Computing, 2023

Evaluating Open-QA Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TRAMS: Training-free Memory Selection for Long-range Language Modeling.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

RFiD: Towards Rational Fusion-in-Decoder for Open-Domain Question Answering.

Cunxiang Wang

Haofei Yu

Yue Zhang

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploiting Abstract Meaning Representation for Open-Domain Question Answering.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

On Effectively Learning of Knowledge in Continual Pre-training.

CoRR, 2022

2021

Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning.

Proceedings of the Natural Language Processing and Chinese Computing, 2021

Can Generative Pre-trained Language Models Serve As Knowledge Bases for Closed-book QA?

Cunxiang Wang

Pai Liu

Yue Zhang

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Commonsense Knowledge Graph Reasoning by Selection or Generation? Why?

CoRR, 2020

SemEval-2020 Task 4: Commonsense Validation and Explanation.
