Shannon Shen

ORCID: 0009-0009-2704-6950

Affiliations:

MIT, Cambridge, MA, USA
Allen Institute for AI, Seattle, USA (former)
Nanjing Tech University, School of Computer Science and Technology, China (former)

According to our database¹, Shannon Shen authored at least 37 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of three.

Bibliography

2026

When One LLM Drools, Multi-LLM Collaboration Rules.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

Olmo 3.

Lester James V. Miranda

CoRR, December, 2025

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research.

CoRR, November, 2025

GovScape: A Public Multimodal Search System for 70 Million Pages of Government PDFs.

Benjamin Charles Germain Lee

CoRR, November, 2025

Completion ≠ Collaboration: Scaling Collaborative Effort with Agents.

CoRR, October, 2025

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models.

CoRR, October, 2025

Retrieval-augmented systems can be dangerous medical communicators.

CoRR, February, 2025

Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 Symposium.

CoRR, February, 2025

Position: Retrieval-augmented systems can be dangerous medical communicators.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CourtReasoner: Can LLM Agents Reason Like Judges?

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

The Semantic Reader Project.

Yoganand Chandrasekhar

Commun. ACM, October, 2024

Machine learning to predict notes for chart review in the oncology setting: a proof of concept strategy for improving clinician note-writing.

J. Am. Medical Informatics Assoc., 2024

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature.

CoRR, 2024

A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models.

Proceedings of the Conference on Health, Inference, and Learning, 2024

A Design Space for Intelligent and Interactive Writing Assistants.

Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Learning to Decode Collaboratively with Multiple Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Towards Verifiable Text Generation with Symbolic References.

Lucas Torroba Hennigen

CoRR, 2023

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks.

CoRR, 2023

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces.

Yoganand Chandrasekhar

CoRR, 2023

The Semantic Scholar Open Data Platform.

CoRR, 2023

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes.

Proceedings of the Machine Learning for Healthcare Conference, 2023

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents.

Yoganand Chandrasekhar

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups.

Trans. Assoc. Comput. Linguistics, 2022

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search.

CoRR, 2022

Multi-LexSum: Real-world Summaries of Civil Rights Lawsuits at Multiple Granularities.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Incorporating Visual Layout Structures for Scientific Text Classification.

CoRR, 2021

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis.

Zejiang Shen

Ruochen Zhang

Melissa Dell

Benjamin Charles Germain Lee

Jacob Carlson

Weining Li

Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

PAWLS: PDF Annotation With Labels and Structure.

Mark Neumann

Zejiang Shen

Sam Skjonsberg

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

OLALA: Object-Level Active Learning Based Layout Annotation.

CoRR, 2020

Generating Object Stamps.
