Xiangru Tang

ORCID: 0009-0006-2700-4513

According to our database, Xiangru Tang authored at least 88 papers between 2019 and 2025.

Bibliography

2025
Igniting Language Intelligence: The Hitchhiker's Guide from Chain-of-Thought Reasoning to Language Agents.
ACM Comput. Surv., August, 2025

You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation.
CoRR, August, 2025

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
CoRR, August, 2025

CellForge: Agentic Design of Virtual Cell Models.
CoRR, August, 2025

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving.
CoRR, July, 2025

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks.
CoRR, July, 2025

OAgents: An Empirical Study of Building Effective Agents.
CoRR, June, 2025

Scaling Test-time Compute for LLM Agents.
CoRR, June, 2025

Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards.
CoRR, June, 2025

MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale.
CoRR, June, 2025

MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning.
CoRR, June, 2025

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations.
CoRR, May, 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows.
CoRR, May, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.
CoRR, May, 2025

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems.
CoRR, April, 2025

LocAgent: Graph-Guided LLM Agents for Code Localization.
CoRR, March, 2025

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning.
CoRR, March, 2025

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents.
CoRR, March, 2025

MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems.
CoRR, March, 2025

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding.
CoRR, January, 2025

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning.
CoRR, January, 2025

ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OpenHands: An Open Platform for AI Software Developers as Generalist Agents.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Data preparation for Deep Learning based Code Smell Detection: A systematic literature review.
J. Syst. Softw., 2024

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain.
CoRR, 2024

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents.
CoRR, 2024

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents.
CoRR, 2024

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation.
CoRR, 2024

Step-Back Profiling: Distilling User History for Personalized Scientific Writing.
CoRR, 2024

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes.
CoRR, 2024

Lessons from the Trenches on Reproducible Evaluation of Language Models.
CoRR, 2024

MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise.
CoRR, 2024

StarCoder 2 and The Stack v2: The Next Generation.
CoRR, 2024

Data Interpreter: An LLM Agent For Data Science.
CoRR, 2024

A Survey of Generative AI for De Novo Drug Design: New Frontiers in Molecule and Protein Generation.
CoRR, 2024

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science.
CoRR, 2024

Weaver: Foundation Models for Creative Writing.
CoRR, 2024

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations.
Bioinform., 2024

BioCoder: a benchmark for bioinformatics code generation with large language models.
Bioinform., 2024

A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation.
Briefings Bioinform., 2024

Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Investigating Data Contamination in Modern Benchmarks for Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OctoPack: Instruction Tuning Code Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Financial Documents.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning.
CoRR, 2023

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks.
CoRR, 2023

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data.
CoRR, 2023

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity.
CoRR, 2023

Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models.
CoRR, 2023

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
CoRR, 2023

BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge.
CoRR, 2023

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.
CoRR, 2023

Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers.
CoRR, 2023

QTSumm: A New Benchmark for Query-Focused Table Summarization.
CoRR, 2023

RWKV: Reinventing RNNs for the Transformer Era.
CoRR, 2023

Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

QTSumm: Query-Focused Summarization over Tabular Data.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Crosslingual Generalization through Multitask Finetuning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from Doctor-Patient Conversations through Fine-tuning and In-context Learning.
Proceedings of the 5th Clinical Natural Language Processing Workshop, 2023

Aligning Factual Consistency for Clinical Studies Summarization through Reinforcement Learning.
Proceedings of the 5th Clinical Natural Language Processing Workshop, 2023

2022
FeTaQA: Free-form Table Question Answering.
Trans. Assoc. Comput. Linguistics, 2022

EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts.
CoRR, 2022

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.
CoRR, 2022

CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Surfer100: Generating Surveys From Web Resources, Wikipedia-style.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

2021
CLICKER: A Computational LInguistics Classification Scheme for Educational Resources.
CoRR, 2021

Improving RNA Secondary Structure Design using Deep Reinforcement Learning.
CoRR, 2021

Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types.
CoRR, 2021

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries.
CoRR, 2021

FeTaQA: Free-form Table Question Answering.
CoRR, 2021

DART: Open-Domain Structured Data Record to Text Generation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

2020
FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching.
CoRR, 2020

Multi-Granularity Modularized Network for Abstract Visual Reasoning.
CoRR, 2020

DART: Open-Domain Structured Data Record to Text Generation.
CoRR, 2020

CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning.
CoRR, 2020

CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning.
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

Categorizing Offensive Language in Social Networks: A Chinese Corpus, Systems and an Explanation Tool.
Proceedings of the Chinese Computational Linguistics - 19th China National Conference, CCL 2020, Hainan, China, October 30, 2020

2019
Improving Code Generation From Descriptive Text By Combining Deep Learning and Syntax Rules.
Proceedings of the 31st International Conference on Software Engineering and Knowledge Engineering, 2019

Knowledge-Aware Self-Attention Networks for Document Grounded Dialogue Generation.
Proceedings of the Knowledge Science, Engineering and Management, 2019
