Yujiong Shen

According to our database, Yujiong Shen authored at least 16 papers between 2024 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening.

CoRR, May, 2026

CL-bench Life: Can Language Models Learn from Real-Life Context?

CoRR, April, 2026

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees.

CoRR, March, 2026

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents.

CoRR, February, 2026

CL-bench: A Benchmark for Context Learning.

CoRR, February, 2026

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies.

CoRR, January, 2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment.

CoRR, January, 2026

LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.

CoRR, August, 2025

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training.

CoRR, February, 2025

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
