Yujiong Shen

According to our database, Yujiong Shen authored at least 16 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening.
CoRR, May, 2026

CL-bench Life: Can Language Models Learn from Real-Life Context?
CoRR, April, 2026

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees.
CoRR, March, 2026

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents.
CoRR, February, 2026

CL-bench: A Benchmark for Context Learning.
CoRR, February, 2026

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies.
CoRR, January, 2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment.
CoRR, January, 2026

LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025
LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.
CoRR, August, 2025

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training.
CoRR, February, 2025

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

