Yujiong Shen
According to our database1,
Yujiong Shen authored at least 16 papers
between 2024 and 2026.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening.
CoRR, May, 2026
JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees.
CoRR, March, 2026
CoRR, February, 2026
Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies.
CoRR, January, 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment.
CoRR, January, 2026
LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
2025
LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models.
CoRR, August, 2025
Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training.
CoRR, February, 2025
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024