Asaf Yehudai

According to our database, Asaf Yehudai authored at least 30 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks.
CoRR, May, 2026

Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents.
CoRR, May, 2026

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?
CoRR, May, 2026

Growing Pains: Extensible and Efficient LLM Benchmarking Via Fixed Parameter Calibration.
CoRR, April, 2026

CUBE: A Standard for Unifying Agent Benchmarks.
CoRR, March, 2026

General Agent Evaluation.
CoRR, February, 2026

Will it Merge? On The Causes of Model Mergeability.
CoRR, January, 2026

Mediocrity is the key for LLM as a Judge Anchor Selection.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization.
CoRR, October, 2025

Survey on Evaluation of LLM-based Agents.
CoRR, March, 2025

WildIFEval: Instruction Following in the Wild.
CoRR, March, 2025

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness.
CoRR, February, 2025

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

JuStRank: Benchmarking LLM Judges for System Ranking.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models.
CoRR, 2024

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation.
CoRR, 2024

When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes.
CoRR, 2024

Genie: Achieving Human Parity in Content-Grounded Datasets Generation.
CoRR, 2024

FastFit: Fast and Effective Few-Shot Text Classification with a Multitude of Classes.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2024

Achieving Human Parity in Content-Grounded Datasets Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

More Bang for your Context: Virtual Documents for Question Answering over Long Documents.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns.
Proceedings of the 46th Annual Meeting of the Cognitive Science Society, 2024

A Grounded Preference Model for LLM Alignment.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
QAID: Question Answering Inspired Few-shot Intent Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Evaluating and Improving the Coreference Capabilities of Machine Translation Models.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022
Reinforcement Learning with Large Action Spaces for Neural Machine Translation.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Conversational Search with Mixed-Initiative - Asking Good Clarification Questions backed-up by Passage Retrieval.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021
Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
