Eugene Yang

Orcid: 0000-0002-0051-1535

Affiliations:
  • Johns Hopkins University, Human Language Technology Center of Excellence, Baltimore, MD, USA
  • Georgetown University, IR Lab, Washington, DC, USA (PhD 2021)


According to our database, Eugene Yang authored at least 87 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Search for Coverage: Learning Coverage-Aware Retrieval with Augmented Sub-Question Answerability.
CoRR, May, 2026

DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation.
CoRR, May, 2026

CoverageBench: Evaluating Information Coverage across Tasks and Domains.
CoRR, March, 2026

Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage.
CoRR, March, 2026

Overview of the TREC 2025 RAGTIME Track.
CoRR, February, 2026

NeuCLIRTech: Chinese Monolingual and Cross-Language Information Retrieval Evaluation in a Challenging Domain.
CoRR, February, 2026

WSDM CUP 2026: Multilingual Retrieval.
Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining, 2026

RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation.
Proceedings of the Advances in Information Retrieval, 2026

Does Reasoning Make Search More Fair? Comparing Fairness in Reasoning and Non-reasoning Rerankers.
Proceedings of the Advances in Information Retrieval, 2026

Investigating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries.
Proceedings of the Advances in Information Retrieval, 2026

LANCER: LLM Reranking for Nugget Coverage.
Proceedings of the Advances in Information Retrieval, 2026

Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?
Proceedings of the Advances in Information Retrieval, 2026

Incorporating Q&A Nuggets Into Retrieval-Augmented Generation.
Proceedings of the Advances in Information Retrieval, 2026

Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction.
Proceedings of the Advances in Information Retrieval, 2026

2025
NeuCLIRBench: A Modern Evaluation Collection for Monolingual, Cross-Language, and Multilingual Information Retrieval.
CoRR, November, 2025

Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation.
CoRR, October, 2025

Augmenting Researchy Questions with Sub-question Judgments.
CoRR, October, 2025

Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries.
CoRR, October, 2025

Topic-Specific Classifiers are Better Relevance Judges than Prompted LLMs.
CoRR, October, 2025

Milco: Learned Sparse Retrieval Across Languages via a Multilingual Connector.
CoRR, October, 2025

Auto-ARGUE: LLM-Based Report Generation Evaluation.
CoRR, September, 2025

Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG.
CoRR, September, 2025

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning.
CoRR, September, 2025

HLTCOE at LiveRAG: GPT-Researcher using ColBERT retrieval.
CoRR, June, 2025

Rank-K: Test-Time Reasoning for Listwise Reranking.
CoRR, May, 2025

WikiVideo: Article Generation from Multiple Videos.
CoRR, April, 2025

Rank1: Test-Time Compute for Reranking in Information Retrieval.
CoRR, February, 2025

Neural Lexical Search with Learned Sparse Retrieval.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Nugget-based Annotation Protocol and Tool For Evaluating Long-form Retrieval-Augmented Generation.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

System Comparison Using Automated Generation of Relevance Judgements in Multiple Languages.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

MMMORRF: Multimodal Multilingual MOdularized Reciprocal Rank Fusion.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Variations in Relevance Judgments and the Shelf Life of Test Collections.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Generate-Distill: Training Cross-Language IR Models with Synthetically-Generated Data.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

A Reproducibility Study of LLM Setwise Reranker with Heapsort.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

CLERC: A Dataset for U. S. Legal Case Retrieval and Retrieval-Augmented Analysis Generation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

MURR: Model Updating with Regularized Replay for Searching a Document Stream.
Proceedings of the Advances in Information Retrieval, 2025

Eval4RAG: Workshop on Evaluation of Retrieval-Augmented Generation Systems.
Proceedings of the Advances in Information Retrieval, 2025

mFollowIR: A Multilingual Benchmark for Instruction Following in Retrieval.
Proceedings of the Advances in Information Retrieval, 2025

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Report on the Collab-a-Thon at ECIR 2024.
SIGIR Forum, June, 2024

Report on the Search Futures Workshop at ECIR 2024.
SIGIR Forum, June, 2024

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval.
CoRR, 2024

CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation.
CoRR, 2024

Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval.
CoRR, 2024

HLTCOE at TREC 2024 NeuCLIR Track.
Proceedings of the Thirty-Third Text REtrieval Conference, 2024

Overview of the TREC 2024 NeuCLIR Track.
Proceedings of the Thirty-Third Text REtrieval Conference, 2024

Distillation for Multilingual Information Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Language Fairness in Multilingual Information Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Contextualization with SPLADE for High Recall Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

On the Evaluation of Machine-Generated Reports.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

High Recall Retrieval Via Technology-Assisted Review.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation.
Proceedings of the Advances in Information Retrieval, 2024

Beyond the Bar: Generative AI as a Transformative Component in Legal Document Review.
Proceedings of the IEEE International Conference on Big Data, 2024

2023
Synthetic Cross-language Information Retrieval Training Data.
CoRR, 2023

HLTCOE at TREC 2023 NeuCLIR Track.
Proceedings of the Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), 2023

Overview of the TREC 2023 NeuCLIR Track.
Proceedings of the Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), 2023

Neural Methods for Cross-Language Information Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

BLADE: Combining Vocabulary Pruning and Intermediate Pretraining for Scaleable Neural CLIR.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

HC3: A Suite of Test Collections for CLIR Evaluation over Informal Text.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Extending Translate-Train for ColBERT-X to African Language CLIR.
Proceedings of the Working Notes of FIRE 2023, 2023

Neural Approaches to Multilingual Information Retrieval.
Proceedings of the Advances in Information Retrieval, 2023

2022
Parameter-efficient Zero-shot Transfer for Cross-Language Dense Retrieval with Adapters.
CoRR, 2022

Multilingual ColBERT-X.
CoRR, 2022

HLTCOE at TREC 2022 NeuCLIR Track.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

Overview of the TREC 2022 NeuCLIR Track.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

TARexp: A Python Framework for Technology-Assisted Review Experiments.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Learning to Enrich Query Representation with Pseudo-Relevance Feedback for Cross-lingual Retrieval.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

ECIR 2022 Tutorial: Technology-Assisted Review for High Recall Retrieval.
Proceedings of the Advances in Information Retrieval, 2022

Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review.
Proceedings of the Advances in Information Retrieval, 2022

Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models.
Proceedings of the Advances in Information Retrieval, 2022

HC4: A New Suite of Test Collections for Ad Hoc CLIR.
Proceedings of the Advances in Information Retrieval, 2022

Patapasco: A Python Framework for Cross-Language Information Retrieval Experiments.
Proceedings of the Advances in Information Retrieval, 2022

Learning a Sparse Representation Model for Neural CLIR.
Proceedings of the Third International Conference on Design of Experimental Search & Information REtrieval Systems, 2022

2021
ToxCCIn: Toxic Content Classification with Interpretability.
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, 2021

Heuristic stopping rules for technology-assisted review.
Proceedings of the DocEng '21: ACM Symposium on Document Engineering 2021, 2021

On minimizing cost in legal document review workflows.
Proceedings of the DocEng '21: ACM Symposium on Document Engineering 2021, 2021

TAR on Social Media: A Framework for Online Content Moderation.
Proceedings of the Second International Conference on Design of Experimental Search & Information REtrieval Systems, 2021

Certifying One-Phase Technology-Assisted Reviews.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection.
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

2019
I/O-Efficient Algorithms for Topological Sort and Related Problems.
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019

Text Retrieval Priors for Bayesian Logistic Regression.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

A Regularization Approach to Combining Keywords and Training Data in Technology-Assisted Review.
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, 2019

2018
Retrieval and Richness when Querying by Document.
Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, 2018

2017
Effectiveness results for popular e-discovery algorithms.
Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law, 2017

