Mohammad Beigi

According to our database, Mohammad Beigi authored at least 9 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
IR³: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking.
CoRR, February, 2026

Adversarial Reward Auditing for Active Detection and Mitigation of Reward Hacking.
CoRR, February, 2026

Survey of uncertainty estimation in LLMs - Sources, methods, applications, and challenges.
Inf. Fusion, 2026

2025
A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models.
CoRR, February, 2025

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024
Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models.
CoRR, 2024

InternalInspector I²: Robust Confidence Estimation in LLMs through Internal States.
CoRR, 2024

InternalInspector I²: Robust Confidence Estimation in LLMs through Internal States.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

