Amir Hossein Kargaran

ORCID: 0000-0001-6253-1315

According to our database, Amir Hossein Kargaran authored at least 18 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
FineWeb2: One Pipeline to Scale Them All - Adapting Pre-Training Data Processing to Every Language.
CoRR, June, 2025

Tracing Multilingual Factual Knowledge Acquisition in Pretraining.
CoRR, May, 2025

On Relation-Specific Neurons in Large Language Models.
CoRR, February, 2025

How Transliterations Improve Crosslingual Alignment.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

How Programming Concepts and Neurons Are Shared in Code Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GIRT-Model: Automated Generation of Issue Report Templates.
Proceedings of the 21st IEEE/ACM International Conference on Mining Software Repositories, 2024

GlotScript: A Resource and Tool for Low Resource Writing System Identification.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

MaskLID: Code-Switching Language Identification through Iterative Masking.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

2023
MenuCraft: Interactive Menu System Design with Large Language Models.
CoRR, 2023

GIRT-Data: Sampling GitHub Issue Report Templates.
Proceedings of the 20th IEEE/ACM International Conference on Mining Software Repositories, 2023

GlotLID: Language Identification for Low-Resource Languages.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

2021
Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph.
Proceedings of the WebSci '21: 13th ACM Web Science Conference 2021, 2021

2020
On Detecting Hidden Third-Party Web Trackers with a Wide Dependency Chain Graph: A Representation Learning Approach.
CoRR, 2020

Analytical Derivation and Comparison of Alarm Similarity Analysis Methods.
CoRR, 2020

