Anoop Kunchukuttan

V. Rudra Murthy

Thanmay Jayakumar

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs.

Sumanth Doddapaneni

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Pralekha: An Indic Document Alignment Evaluation Benchmark.

Sanjay Suryanarayanan

Haiyue Song

CoRR, 2024

BhasaAnuvaad: A Speech Translation Dataset for 13 Indian Languages.

CoRR, 2024

An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models.

CoRR, 2024

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.

Priyam Mehta

Ananth Sankar

Umashankar Kumaravelan

CoRR, 2024

Airavata: Introducing Hindi Instruction-tuned LLM.

CoRR, 2024

RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization.

CoRR, 2024

Findings of WMT 2024's MultiIndic22MT Shared Task for Machine Translation of 22 Indian Languages.

Maunendra Sankar Desarkar

Proceedings of the Ninth Conference on Machine Translation, 2024

CharSpan: Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages.

Kaushal Maurya

Rahul Kejriwal

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

A Comprehensive Analysis of Adapter Efficiency.

Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 2024

Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2024

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.

Priyam Mehta

Ananth Sankar

Umashankar Kumaravelan

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages.

Trans. Mach. Learn. Res., 2023

Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.

Yash Madhani

Maunendra Sankar Desarkar

CoRR, 2023

In-context Example Selection for Machine Translation Using Multiple Features.

CoRR, 2023

Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages.

Kaushal Kumar Maurya

Rahul Kejriwal

CoRR, 2023

Evaluating Inter-Bilingual Semantic Parsing for Indian Languages.

Divyanshu Aggarwal

Vivek Gupta

CoRR, 2023

Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages.

Kaushal Santosh Bhogale

Proceedings of the IEEE International Conference on Acoustics, 2023

DecoMT: Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.

Yash Madhani

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages.

Tahir Javed

Kaushal Santosh Bhogale

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages.

Mitesh Shantadevi Khapra

Trans. Assoc. Comput. Linguistics, 2022

IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages.

CoRR, 2022

Aksharantar: Towards building open transliteration tools for the next billion users.

CoRR, 2022

IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages.

CoRR, 2022

Bilingual Tabular Inference: A Case Study on Indic Languages.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

IndicXNLI: Evaluating Multilingual Inference for Indian Languages.

Divyanshu Aggarwal

Vivek Gupta

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Overview of the 9th Workshop on Asian Translation.

Proceedings of the 9th Workshop on Asian Translation, 2022

IndicBART: A Pre-trained Model for Indic Natural Language Generation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Towards Building ASR Systems for the Next Billion Users.

Tahir Javed

Sumanth Doddapaneni

Abhigyan Raman

Kaushal Santosh Bhogale

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

A Survey of Multilingual Neural Machine Translation.

Chenhui Chu

ACM Comput. Surv., 2021

An Empirical Investigation of Multi-bridge Multilingual NMT models.

CoRR, 2021

IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages.

CoRR, 2021

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages.

Mitesh Shantadevi Khapra

CoRR, 2021

A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages.

Siddharth Jain

Rahul Kejriwal

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Overview of the 8th Workshop on Asian Translation.

Proceedings of the 8th Workshop on Asian Translation, 2021

Itihasa: A large-scale corpus for Sanskrit to English translation.

Proceedings of the 8th Workshop on Asian Translation, 2021

2020

AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages.

CoRR, 2020

Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent.

CoRR, 2020

A Comprehensive Survey of Multilingual Neural Machine Translation.

Chenhui Chu

CoRR, 2020

Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20.

Proceedings of the Fifth Conference on Machine Translation, 2020

Learning Geometric Word Meta-Embeddings.

Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Multilingual Neural Machine Translation.

Chenhui Chu

Proceedings of the 28th International Conference on Computational Linguistics, 2020

Overview of the 7th Workshop on Asian Translation.

Proceedings of the 7th Workshop on Asian Translation, 2020

2019

Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach.

Trans. Assoc. Comput. Linguistics, 2019

Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages.

V. Rudra Murthy

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Overview of the 6th Workshop on Asian Translation.

Proceedings of the 6th Workshop on Asian Translation, 2019

2018

Leveraging Orthographic Similarity for Multilingual Neural Transliteration.

Gurneet Singh

Trans. Assoc. Comput. Linguistics, 2018

McTorch, a manifold optimization library for deep learning.

CoRR, 2018

The IIT Bombay English-Hindi Parallel Corpus.

Pratik Mehta

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Overview of the 5th Workshop on Asian Translation.

Proceedings of the 32nd Pacific Asia Conference on Language, 2018

NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers.

Proceedings of the 32nd Pacific Asia Conference on Language, 2018

Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT.

Tamali Banerjee

Pushpak Bhattacharyya

Proceedings of the 32nd Pacific Asia Conference on Language, 2018

Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER.

V. Rudra Murthy

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017

Utilizing Lexical Similarity for pivot translation involving resource-poor, related languages.

Maulik Shah

Pradyot Prakash

CoRR, 2017

Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT.

Maulik Shah

Pradyot Prakash

Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Learning variable length units for SMT between related languages via Byte Pair Encoding.

Proceedings of the First Workshop on Subword and Character Level Models in NLP, 2017

Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation.

Sandhya Singh

Ritesh Panjwani

Proceedings of the 4th Workshop on Asian Translation, 2017

2016

Faster Decoding for Subword Level Phrase-based SMT between Related Languages.

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Statistical Machine Translation between Related Languages.

Proceedings of the Tutorial Abstracts, 2016

Orthographic Syllable as basic unit for SMT between Related Languages.

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Substring-based unsupervised transliteration with phonetic and contextual knowledge.

Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016

IIT Bombay's English-Indonesian submission at WAT: Integrating Neural Language Models with SMT.

Sandhya Singh

Proceedings of the 3rd Workshop on Asian Translation, 2016

2015

Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent.

Ratish Puduppully

Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Augmenting Pivot based SMT with word segmentation.

Rohit More

Proceedings of the 12th International Conference on Natural Language Processing, 2015

Investigating the potential of post-ordering SMT output to improve translation quality.

Pratik Mehta

Proceedings of the 12th International Conference on Natural Language Processing, 2015

Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization.

Proceedings of the 12th International Conference on Natural Language Processing, 2015

Data representation methods and use of mined corpora for Indian language transliteration.

Proceedings of the Fifth Named Entity Workshop, 2015

2014

The IIT Bombay Hindi-English Translation System at WMT 2014.

Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

Shata-Anuvadak: Tackling Multiway Translation of Indian Languages.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control.

Ananthakrishnan Ramanathan

Karthik Visweswariah