Anoop Kunchukuttan

ORCID: 0009-0007-3143-9875

According to our database, Anoop Kunchukuttan authored at least 79 papers between 2012 and 2024.

Bibliography

2024
Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation.
CoRR, 2024

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages.
CoRR, 2024

Airavata: Introducing Hindi Instruction-tuned LLM.
CoRR, 2024

RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization.
CoRR, 2024

CharSpan: Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

A Comprehensive Analysis of Adapter Efficiency.
Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 2024

2023
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages.
CoRR, 2023

Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.
CoRR, 2023

In-context Example Selection for Machine Translation Using Multiple Features.
CoRR, 2023

Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages.
CoRR, 2023

Evaluating Inter-Bilingual Semantic Parsing for Indian Languages.
CoRR, 2023

Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

DecoMT: Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages.
Trans. Assoc. Comput. Linguistics, 2022

IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages.
CoRR, 2022

Aksharantar: Towards building open transliteration tools for the next billion users.
CoRR, 2022

IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages.
CoRR, 2022

Bilingual Tabular Inference: A Case Study on Indic Languages.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

IndicXNLI: Evaluating Multilingual Inference for Indian Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Overview of the 9th Workshop on Asian Translation.
Proceedings of the 9th Workshop on Asian Translation, 2022

IndicBART: A Pre-trained Model for Indic Natural Language Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Towards Building ASR Systems for the Next Billion Users.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Survey of Multilingual Neural Machine Translation.
ACM Comput. Surv., 2021

An Empirical Investigation of Multi-bridge Multilingual NMT models.
CoRR, 2021

IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages.
CoRR, 2021

A Primer on Pretrained Multilingual Language Models.
CoRR, 2021

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages.
CoRR, 2021

A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021


Itihasa: A large-scale corpus for Sanskrit to English translation.
Proceedings of the 8th Workshop on Asian Translation, 2021

2020
AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages.
CoRR, 2020

Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent.
CoRR, 2020

A Comprehensive Survey of Multilingual Neural Machine Translation.
CoRR, 2020

Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20.
Proceedings of the Fifth Conference on Machine Translation, 2020

Learning Geometric Word Meta-Embeddings.
Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

iNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Multilingual Neural Machine Translation.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Overview of the 7th Workshop on Asian Translation.
Proceedings of the 7th Workshop on Asian Translation, 2020

2019
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach.
Trans. Assoc. Comput. Linguistics, 2019

Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Overview of the 6th Workshop on Asian Translation.
Proceedings of the 6th Workshop on Asian Translation, 2019

2018
Leveraging Orthographic Similarity for Multilingual Neural Transliteration.
Trans. Assoc. Comput. Linguistics, 2018

McTorch, a manifold optimization library for deep learning.
CoRR, 2018

The IIT Bombay English-Hindi Parallel Corpus.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Overview of the 5th Workshop on Asian Translation.
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, 2018

NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers.
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, 2018

Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT.
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, 2018

Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Utilizing Lexical Similarity for pivot translation involving resource-poor, related languages.
CoRR, 2017

Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Learning variable length units for SMT between related languages via Byte Pair Encoding.
Proceedings of the First Workshop on Subword and Character Level Models in NLP, 2017

Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation.
Proceedings of the 4th Workshop on Asian Translation, 2017

2016
Faster Decoding for Subword Level Phrase-based SMT between Related Languages.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Statistical Machine Translation between Related Languages.
Proceedings of the Tutorial Abstracts, 2016

Orthographic Syllable as basic unit for SMT between Related Languages.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Substring-based unsupervised transliteration with phonetic and contextual knowledge.
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016

IIT Bombay's English-Indonesian submission at WAT: Integrating Neural Language Models with SMT.
Proceedings of the 3rd Workshop on Asian Translation, 2016

2015
Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Augmenting Pivot based SMT with word segmentation.
Proceedings of the 12th International Conference on Natural Language Processing, 2015

Investigating the potential of post-ordering SMT output to improve translation quality.
Proceedings of the 12th International Conference on Natural Language Processing, 2015

Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization.
Proceedings of the 12th International Conference on Natural Language Processing, 2015

Data representation methods and use of mined corpora for Indian language transliteration.
Proceedings of the Fifth Named Entity Workshop, 2015

2014
The IIT Bombay Hindi-English Translation System at WMT 2014.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

Shata-Anuvadak: Tackling Multiway Translation of Indian Languages.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

When Transliteration Met Crowdsourcing : An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Supertag Based Pre-ordering in Machine Translation.
Proceedings of the 11th International Conference on Natural Language Processing, 2014

Tuning a Grammar Correction System for Increased Precision.
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 2014

2013
IITB System for CoNLL 2013 Shared Task: A Hybrid Approach to Grammatical Error Correction.
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, 2013

TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Experiences in Resource Generation for Machine Translation through Crowdsourcing.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Partially modelling word reordering as a sequence labelling problem.
Proceedings of the Workshop on Reordering for Statistical Machine Translation@COLING 2012, 2012

