Samuel Cahyawijaya

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

High-Dimension Human Value Representation in Large Language Models.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Subobject-level Image Tokenization.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

What Makes for Good Image Captions?

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models.

Ayu Purwarianti

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments.

Patomporn Payoungkhamdee

Pume Tuchinda

Jinheon Baek

Can Udomcharoenchaikit

Potsawee Manakul

Ekapol Chuangsuwanich

Sarana Nutanong

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.

Mohammad Rifqi Farhansyah

Joel Ruben Antony Moniz

Tack Hwa Wong

Thant Thiri Maung

Frederikus Hudi

David Anugraha

Muhammad Ravi Shulthan Habibi

Muhammad Reza Qorib

Amit Agarwal

Joseph Marvin Imperial

Hitesh Laxmichand Patel

Vicky Feliren

Bahrul Ilmi Nasution

Manuel Antonio Rufino

Mohamed Fazli Mohamed Imam

Rian Adam Rajagede

Carlos Rafael Catalan

Priyaranjan Pattnayak

Salsabila Zahirah Pranida

Kevin Pratama

Yeshil Bangera

Adisai Na-Thalang

Patricia Nicole Monderin

Kanyakorn Veerakanjana

Piyalitt Ittichaiwong

Matthew Theodore Roque

Karissa Vincentio

Takdanai Kreangphet

Phakphum Artkaew

Kadek Hendrawan Palgunadi

Hanif Muhammad Zhafran

Fenal Ashokbhai Ilasariya

Haochen Li

John Amadeo Daniswara

Filbert Aurelian Tjiaranata

Eryawan Presma Yulianrifat

Can Udomcharoenchaikit

Fadil Risdian Ansori

Mahardika Krisna Ihsani

Isaiah Edri W. Flores

Lester James Validad Miranda

Ming Shan Hee

Ikhlasul Akmal Hanif

M. Alif Al Hakim

Muhammad Rizky Sya'ban

Kun Kerdthaisong

Fajri Koto

Tirana Noor Fatyanosa

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense.

Elisa Gilbert

Hiroki Nomoto

CoRR, 2024

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines.

Frederikus Hudi

Ubaidillah Ariq Prathama

Maria Angelica Riera Machin

Jan Wira Gotama Putra

Junho Myung

Lucky Susanto

Marina Zhukova

Michael Anugraha

Natasha Santosa

Stephanie Yulia Salim

Yi Zhou

Yinxuan Gui

David Ifeoluwa Adelani

CoRR, 2024

LLM for Everyone: Representing the Underrepresented in Large Language Models.

CoRR, 2024

LLM Internal States Reveal Hallucination Risk Faced With a Query.

CoRR, 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.

Rahmad Mahendra

Muhammad Ravi Shulthan Habibi

Lester James V. Miranda

Joseph Marvin Imperial

Onno Pepijn Kampman

Joel Ruben Antony Moniz

Bin Wang

Chenxi Whitehouse

Muhammad Dehan Al Kautsar

Sonny Lazuardi Hermawan

Dan John Velasco

Willy Fitra Hendria

Yasmin Moslem

Noah Flynn

CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.

David Romero

Chenyang Lyu

David Ifeoluwa Adelani

Henok Biadglign Ademtew

Hernán Maina

Israel Abebe Azime

Jesús-Germán Ortiz-Barajas

Jay P. Gala

Jiahui Geng

Jinheon Baek

Jocelyn Dunstan

Laura Alonso Alemany

Kumaranage Ravindu Yasas Nagasinghe

Luciana Benotti

Luis Fernando D'Haro

Marcelo Viridiano

Marcos Estecha-Garitagoitia

Maria Camila Buitrago Cabrera

Mario Rodríguez-Cantelar

Mélanie Jouitteau

Mihail Mihaylov

Mohamed Fazli Mohamed Imam

Munkhjargal Gochoo

Munkh-Erdene Otgonbold

Tiago Timponi Torrent

Toqeer Ehsan

Vladimir Araujo

Yova Kementchedjhieva

CoRR, 2024

The Pyramid of Captions.

CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.

David Romero

Chenyang Lyu

Jesús-Germán Ortiz-Barajas

Santiago Góngora

Aishik Mandal

Sukannya Purkayastha

Munkh-Erdene Otgonbold

Tiago Timponi Torrent

Frederico Belcavello

Marcelo Viridiano

Christian Salamea Palacios

Vladimir Araujo

Yova Kementchedjhieva

Mihail Mihaylov

Israel Abebe Azime

Henok Biadglign Ademtew

Bontu Fufa Balcha

Naome A. Etori

David Ifeoluwa Adelani

Rada Mihalcea

Atnafu Lambebo Tonja

Maria Camila Buitrago Cabrera

Gisela Vallejo

Marcos Estecha-Garitagoitia

Mario Rodríguez-Cantelar

Toqeer Ehsan

Rendi Chevi

Mohamed Fazli Mohamed Imam

Kumaranage Ravindu Yasas Nagasinghe

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LLMs Are Few-Shot In-Context Low-Resource Language Learners.

Pascale Fung

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Belief Revision: The Adaptability of Large Language Models Reasoning.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.

Rahmad Mahendra

Muhammad Ravi Shulthan Habibi

Lester James V. Miranda

Joseph Marvin Imperial

Onno Kampman

Joel Ruben Antony Moniz

Bin Wang

Chenxi Whitehouse

Muhammad Dehan Al Kautsar

Sonny Lazuardi Hermawan

Dan John Velasco

Willy Fitra Hendria

Yasmin Moslem

Noah Flynn

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Re-Evaluating Evaluation for Multilingual Summarization.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages.

Emmanuel Dave

Nuur Shadieq

Muhammad Ihza Mahendra

Dea Annisayanti Putri

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages.

Muhammad Dehan Al Kautsar

CoRR, 2023

IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems.

Rahmah Khoirussyifa' Nurdini

Ayu Purwarianti

CoRR, 2023

InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems.

CoRR, 2023

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models.

CoRR, 2023

Survey of Social Bias in Vision-Language Models.

CoRR, 2023

Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition.

CoRR, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.

Antonios Anastasopoulos

Graham Neubig

CoRR, 2023

Multilingual Large Language Models Are Not (Yet) Code-Switchers.

CoRR, 2023

Instruct-Align: Teaching Novel Languages with to LLMs through Alignment-based Cross-Lingual Instruction.

CoRR, 2023

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages.

Long Phan

Yin Lin Tan

CoRR, 2023

Biomedical Image Reconstruction: A Survey.

CoRR, 2023

Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems.

Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages.

Jhonson Lee

Nuur Shadieq

Tjeng Wawan Cenggoro

Hanung Wahyuning Linuwih

Bryan Wilie

Galih Pradipta Muridan

Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.

Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

The Obscure Limitation of Modular Multilingual Language Models.

Ayu Purwarianti

Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Multilingual Large Language Models Are Not (Yet) Code-Switchers.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.

Antonios Anastasopoulos

Graham Neubig

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue.

Muhammad Satrio Wicaksono

Pascale Fung

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: EACL 2023, 2023

Multi-lingual and Multi-cultural Figurative Language Understanding.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.

Arie Ardiyanti Suryani

Rifki Afina Putri

Dan Su

Keith Stevens

Made Nindyatama Nityasya

Ichwanul Muslim Karo Karo

Cuk Tho

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources.

Muhammad Satrio Wicaksono

Ika Alfina

Arie Ardiyanti Suryani

Rifki Afina Putri

Dan Su

Keith Stevens

Made Nindyatama Nityasya

Ichwanul Muslim Karo Karo

Tirana Noor Fatyanosa

CoRR, 2022

NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages.

CoRR, 2022

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands.

CoRR, 2022

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing.

CoRR, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code.

Alexandros Papangelis

Aman Madaan

Angelina McMillan-Major

Khyathi Raghavi Chandu

Laura Perez-Beltrachini

Leonardo F. R. Ribeiro

Pawan Sasanka Ammanamanchi

CoRR, 2022

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.

CoRR, 2022

VScript: Controllable Script Generation with Audio-Visual Presentation.

CoRR, 2022

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition.

Cheuk Tung Shadow Yiu

CoRR, 2022

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension.

Proceedings of the 7th Workshop on Representation Learning for NLP, 2022

BigBio: A Framework for Data-Centric Biomedical Natural Language Processing.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset.

Cheuk Tung Shadow Yiu

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Every picture tells a story: Image-grounded controllable stylistic story generation.

Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, 2022

VScript: Controllable Script Generation with Visual Presentation.

Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study.

Proceedings of the 21st Workshop on Biomedical Language Processing, 2022

Integrating Question Rewrites in Conversational Question Answering: A Reinforcement Learning Approach.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2022

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling.

Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis, 2022

Can Question Rewriting Help Conversational Question Answering?

Proceedings of the Third Workshop on Insights from Negative Results in NLP, 2022

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters.

Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation.

CoRR, 2021

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation.

Jascha Sohl-Dickstein

Marco Antonio Sobrevilla Cabezudo

Paulo Henrique Santos Vasconcellos

William Soto Martinez

CoRR, 2021

Greenformer: Factorization Toolkit for Efficient Deep Neural Networks.

CoRR, 2021

Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation.
