Jan Christian Blaise Cruz

Orcid: 0000-0002-2676-7790

According to our database, Jan Christian Blaise Cruz authored at least 39 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
LLM Olympiad: Why Model Evaluation Needs a Sealed Exam.
CoRR, March, 2026

Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming.
CoRR, January, 2026

Multilinguality as Sense Adaptation.
CoRR, January, 2026

2025
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability.
CoRR, June, 2025

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation.
CoRR, May, 2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.
CoRR, March, 2025

Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation.
CoRR, January, 2025


Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Senses.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025


MoMentS: A Comprehensive Multimodal Benchmark for Theory of Mind.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

FilBench: Can LLMs Understand and Generate Filipino?
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense.
CoRR, 2024

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines.
CoRR, 2024

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.
CoRR, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.
CoRR, 2024

Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task.
Proceedings of the Ninth Conference on Machine Translation, 2024

Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task.
Proceedings of the Ninth Conference on Machine Translation, 2024

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024


2023
Multilingual Large Language Models Are Not (Yet) Code-Switchers.
CoRR, 2023

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages.
CoRR, 2023

Samsung R&D Institute Philippines at WMT 2023.
Proceedings of the Eighth Conference on Machine Translation, 2023

Multilingual Large Language Models Are Not (Yet) Code-Switchers.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings.
CoRR, 2022

Using Synthetic Data for Conversational Response Generation in Low-resource Settings.
CoRR, 2022

Samsung Research Philippines - Datasaur AI's Submission for the WMT22 Large Scale Multilingual Translation Task.
Proceedings of the Seventh Conference on Machine Translation, 2022

Improving Large-scale Language Models and Resources for Filipino.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Using Synthetic Data to Train a Conversational Response Generation Model in Low Resource Settings.
Proceedings of the International Conference on Asian Language Processing, 2022

2021
Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21.
Proceedings of the Sixth Conference on Machine Translation, 2021

Simplifying Paragraph-Level Question Generation via Transformer Language Models.
Proceedings of the PRICAI 2021: Trends in Artificial Intelligence, 2021

Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets.
Proceedings of the PRICAI 2021: Trends in Artificial Intelligence, 2021

2020
Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation.
CoRR, 2020

Establishing Baselines for Text Classification in Low-Resource Languages.
CoRR, 2020

Transformer-based End-to-End Question Generation.
CoRR, 2020

Localization of Fake News Detection via Multitask Transfer Learning.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019
Evaluating Language Model Finetuning Techniques for Low-resource Languages.
CoRR, 2019

