Jan Christian Blaise Cruz
ORCID: 0000-0002-2676-7790
According to our database, Jan Christian Blaise Cruz authored at least 34 papers between 2019 and 2025.
Bibliography
2025
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability.
CoRR, June, 2025
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.
CoRR, March, 2025
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation.
CoRR, January, 2025
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Senses.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025
Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
2024
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense.
CoRR, 2024
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines.
CoRR, 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.
CoRR, 2024
CoRR, 2024
Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task.
Proceedings of the Ninth Conference on Machine Translation, 2024
Proceedings of the Ninth Conference on Machine Translation, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
2023
Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages.
CoRR, 2023
Proceedings of the Eighth Conference on Machine Translation, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
2022
Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings.
CoRR, 2022
Using Synthetic Data for Conversational Response Generation in Low-resource Settings.
CoRR, 2022
Samsung Research Philippines - Datasaur AI's Submission for the WMT22 Large Scale Multilingual Translation Task.
Proceedings of the Seventh Conference on Machine Translation, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Using Synthetic Data to Train a Conversational Response Generation Model in Low Resource Settings.
Proceedings of the International Conference on Asian Language Processing, 2022
2021
Proceedings of the Sixth Conference on Machine Translation, 2021
Proceedings of the PRICAI 2021: Trends in Artificial Intelligence, 2021
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets.
Proceedings of the PRICAI 2021: Trends in Artificial Intelligence, 2021
2020
Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation.
CoRR, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
2019
CoRR, 2019