Chenghao Xiao

ORCID: 0000-0001-7623-8232

According to our database, Chenghao Xiao authored at least 37 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages.
CoRR, March, 2026

MAEB: Massive Audio Embedding Benchmark.
CoRR, February, 2026

The Achilles' Heel of Angular Margins: A Chebyshev Polynomial Fix for Speaker Verification.
CoRR, January, 2026

RIGOURATE: Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation.
CoRR, January, 2026

Reevaluating zero-shot information extraction: Sampling bias, prompting transferability and sensitivity in large language models.
Inf. Process. Manag., 2026

2025
Scaling Language-Centric Omnimodal Representation Learning.
CoRR, October, 2025

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning.
CoRR, July, 2025

Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal.
CoRR, July, 2025

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning.
CoRR, June, 2025

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning.
CoRR, June, 2025

Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts.
CoRR, April, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark.
CoRR, February, 2025

Adversarial Defense without Adversarial Defense: Enhancing Language Model Robustness via Instance-level Principal Component Removal.
Trans. Assoc. Comput. Linguistics, 2025

CAST: Corpus-Aware Self-similarity Enhanced Topic modelling.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

MIEB: Massive Image Embedding Benchmark.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Everything is a Video: Unifying Modalities Through Next-Frame Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Quantifying Semantic Shift in Financial NLP: Robust Metrics for Market Prediction Stability.
Proceedings of the 6th ACM International Conference on AI in Finance, 2025

Crafting Customisable Characters with LLMs: A Persona-Driven Role-Playing Agent Framework.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
BioMNER: A Dataset for Biomedical Method Entity Recognition.
CoRR, 2024

SimsChat: A Customisable Persona-Driven Role-Playing Agent.
CoRR, 2024

The Power of Next-Frame Prediction for Learning Physical Laws.
CoRR, 2024

RAR-b: Reasoning as Retrieval Benchmark.
CoRR, 2024

Pixel Sentence Representation Learning.
CoRR, 2024

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

On the Rigour of Scientific Writing: Criteria, Analysis, and Insights.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Effective Distillation of Table-based Reasoning Ability from LLMs.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Audio Contrastive based Fine-tuning.
CoRR, 2023

Can Text Encoders be Deceived by Length Attack?
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Length is a Curse and a Blessing for Document-level Semantics.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
On Isotropy and Learning Dynamics of Contrastive-based Sentence Representation Learning.
CoRR, 2022

Fine-grained Main Ideas Extraction and Clustering of Online Course Reviews.
Proceedings of the Artificial Intelligence in Education - 23rd International Conference, 2022

SimStu-Transformer: A Transformer-Based Approach to Simulating Student Behaviour.
Proceedings of the Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners' and Doctoral Consortium, 2022
