Xiang Dai

Orcid: 0000-0002-6020-9688

Affiliations:
  • Commonwealth Scientific and Industrial Research Organisation (CSIRO), Data61, Sydney, Australia
  • University of Sydney, School of Computer Science, Sydney, Australia (PhD 2021)


According to our database, Xiang Dai authored at least 30 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Proceedings of the First Workshop of Evaluation of Multi-Modal Generation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction.
J. Biomed. Informatics, 2024

An adaptive approach to noisy annotations in scientific information extraction.
Inf. Process. Manag., 2024

Can AI Extract Antecedent Factors of Human Trust in AI? An Application of Information Extraction for Scientific Literature in Behavioural and Computer Sciences.
CoRR, 2024

Identifying Health Risks from Family History: A Survey of Natural Language Processing Techniques.
CoRR, 2024

Understanding Faithfulness and Reasoning of Large Language Models on Plain Biomedical Summaries.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

A Critical Look at Meta-evaluating Summarisation Evaluation Metrics.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Born Differently Makes a Difference: Counterfactual Study of Bias in Biography Generation from a Data-to-Text Perspective.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

2023
Clinician-Facing AI in the Wild: Taking Stock of the Sociotechnical Challenges and Opportunities for HCI.
ACM Trans. Comput. Hum. Interact., April, 2023

Rethinking the Role of Entity Type in Relation Classification.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

MultiFin: A Dataset for Multilingual Financial NLP.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Can Social Media Inform Dietary Approaches for Health Management? A Dataset and Benchmark for Low-Carb Diet.
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, 2023

CSIRO Data61 Team at BioLaySumm Task 1: Lay Summarisation of Biomedical Research Articles Using Generative Models.
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, 2023

2022
Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods.
CoRR, 2022

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification.
CoRR, 2022

Revisiting Transformer-based Models for Long Document Classification.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021
Recognising Biomedical Names: Challenges and Solutions.
CoRR, 2021

mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

SearchEHR: A Family History Search System for Clinical Decision Support.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
NLNDE at CANTEMIST: Neural Sequence Labeling and Parsing Approaches for Clinical Concept Extraction.
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), 2020

Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

An Analysis of Simple Data Augmentation for Named Entity Recognition.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

An Effective Transition-based Model for Discontinuous NER.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Using Similarity Measures to Select Pretraining Data for NER.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

NNE: A Dataset for Nested Named Entity Recognition in English Newswire.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Shot Or Not: Comparison of NLP Approaches for Vaccination Behaviour Detection.
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task, 2018

Recognizing Complex Entity Mentions: A Review and Future Directions.
Proceedings of ACL 2018, Melbourne, Australia, July 15-20, 2018, Student Research Workshop, 2018

2017
Automatic Diagnosis Coding of Radiology Reports: A Comparison of Deep Learning and Conventional Classification Methods.
Proceedings of the BioNLP 2017, Vancouver, Canada, August 4, 2017, 2017

Medication and Adverse Event Extraction from Noisy Text.
Proceedings of the Australasian Language Technology Association Workshop, 2017

