He Bai

Orcid: 0000-0002-8933-647X

Affiliations:

University of Waterloo, School of Computer Science, Canada
Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition, Beijing, China (former)

According to our database¹, He Bai authored at least 30 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Embarrassingly Simple Self-Distillation Improves Code Generation.

CoRR, April, 2026

From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning.

CoRR, November, 2025

Closing the Gap Between Text and Speech Understanding in LLMs.

CoRR, October, 2025

SpeakStream: Streaming Text-to-Speech with Interleaved Data.

CoRR, May, 2025

Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions.

CoRR, February, 2025

ChipChat: Low-Latency Cascaded Conversational Agent in MLX.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Training Bilingual LMs with Data Constraints in the Targeted Language.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis.

CoRR, 2024

dMel: Speech Tokenization made Simple.

CoRR, 2024

Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition.

CoRR, 2024

How Far Are We from Intelligent Visual Deductive Reasoning?

CoRR, 2024

Divide-or-Conquer? Which Part Should You Distill Your LLM?

V. G. Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Construction of Paired Knowledge Graph - Text Datasets Informed by Cyclic Evaluation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know.

CoRR, 2023

2022

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech.

CoRR, 2022

A³T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing.

Proceedings of the International Conference on Machine Learning, 2022

XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Cross-lingual Text-to-SQL Semantic Parsing with Representation Mixup.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Better Language Model with Hypernym Class Prediction.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Cross-Lingual Training with Dense Retrieval for Document Retrieval.

CoRR, 2021

Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2.

Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, 2021

Segatron: Segment-Aware Transformer for Language Modeling and Understanding.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures.

CoRR, 2020

SegaBERT: Pre-training of Segment-aware BERT for Language Understanding.

CoRR, 2020

Semantics of the Unwritten.

CoRR, 2020

Cross-Lingual Training of Neural Models for Document Ranking.

Peng Shi, He Bai, Jimmy Lin

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019

Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language.

Proceedings of the 27th International Conference on Computational Linguistics, 2018
