Damai Dai

Orcid: 0009-0004-9714-7902

According to our database, Damai Dai authored at least 55 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Large Language Models Struggle with Unreasonability in Math Problems.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

mHC: Manifold-Constrained Hyper-Connections.

CoRR, December, 2025

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.

CoRR, February, 2025

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.

Mingchuan Zhang et al.

Nature, 2025

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Language Models Encode the Value of Numbers Linearly.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Exploring Activation Patterns of Parameters in Language Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding.

CoRR, 2024

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts.

CoRR, 2024

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence.

CoRR, 2024

Exploring Activation Patterns of Parameters in Language Models.

CoRR, 2024

Large Language Models Are Unconscious of Unreasonability in Math Problems.

CoRR, 2024

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization.

CoRR, 2024

Language Models Understand Numbers, at Least Partially.

CoRR, 2024

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism.

Mingchuan Zhang et al.

CoRR, 2024

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A Survey on In-context Learning.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations.

CoRR, 2023

Bi-Drop: Generalizable Fine-tuning for Pre-trained Language Models via Adaptive Subnetwork Optimization.

CoRR, 2023

A Survey for In-context Learning.

CoRR, 2023

Coarse-to-Fine Entity Representations for Document-Level Relation Extraction.

Proceedings of the Natural Language Processing and Chinese Computing, 2023

Mixture-of-Experts for Biomedical Question Answering.

Proceedings of the Natural Language Processing and Chinese Computing, 2023

Neural Knowledge Bank for Pretrained Transformers.

Proceedings of the Natural Language Processing and Chinese Computing, 2023

Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning.

Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization.

Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Denoising Bottleneck with Mutual Information Maximization for Video Multimodal Fusion.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers.

Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers.

CoRR, 2022

Neural Knowledge Bank for Pretrained Transformers.

CoRR, 2022

On the Representation Collapse of Sparse Mixture of Experts.

Saksham Singhal et al.

CoRR, 2022

Mixture of Experts for Biomedical Question Answering.

CoRR, 2022

Plug-and-Play Module for Commonsense Reasoning in Machine Reading Comprehension.

Proceedings of the Natural Language Processing and Chinese Computing, 2022

On the Representation Collapse of Sparse Mixture of Experts.

Saksham Singhal et al.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Robust Fine-tuning via Perturbation and Interpolation from In-batch Instances.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Calibrating Factual Knowledge in Pretrained Language Models.

Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Hierarchical Curriculum Learning for AMR Parsing.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022

Knowledge Neurons in Pretrained Transformers.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

StableMoE: Stable Routing Strategy for Mixture of Experts.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Explicit Interaction Network for Aspect Sentiment Triplet Extraction.

CoRR, 2021

Knowledge Neurons in Pretrained Transformers.

CoRR, 2021

Incorporating Connections Beyond Knowledge Embeddings: A Plug-and-Play Module to Enhance Commonsense Reasoning in Machine Reading Comprehension.

CoRR, 2021

Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions.

Proceedings of the 6th Workshop on Representation Learning for NLP, 2021

Decompose, Fuse and Generate: A Formation-Informed Method for Chinese Definition Generation.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation.

Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Behind the Scenes: An Exploration of Trigger Biases Problem in Few-Shot Event Classification.

Proceedings of CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2019

Learning to Control the Fine-grained Sentiment for Story Ending Generation.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions.

CoRR, 2018

Live Video Comment Generation Based on Surrounding Frames and Live Comments.

CoRR, 2018

2017

FISF: Better User Experience using Smaller Bandwidth for Panoramic Virtual Reality Video.

CoRR, 2017