Li Dong

ORCID: 0000-0003-3083-7170

Affiliations:
  • Microsoft Research Asia, Natural Language Computing Group, Beijing, China
  • University of Edinburgh, School of Informatics, Edinburgh, UK (PhD 2019)
  • Beihang University, State Key Laboratory of Software Development Environment, Beijing, China (former)


According to our database, Li Dong authored at least 115 papers between 2011 and 2024.

Bibliography

2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
CoRR, 2024

Towards Optimal Learning of Language Models.
CoRR, 2024

Language Models as Inductive Reasoners.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

2023
A Unified View of Masked Image Modeling.
Trans. Mach. Learn. Res., 2023

BitNet: Scaling 1-bit Transformers for Large Language Models.
CoRR, 2023

Kosmos-G: Generating Images in Context with Multimodal Large Language Models.
CoRR, 2023

Kosmos-2.5: A Multimodal Literate Model.
CoRR, 2023

Large Language Model for Science: A Study on P vs. NP.
CoRR, 2023

Retentive Network: A Successor to Transformer for Large Language Models.
CoRR, 2023

LongNet: Scaling Transformers to 1,000,000,000 Tokens.
CoRR, 2023

Kosmos-2: Grounding Multimodal Large Language Models to the World.
CoRR, 2023

Knowledge Distillation of Large Language Models.
CoRR, 2023

Augmenting Language Models with Long-Term Memory.
CoRR, 2023

Augmenting Language Models with Long-Term Memory.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Language Is Not All You Need: Aligning Perception with Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Optimizing Prompts for Text-to-Image Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Extensible Prompts for Language Models on Zero-shot Language Style Customization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Magneto: A Foundation Transformer.
Proceedings of the International Conference on Machine Learning, 2023

Visually-Augmented Language Modeling.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Prototypical Calibration for Few-shot Learning of Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Corrupted Image Modeling for Self-Supervised Visual Pre-Training.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Non-Contrastive Learning Meets Language-Image Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Generic-to-Specific Distillation of Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

A Length-Extrapolatable Transformer.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Pre-Training to Learn in Context.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Transforming Wikipedia Into Augmented Data for Query-Focused Summarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers.
CoRR, 2022

Structured Prompting: Scaling In-Context Learning to 1,000 Examples.
CoRR, 2022

Extensible Prompts for Language Models.
CoRR, 2022

TorchScale: Transformers at Scale.
CoRR, 2022

Foundation Transformers.
CoRR, 2022

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks.
CoRR, 2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers.
CoRR, 2022

Language Models are General-Purpose Interfaces.
CoRR, 2022

VL-BEiT: Generative Vision-Language Pretraining.
CoRR, 2022

Prototypical Calibration for Few-shot Learning of Language Models.
CoRR, 2022

On the Representation Collapse of Sparse Mixture of Experts.
CoRR, 2022

DeepNet: Scaling Transformers to 1,000 Layers.
CoRR, 2022

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models.
CoRR, 2022

Kformer: Knowledge Injection in Transformer Feed-Forward Layers.
Proceedings of the Natural Language Processing and Chinese Computing, 2022

On the Representation Collapse of Sparse Mixture of Experts.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BEiT: BERT Pre-Training of Image Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Swin Transformer V2: Scaling Up Capacity and Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Controllable Natural Language Generation with Contrastive Prefixes.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Knowledge Neurons in Pretrained Transformers.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

StableMoE: Stable Routing Strategy for Mixture of Experts.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts.
CoRR, 2021

s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning.
CoRR, 2021

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA.
CoRR, 2021

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders.
CoRR, 2021

BEiT: BERT Pre-Training of Image Transformers.
CoRR, 2021

Knowledge Neurons in Pretrained Transformers.
CoRR, 2021

mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs.
CoRR, 2021

Learning natural language interfaces with neural models.
AI Matters, 2021

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task.
Proceedings of the Sixth Conference on Machine Translation, 2021

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Allocating Large Vocabulary Capacity for Cross-Lingual Language Model Pre-Training.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Consistency Regularization for Cross-Lingual Fine-Tuning.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Memory-Efficient Differentiable Transformer Architecture Search.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Learning to Sample Replacements for ELECTRA Pre-Training.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders.
CoRR, 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Investigating Learning Dynamics of BERT Fine-Tuning.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

Can Monolingual Pretrained Models Help Cross-Lingual Classification?
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training.
Proceedings of the 37th International Conference on Machine Learning, 2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks.
Proceedings of the Computer Vision - ECCV 2020, 2020

Harvesting and Refining Question-Answer Pairs for Unsupervised QA.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Cross-Lingual Natural Language Generation via Pre-Training.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Multitask learning for biomedical named entity recognition with cross-sharing structure.
BMC Bioinform., 2019

Unified Language Model Pre-training for Natural Language Understanding and Generation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Visualizing and Understanding the Effectiveness of BERT.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Learning to Ask Unanswerable Questions for Machine Reading Comprehension.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Data-to-text Generation with Entity Modeling.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension.
Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019

Data-to-Text Generation with Content Selection and Planning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Proactive Resource Management for LTE in Unlicensed Spectrum: A Deep Learning Perspective.
IEEE Trans. Wirel. Commun., 2018

Confidence Modeling for Neural Semantic Parsing.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Coarse-to-Fine Decoding for Neural Semantic Parsing.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Proactive Resource Management in LTE-U Systems: A Deep Learning Perspective.
CoRR, 2017

Learning to Paraphrase for Question Answering.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Learning to Generate Product Reviews from Attributes.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

2016
Adaptive Multi-Compositionality for Recursive Neural Network Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Solving and Generating Chinese Character Riddles.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Long Short-Term Memory-Networks for Machine Reading.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Language to Logical Form with Neural Attention.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
A Joint Segmentation and Classification Framework for Sentence Level Sentiment Classification.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

A Statistical Parsing Framework for Sentiment Classification.
Comput. Linguistics, 2015

Splusplus: A Feature-Rich Two-stage Classifier for Sentiment Analysis of Tweets.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

A Hybrid Neural Model for Type Classification of Entity Mentions.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Question Answering over Freebase with Multi-Column Convolutional Neural Networks.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
A Joint Segmentation and Classification Framework for Sentiment Analysis.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Unraveling the origin of exponential law in intra-urban human mobility.
CoRR, 2013

The Automated Acquisition of Suggestions from Tweets.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
Modeling collective human mobility: Understanding exponential law of intra-urban movement.
CoRR, 2012

MoodLens: an emoticon-based sentiment analysis system for Chinese tweets.
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012

2011
Performance of Local Information Based Link Prediction: A Sampling Perspective.
CoRR, 2011

