Liang Wang

ORCID: 0000-0003-4664-7136

Affiliations:
  • Yuanfudao AI Lab, Beijing, China
  • Peking University, Key Laboratory of Computational Linguistics, Beijing, China


According to our database, Liang Wang authored at least 36 papers between 2014 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Thinking Augmented Pre-training.
CoRR, September, 2025

MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings.
CoRR, June, 2025

WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale.
CoRR, February, 2025

Chain-of-Retrieval Augmented Generation.
CoRR, January, 2025

Little Giants: Synthesizing High-Quality Embedding Data at Scale.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Generative Representational Instruction Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Bootstrap Your Own Context Length.
CoRR, 2024

Multilingual E5 Text Embeddings: A Technical Report.
CoRR, 2024

Fine-Tuning LLaMA for Multi-Stage Text Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LongEmbed: Extending Embedding Models for Long Context Retrieval.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Learning to Retrieve In-Context Examples for Large Language Models.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Improving Text Embeddings with Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Learning to Rank in Generative Retrieval.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Large Search Model: Redefining Search Stack in the Era of LLMs.
SIGIR Forum, December, 2023

Generative retrieval for conversational question answering.
Inf. Process. Manag., September, 2023

Inference with Reference: Lossless Acceleration of Large Language Models.
CoRR, 2023

Query2doc: Query Expansion with Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Multiview Identifiers Enhanced Generative Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Text Embeddings by Weakly-Supervised Contrastive Pre-training.
CoRR, 2022

Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval.
CoRR, 2022

SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Aligning Cross-lingual Sentence Representations with Dual Momentum Contrast.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Investigating Label Bias in Beam Search for Open-ended Text Generation.
CoRR, 2020

2019
Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Denoising based Sequence-to-Sequence Pre-training for Text Generation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension.
Proceedings of The 12th International Workshop on Semantic Evaluation, 2018

Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

2017
PKU_ICL at SemEval-2017 Task 10: Keyphrase Extraction with Model Ensemble and External Knowledge.
Proceedings of the 11th International Workshop on Semantic Evaluation, 2017

Learning to Rank Semantic Coherence for Topic Segmentation.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

2016
Topic Segmentation of Web Documents with Automatic Cue Phrase Identification and BLSTM-CNN.
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016

Multi-task Learning for Gender and Age Prediction on Chinese Microblog.
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016

A User Adaptive Model for Followee Recommendation on Twitter.
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016

2014
Text-level Discourse Dependency Parsing.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

