Bei Liu

Orcid: 0000-0001-8857-0953

Affiliations:

Microsoft Research Asia, Beijing, China

According to our database, Bei Liu authored at least 51 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

SkillOpt: Executive Strategy for Self-Evolving Agent Skills.

CoRR, May, 2026

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills.

CoRR, May, 2026

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark.

CoRR, May, 2026

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents.

CoRR, February, 2026

2025

InfoAgent: Advancing Autonomous Information-Seeking Agents.

CoRR, September, 2025

Transferring Foundation Models for Generalizable Robotic Manipulation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

RoLD: Robot Latent Diffusion for Multi-task Policy Modeling.

Proceedings of the MultiMedia Modeling, 2025

SMPV: Social Media Prediction for Videos.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

2024

Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion.

CoRR, 2024

Spatiotemporal Predictive Pre-training for Robotic Motor Control.

CoRR, 2024

Revisiting Latent Space of GAN Inversion for Robust Real Image Editing.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

ViCo: Engaging Video Comment Generation with Human Preference Rewards.

Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

SMP Challenge Summary: Social Media Prediction Challenge.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Capture Concept Through Comparison: Vision-and-Language Representation Learning with Intrinsic Information Mining.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

Language-Guided Face Animation by Recurrent StyleGAN-Based Generator.

IEEE Trans. Multim., 2023

Revisiting Latent Space of GAN Inversion for Real Image Editing.

CoRR, 2023

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots.

CoRR, 2023

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space.

CoRR, 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation.

CoRR, 2023

SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos.

Proceedings of the IEEE International Conference on Consumer Electronics, 2023

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment.

CoRR, 2022

Exploring Anchor-based Detection for Ego4D Natural Language Query.

CoRR, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Reference-Based Defect Detection Network.

IEEE Trans. Image Process., 2021

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training.

CoRR, 2021

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Searching the Search Space of Vision Transformer.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning Fine-Grained Motion Embedding for Landscape Animation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Unifying Multimodal Transformer for Bi-directional Image and Text Generation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

MMPT'21: International Joint Workshop on Multi-Modal Pre-Training for Multimedia Understanding.

Alexander G. Hauptmann

Yong Rui

Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Seeing Out of the Box: End-to-End Pre-Training for Vision-Language Representation Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers.

CoRR, 2020

Aesthetic-Aware Image Style Transfer.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

2019

Learning Rich Image Region Representation for Visual Question Answering.

CoRR, 2019

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection.

CoRR, 2019

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos.

Alexander G. Hauptmann

CoRR, 2019

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

SMP Challenge: An Overview of Social Media Prediction Challenge 2019.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Emotion Reinforced Visual Storytelling.

Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2016

Cognition-Aware Summarization of Photos Representing Events.

Bei Liu

Makoto P. Kato

Katsumi Tanaka

IEICE Trans. Inf. Syst., 2016

2014

Finding Photo Sets of Events by Minimizing Misrecognition from Neighbor Events.

Bei Liu

Makoto P. Kato

Katsumi Tanaka

Proceedings of the Web-Age Information Management - 15th International Conference, 2014
