Yehao Li

Orcid: 0009-0002-1486-9244

According to our database, Yehao Li authored at least 63 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer.

CoRR, May, 2026

DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation.

CoRR, January, 2026

2025

Visual Autoregressive Modeling for Instruction-Guided Image Editing.

CoRR, August, 2025

Joint AP Scheduling and Power Allocation Based on Synergistic DRL for Cell-Free Massive MIMO.

IEEE Commun. Lett., May, 2025

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer.

CoRR, May, 2025

Exploring Vision-Language Foundation Model for Novel Object Captioning.

IEEE Trans. Circuits Syst. Video Technol., January, 2025

Stream-ViT: Learning Streamlined Convolutions in Vision Transformer.

IEEE Trans. Multim., 2025

Identity-Preserving Video Generation Challenge.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

HiDream-I1: An Open-Source High-Efficient Image Generative Foundation Model.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Denoising Token Prediction in Masked Autoregressive Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer.

CoRR, 2024

Improving Virtual Try-On with Garment-Focused Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning.

Proceedings of the Computer Vision - ECCV 2024, 2024

Improving Text-Guided Object Inpainting with Semantic Pre-inpainting.

Proceedings of the Computer Vision - ECCV 2024, 2024

SD-DiT: Unleashing the Power of Self-Supervised Discrimination in Diffusion Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Dual Vision Transformer.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2023

Adaptive Semantic-Bit Communication for Extended Reality Interactions.

IEEE J. Sel. Top. Signal Process., September, 2023

Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning.

ACM Trans. Multim. Comput. Commun. Appl., February, 2023

Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing.

ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Bottom-up and Top-down Object Inference Networks for Image Captioning.

ACM Trans. Multim. Comput. Commun. Appl., 2023

Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning.

ACM Trans. Multim. Comput. Commun. Appl., 2023

Contextual Transformer Networks for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

Control3D: Towards Controllable Text-to-3D Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Threat-Aware Data Transmission in Software-Defined Networks.

Proceedings of the 8th International Conference on Data Science in Cyberspace, 2023

HGNet: Learning Hierarchical Geometry from Points, Edges, and Surfaces.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantic-Conditional Diffusion Networks for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training.

ACM Trans. Multim. Comput. Commun. Appl., 2022

Unpaired Image Captioning With Semantic-Constrained Self-Learning.

IEEE Trans. Multim., 2022

Dual Vision Transformer.

CoRR, 2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation.

CoRR, 2022

Contextual and Selective Attention Networks for Image Captioning.

Sci. China Inf. Sci., 2022

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Flexible User Duplexing in Cell-Free Massive MIMO: A Deep Reinforcement Learning Approach.

Proceedings of the IEEE/CIC International Conference on Communications in China, 2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement.

Proceedings of the Computer Vision - ECCV 2022, 2022

Comprehending and Ordering Semantics for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Interference-aware Spectrum and Power Coordination in Satellite-aided Cell-free Massive MIMO System.

Proceedings of the Communications and Networking - 17th EAI International Conference, 2022

An Elite Genetic Algorithm for Power Allocation in Cell-Free Massive MIMO Systems.

Proceedings of the Communications and Networking - 17th EAI International Conference, 2022

2021

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Deep Metric Learning With Density Adaptivity.

IEEE Trans. Multim., 2020

Pre-training for Video Captioning Challenge 2020 Summary.

CoRR, 2020

Exploring Depth Information for Spatial Relation Recognition.

Proceedings of the 3rd IEEE Conference on Multimedia Information Processing and Retrieval, 2020

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

X-Linear Attention Networks for Image Captioning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention.

ACM Trans. Multim. Comput. Commun. Appl., 2019

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019.

CoRR, 2019

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019.

CoRR, 2019

Hierarchy Parsing for Image Captioning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Transferrable Prototypical Networks for Unsupervised Domain Adaptation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Pointing Novel Objects in Image Captioning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Exploring Visual Relationship for Image Captioning.

Proceedings of the Computer Vision - ECCV 2018, 2018

Jointly Localizing and Describing Events for Dense Video Captioning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Boosting Image Captioning with Attributes.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Video ChatBot: Triggering Live Social Interactions by Automatic Video Commenting.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
