Yehao Li

Orcid: 0000-0002-9603-1113

According to our database, Yehao Li authored at least 48 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain.
CoRR, 2024

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer.
CoRR, 2024

HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs.
CoRR, 2024

2023
Dual Vision Transformer.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2023

Adaptive Semantic-Bit Communication for Extended Reality Interactions.
IEEE J. Sel. Top. Signal Process., September, 2023

Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning.
ACM Trans. Multim. Comput. Commun. Appl., February, 2023

Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Bottom-up and Top-down Object Inference Networks for Image Captioning.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Contextual Transformer Networks for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Control3D: Towards Controllable Text-to-3D Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Threat-Aware Data Transmission in Software-Defined Networks.
Proceedings of the 8th International Conference on Data Science in Cyberspace, 2023

HGNet: Learning Hierarchical Geometry from Points, Edges, and Surfaces.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantic-Conditional Diffusion Networks for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training.
ACM Trans. Multim. Comput. Commun. Appl., 2022

Unpaired Image Captioning With Semantic-Constrained Self-Learning.
IEEE Trans. Multim., 2022

Dual Vision Transformer.
CoRR, 2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation.
CoRR, 2022

Contextual and selective attention networks for image captioning.
Sci. China Inf. Sci., 2022

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Flexible User Duplexing in Cell-Free Massive MIMO: A Deep Reinforcement Learning Approach.
Proceedings of the IEEE/CIC International Conference on Communications in China, 2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement.
Proceedings of the Computer Vision - ECCV 2022, 2022

Comprehending and Ordering Semantics for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Interference-aware Spectrum and Power Coordination in Satellite-aided Cell-free Massive MIMO System.
Proceedings of the Communications and Networking - 17th EAI International Conference, 2022

An Elite Genetic Algorithm for Power Allocation in Cell-Free Massive MIMO Systems.
Proceedings of the Communications and Networking - 17th EAI International Conference, 2022

2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Deep Metric Learning With Density Adaptivity.
IEEE Trans. Multim., 2020

Pre-training for Video Captioning Challenge 2020 Summary.
CoRR, 2020

Exploring Depth Information for Spatial Relation Recognition.
Proceedings of the 3rd IEEE Conference on Multimedia Information Processing and Retrieval, 2020

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

X-Linear Attention Networks for Image Captioning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019.
CoRR, 2019

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019.
CoRR, 2019

Hierarchy Parsing for Image Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Transferrable Prototypical Networks for Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Pointing Novel Objects in Image Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Exploring Visual Relationship for Image Captioning.
Proceedings of the Computer Vision - ECCV 2018, 2018

Jointly Localizing and Describing Events for Dense Video Captioning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Boosting Image Captioning with Attributes.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Video ChatBot: Triggering Live Social Interactions by Automatic Video Commenting.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

