Yanghao Li

Orcid: 0000-0002-5274-1367

According to our database, Yanghao Li authored at least 78 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction.

Yanghao Li

Changxin Liu

Yuhao Yi

CoRR, March, 2026

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation.

CoRR, March, 2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space.

CoRR, February, 2026

RSMeM: Knowledge-Enhanced Memory Evolution for Remote Sensing Agents with Systematic Evaluation.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

Ego4D: Around the World in 3,600 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation.

CoRR, September, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer.

CoRR, September, 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices.

CoRR, June, 2025

CITR: Efficient Long Video Understanding Needs Causal Importance.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Improving Communication-Efficient and Byzantine-Robust Distributed Learning with Local Adaptive Momentum.

Yanghao Li

Changxin Liu

Yuhao Yi

Proceedings of the International Joint Conference on Neural Networks, 2025

SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.

Jean-Philippe Fauconnier

Zhengfeng Lai

Haoxuan You

Zirui Wang

et al.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Improve Vision Language Model Chain-of-thought Reasoning.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.

CoRR, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.

CoRR, 2024

Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering.

Changxin Liu

Yanghao Li

Yuhao Yi

Karl Henrik Johansson

CoRR, 2024

Idempotence and Perceptual Image Compression.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

R-MAE: Regions Meet Masked Autoencoders.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Bandwidth-Efficient Inference for Neural Image Compression.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Bandwidth-efficient Inference for Neural Image Compression.

CoRR, 2023

Conditional Perceptual Quality Preserving Image Compression.

CoRR, 2023

Evaluating Strong Idempotence of Image Codec.

CoRR, 2023

Idempotent Learned Image Compression with Right-Inverse.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MAViL: Masked Audio-Video Learners.

Christoph Feichtenhofer

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles.

Christoph Feichtenhofer

Proceedings of the International Conference on Machine Learning, 2023

Diffusion Models as Masked Autoencoders.

Christoph Feichtenhofer

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Your Camera Improves Your Point Cloud Compression.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization.

Juan-Manuel Pérez-Rúa

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Scaling Language-Image Pre-Training via Masking.

Yanghao Li

Haoqi Fan

Ronghang Hu

Christoph Feichtenhofer

Kaiming He

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Bit Allocation using Optimization.

CoRR, 2022

Negative Frames Matter in Egocentric Visual Query 2D Localization.

Juan-Manuel Pérez-Rúa

Tao Xiang

CoRR, 2022

Masked Autoencoders As Spatiotemporal Learners.

Christoph Feichtenhofer

Haoqi Fan

Yanghao Li

Kaiming He

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rate Control for Learned Video Compression.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Exploring Plain Vision Transformer Backbones for Object Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition.

Christoph Feichtenhofer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Reversible Vision Transformers.

Christoph Feichtenhofer

Jitendra Malik

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection.

Christoph Feichtenhofer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked Autoencoders Are Scalable Vision Learners.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Improved Multiscale Vision Transformers for Classification and Detection.

Christoph Feichtenhofer

CoRR, 2021

Benchmarking Detection Transfer Learning with Vision Transformers.

CoRR, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

CoRR, 2021

PyTorchVideo: A Deep Learning Library for Video Understanding.

Haoqi Fan

Tullie Murrell

Heng Wang

Kalyan Vasudev Alwala

Christoph Feichtenhofer

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Multiscale Vision Transformers.

Christoph Feichtenhofer

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Model-Blind Temporal Denoisers without Ground Truths.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Decision Tree Based Inter Partition Termination for AV1 Encoding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics.

ACM Trans. Multim. Comput. Commun. Appl., 2020

Modality Compensation Network: Cross-Modal Adaptation for Action Recognition.

IEEE Trans. Image Process., 2020

Ego-Topo: Environment Affordances From Egocentric Video.

Tushar Nagarajan

Yanghao Li

Christoph Feichtenhofer

Kristen Grauman

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Multi-Modality Multi-Task Recurrent Neural Network for Online Action Detection.

IEEE Trans. Circuits Syst. Video Technol., 2019

SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition.

J. Mach. Learn. Res., 2019

Scale-Aware Trident Networks for Object Detection.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Temporal Bilinear Networks for Video Action Recognition.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Adaptive Batch Normalization for practical domain adaptation.

Pattern Recognit., 2018

Click versus Share: A Feature-driven Study of Micro-Video Popularity and Virality in Social Media.

Proceedings of the 2018 SIAM International Conference on Data Mining, 2018

Rethinking Fusion Baselines for Multi-modal Human Action Recognition.

Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

A Deep Convolutional Network Based Supervised Coarse-to-Fine Algorithm for Optical Flow Measurement.

Proceedings of the 20th IEEE International Workshop on Multimedia Signal Processing, 2018

2017

PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding.

CoRR, 2017

Characterizing the Click and Share Dynamics of Micro-Videos in Social Media.

Proceedings of the Posters and Demos Proceedings of the Conference of the ACM Special Interest Group on Data Communication, 2017

PKU-MMD: A Large Scale Benchmark for Skeleton-Based Human Action Understanding.

Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities, 2017

Demystifying Neural Style Transfer.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Revisiting Batch Normalization For Practical Domain Adaptation.

Proceedings of the 5th International Conference on Learning Representations, 2017

Deep joint discriminative learning for vehicle re-identification and retrieval.

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Factorized Bilinear Models for Image Recognition.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Online action detection and forecast via Multitask deep Recurrent Neural Networks.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Temporal Perceptive Network for Skeleton-Based Action Recognition.

Proceedings of the British Machine Vision Conference 2017, 2017

2016

Joint sub-band based neighbor embedding for image super-resolution.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks.

Proceedings of the Computer Vision - ECCV 2016, 2016

Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Multi-pose face hallucination via neighbor embedding for facial components.

Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Neighborhood regression for edge-preserving image super-resolution.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Face hallucination based on neighbor embedding via illumination adaptation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014

Image transformation using limited reference with application to photo-sketch synthesis.

Proceedings of the 2014 IEEE Visual Communications and Image Processing Conference, 2014
