Ronghang Hu

Orcid: 0000-0002-5060-9485

According to our database, Ronghang Hu authored at least 31 papers between 2014 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

SAM 3: Segment Anything with Concepts.

CoRR, November, 2025

SAM 2: Segment Anything in Images and Videos.

Kalyan Vasudev Alwala

Christoph Feichtenhofer

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2023

UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Scaling Language-Image Pre-Training via Masking.

Yanghao Li

Haoqi Fan

Ronghang Hu

Christoph Feichtenhofer

Kaiming He

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Exploring Long-Sequence Masked Autoencoders.

CoRR, 2022

FLAVA: A Foundational Language And Vision Alignment Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer.

Ronghang Hu

Amanpreet Singh

CoRR, 2021

UniT: Multimodal Multitask Learning with a Unified Transformer.

Ronghang Hu

Amanpreet Singh

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Structured Models for Vision-and-Language Reasoning.

Ronghang Hu

PhD thesis, 2020

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image.

Ronghang Hu

Deepak Pathak

CoRR, 2020

TextCaps: A Dataset for Image Captioning with Reading Comprehension.

Proceedings of the Computer Vision - ECCV 2020, 2020

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Language-Conditioned Graph Networks for Relational Reasoning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Generating Counterfactual Explanations with Natural Language.

CoRR, 2018

Speaker-Follower Models for Vision-and-Language Navigation.

Louis-Philippe Morency

Taylor Berg-Kirkpatrick

Kate Saenko

Dan Klein

Trevor Darrell

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Explainable Neural Computation via Stack Neural Module Networks.

Proceedings of the Computer Vision - ECCV 2018, 2018

Grounding Visual Explanations.

Proceedings of the Computer Vision - ECCV 2018, 2018

Learning to Segment Every Thing.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Grounding Visual Explanations (Extended Abstract).

CoRR, 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Modeling Relationships in Referential Expressions with Compositional Modular Networks.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions.

Ronghang Hu

Marcus Rohrbach

Subhashini Venugopalan

Trevor Darrell

CoRR, 2016

Grounding of Textual Phrases in Images by Reconstruction.

Proceedings of the Computer Vision - ECCV 2016, 2016

Segmentation from Natural Language Expressions.

Ronghang Hu

Marcus Rohrbach

Trevor Darrell

Proceedings of the Computer Vision - ECCV 2016, 2016

Natural Language Object Retrieval.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Spatial Semantic Regularisation for Large Scale Object Detection.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

2014

LSDA: Large Scale Detection through Adaptation.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Robust Head-Shoulder Detection Using a Two-Stage Cascade Framework.

Proceedings of the 22nd International Conference on Pattern Recognition, 2014
