Ross B. Girshick

Affiliations:

University of California, Berkeley, USA

According to our database, Ross B. Girshick authored at least 105 papers between 2004 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding.

[DOI]

,

,

,

Ross B. Girshick

,

,

CoRR, November, 2025

Microsoft COCO.

[DOI]

,

,

Serge J. Belongie

,

Lubomir D. Bourdev

,

Ross B. Girshick

,

,

,

,

C. Lawrence Zitnick

,

Dataset, June, 2025

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.

[DOI]

,

Bharath Hariharan

,

Laurens van der Maaten

,

,

C. Lawrence Zitnick

,

Ross B. Girshick

Dataset, February, 2025

SAM 2: Segment Anything in Images and Videos.

[DOI]

,

Valentin Gabeur

,

,

,

Chaitanya Ryali

,

,

,

,

,

Laura Gustafson

,

,

,

Kalyan Vasudev Alwala

,

,

,

Ross B. Girshick

,

,

Christoph Feichtenhofer

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models.

[DOI]

CoRR, 2024

PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators.

[DOI]

,

,

,

,

,

Alvaro Herrasti

,

Ross B. Girshick

,

Aniruddha Kembhavi

,

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining.

[DOI]

,

,

Kalyan Vasudev Alwala

,

,

Vaibhav Aggarwal

,

,

,

,

Christoph Feichtenhofer

,

Ross B. Girshick

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Segment Anything.

[DOI]

Alexander Kirillov

,

,

,

,

,

Laura Gustafson

,

,

Spencer Whitehead

,

Alexander C. Berg

,

,

,

Ross B. Girshick

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Exploring Plain Vision Transformer Backbones for Object Detection.

[DOI]

,

,

Ross B. Girshick

,

Proceedings of the Computer Vision - ECCV 2022, 2022

Revisiting Weakly Supervised Pre-Training of Visual Perception Models.

[DOI]

,

Laura Gustafson

,

,

Vinicius de Freitas Reis

,

,

Raj Prateek Kosaraju

,

,

Ross B. Girshick

,

,

Laurens van der Maaten

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked Autoencoders Are Scalable Vision Learners.

[DOI]

,

,

,

,

,

Ross B. Girshick

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Benchmarking Detection Transfer Learning with Vision Transformers.

[DOI]

,

,

,

,

,

Ross B. Girshick

CoRR, 2021

Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details.

[DOI]

,

,

,

Alexander Kirillov

,

Ross B. Girshick

CoRR, 2021

Early Convolutions Help Transformers See Better.

[DOI]

,

,

,

,

,

Ross B. Girshick

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PyTorchVideo: A Deep Learning Library for Video Understanding.

[DOI]

,

,

,

Kalyan Vasudev Alwala

,

,

,

,

,

,

,

,

Ross B. Girshick

,

,

,

,

Christoph Feichtenhofer

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning.

[DOI]

Christoph Feichtenhofer

,

,

,

Ross B. Girshick

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Fast and Accurate Model Scaling.

[DOI]

,

,

Ross B. Girshick

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation.

[DOI]

,

Ross B. Girshick

,

,

Alexander C. Berg

,

Alexander Kirillov

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Improved Baselines with Momentum Contrastive Learning.

[DOI]

,

,

Ross B. Girshick

,

CoRR, 2020

Large Scale Weakly and Semi-Supervised Learning for Low-Resource Video ASR.

[DOI]

,

,

,

,

Ross B. Girshick

,

Vitaliy Liptchinsky

,

Christian Fuegen

,

,

,

Abdelrahman Mohamed

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Training ASR Models By Generation of Contextual Information.

[DOI]

,

,

,

,

,

Ross B. Girshick

,

,

,

,

,

Abdelrahman Mohamed

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Are Labels Necessary for Neural Architecture Search?

[DOI]

,

,

,

Ross B. Girshick

,

,

Proceedings of the Computer Vision - ECCV 2020, 2020

A Multigrid Method for Efficiently Training Video Models.

[DOI]

,

Ross B. Girshick

,

,

Christoph Feichtenhofer

,

Philipp Krähenbühl

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Designing Network Design Spaces.

[DOI]

Ilija Radosavovic

,

Raj Prateek Kosaraju

,

Ross B. Girshick

,

,

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PointRend: Image Segmentation As Rendering.

[DOI]

Alexander Kirillov

,

,

,

Ross B. Girshick

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Momentum Contrast for Unsupervised Visual Representation Learning.

[DOI]

,

,

,

,

Ross B. Girshick

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

PHYRE: A New Benchmark for Physical Reasoning.

[DOI]

,

Laurens van der Maaten

,

,

Laura Gustafson

,

Ross B. Girshick

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exploring Randomly Wired Neural Networks for Image Recognition.

[DOI]

,

Alexander Kirillov

,

Ross B. Girshick

,

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Rethinking ImageNet Pre-Training.

[DOI]

,

Ross B. Girshick

,

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TensorMask: A Foundation for Dense Object Segmentation.

[DOI]

,

Ross B. Girshick

,

,

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Long-Term Feature Banks for Detailed Video Understanding.

[DOI]

,

Christoph Feichtenhofer

,

,

,

Philipp Krähenbühl

,

Ross B. Girshick

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Panoptic Segmentation.

[DOI]

Alexander Kirillov

,

,

Ross B. Girshick

,

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Panoptic Feature Pyramid Networks.

[DOI]

Alexander Kirillov

,

Ross B. Girshick

,

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

LVIS: A Dataset for Large Vocabulary Instance Segmentation.

[DOI]

,

,

Ross B. Girshick

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Exploring the Limits of Weakly Supervised Pretraining.

[DOI]

,

Ross B. Girshick

,

Vignesh Ramanathan

,

,

,

,

Ashwin Bharambe

,

Laurens van der Maaten

Proceedings of the Computer Vision - ECCV 2018, 2018

Low-Shot Learning From Imaginary Data.

[DOI]

,

Ross B. Girshick

,

,

Bharath Hariharan

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Data Distillation: Towards Omni-Supervised Learning.

[DOI]

Ilija Radosavovic

,

,

Ross B. Girshick

,

Georgia Gkioxari

,

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning by Asking Questions.

[DOI]

,

Ross B. Girshick

,

,

,

,

Laurens van der Maaten

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning to Segment Every Thing.

[DOI]

,

,

,

,

Ross B. Girshick

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Detecting and Recognizing Human-Object Interactions.

[DOI]

Georgia Gkioxari

,

Ross B. Girshick

,

,

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Non-Local Neural Networks.

[DOI]

,

Ross B. Girshick

,

,

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Object Detection Networks on Convolutional Feature Maps.

[DOI]

,

,

Ross B. Girshick

,

,

IEEE Trans. Pattern Anal. Mach. Intell., 2017

Object Instance Segmentation and Fine-Grained Localization Using Hypercolumns.

[DOI]

Bharath Hariharan

,

Pablo Arbeláez

,

Ross B. Girshick

,

IEEE Trans. Pattern Anal. Mach. Intell., 2017

Editorial - Deep Learning for Computer Vision.

[DOI]

Ross B. Girshick

,

Iasonas Kokkinos

,

,

,

George Papandreou

,

,

,

,

Comput. Vis. Image Underst., 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.

[DOI]

,

,

Ross B. Girshick

,

Pieter Noordhuis

,

Lukasz Wesolowski

,

,

,

,

CoRR, 2017

Focal Loss for Dense Object Detection.

[DOI]

,

,

Ross B. Girshick

,

,

Proceedings of the IEEE International Conference on Computer Vision, 2017

Inferring and Executing Programs for Visual Reasoning.

[DOI]

,

Bharath Hariharan

,

Laurens van der Maaten

,

,

,

C. Lawrence Zitnick

,

Ross B. Girshick

Proceedings of the IEEE International Conference on Computer Vision, 2017

Mask R-CNN.

[DOI]

,

Georgia Gkioxari

,

,

Ross B. Girshick

Proceedings of the IEEE International Conference on Computer Vision, 2017

Low-Shot Visual Recognition by Shrinking and Hallucinating Features.

[DOI]

Bharath Hariharan

,

Ross B. Girshick

Proceedings of the IEEE International Conference on Computer Vision, 2017

Aggregated Residual Transformations for Deep Neural Networks.

[DOI]

,

Ross B. Girshick

,

,

,

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Learning Features by Watching Objects Move.

[DOI]

,

Ross B. Girshick

,

,

,

Bharath Hariharan

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Feature Pyramid Networks for Object Detection.

[DOI]

,

,

Ross B. Girshick

,

,

Bharath Hariharan

,

Serge J. Belongie

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.

[DOI]

,

Bharath Hariharan

,

Laurens van der Maaten

,

,

C. Lawrence Zitnick

,

Ross B. Girshick

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

The three R's of computer vision: Recognition, reconstruction and reorganization.

[DOI]

,

Pablo Andrés Arbeláez

,

,

Katerina Fragkiadaki

,

Ross B. Girshick

,

Georgia Gkioxari

,

,

Bharath Hariharan

,

,

Shubham Tulsiani

Pattern Recognit. Lett., 2016

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation.

[DOI]

Ross B. Girshick

,

,

,

IEEE Trans. Pattern Anal. Mach. Intell., 2016

Low-shot visual object recognition.

[DOI]

Bharath Hariharan

,

Ross B. Girshick

CoRR, 2016

Reducing Overfitting in Deep Networks by Decorrelating Representations.

[DOI]

Michael Cogswell

,

,

Ross B. Girshick

,

,

Proceedings of the 4th International Conference on Learning Representations, 2016

Visual Storytelling.

[DOI]

Ting-Hao 'Kenneth' Huang

,

Francis Ferraro

,

Nasrin Mostafazadeh

,

,

Aishwarya Agrawal

,

,

Ross B. Girshick

,

,

,

,

C. Lawrence Zitnick

,

,

Lucy Vanderwende

,

,

Margaret Mitchell

Proceedings of the NAACL HLT 2016, 2016

Unsupervised Deep Embedding for Clustering Analysis.

[DOI]

,

Ross B. Girshick

,

Proceedings of the 33rd International Conference on Machine Learning, 2016

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks.

[DOI]

,

Ross B. Girshick

,

Proceedings of the Computer Vision - ECCV 2016, 2016

Training Region-Based Object Detectors with Online Hard Example Mining.

[DOI]

Abhinav Shrivastava

,

,

Ross B. Girshick

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

You Only Look Once: Unified, Real-Time Object Detection.

[DOI]

,

Santosh Kumar Divvala

,

Ross B. Girshick

,

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels.

[DOI]

,

C. Lawrence Zitnick

,

Margaret Mitchell

,

Ross B. Girshick

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks.

[DOI]

,

C. Lawrence Zitnick

,

,

Ross B. Girshick

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Generalized Sparselet Models for Real-Time Multiclass Object Recognition.

[DOI]

,

Ross B. Girshick

,

,

Christopher Geyer

,

Pedro F. Felzenszwalb

,

IEEE Trans. Pattern Anal. Mach. Intell., 2015

Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation.

[DOI]

,

Pablo Andrés Arbeláez

,

Ross B. Girshick

,

Int. J. Comput. Vis., 2015

Learning Visual Classifiers using Human-centric Annotations.

[DOI]

,

C. Lawrence Zitnick

,

Margaret Mitchell

,

Ross B. Girshick

CoRR, 2015

Inferring 3D Object Pose in RGB-D Images.

[DOI]

,

Pablo Andrés Arbeláez

,

Ross B. Girshick

,

CoRR, 2015

Exploring Nearest Neighbor Approaches for Image Captioning.

[DOI]

,

,

Ross B. Girshick

,

Margaret Mitchell

,

C. Lawrence Zitnick

CoRR, 2015

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

[DOI]

,

,

Ross B. Girshick

,

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Actions and Attributes from Wholes and Parts.

[DOI]

Georgia Gkioxari

,

Ross B. Girshick

,

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Contextual Action Recognition with R*CNN.

[DOI]

Georgia Gkioxari

,

Ross B. Girshick

,

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Fast R-CNN.

[DOI]

Ross B. Girshick

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Hypercolumns for object segmentation and fine-grained localization.

[DOI]

Bharath Hariharan

,

Pablo Andrés Arbeláez

,

Ross B. Girshick

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Aligning 3D models to RGB-D images of cluttered scenes.

[DOI]

,

Pablo Andrés Arbeláez

,

Ross B. Girshick

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deformable part models are convolutional neural networks.

[DOI]

Ross B. Girshick

,

Forrest N. Iandola

,

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

One-Bit Object Detection: On learning to localize objects with minimal supervision.

[DOI]

,

Ross B. Girshick

,

Stefanie Jegelka

,

,

Zaïd Harchaoui

,

CoRR, 2014

Microsoft COCO: Common Objects in Context.

[DOI]

,

,

Serge J. Belongie

,

Lubomir D. Bourdev

,

Ross B. Girshick

,

,

,

,

,

C. Lawrence Zitnick

CoRR, 2014

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids.

[DOI]

Forrest N. Iandola

,

Matthew W. Moskewicz

,

,

Ross B. Girshick

,

,

CoRR, 2014

LSDA: Large Scale Detection Through Adaptation.

[DOI]

,

Sergio Guadarrama

,

,

,

Ross B. Girshick

,

,

CoRR, 2014

R-CNNs for Pose Estimation and Action Detection.

[DOI]

Georgia Gkioxari

,

Bharath Hariharan

,

Ross B. Girshick

,

CoRR, 2014

LSDA: Large Scale Detection through Adaptation.

[DOI]

,

Sergio Guadarrama

,

,

,

,

Ross B. Girshick

,

,

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Caffe: Convolutional Architecture for Fast Feature Embedding.

[DOI]

,

,

,

,

,

Ross B. Girshick

,

Sergio Guadarrama

,

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

On learning to localize objects with minimal supervision.

[DOI]

,

Ross B. Girshick

,

Stefanie Jegelka

,

,

Zaïd Harchaoui

,

Proceedings of the 31st International Conference on Machine Learning, 2014

Part-Based R-CNNs for Fine-Grained Category Detection.

[DOI]

,

,

Ross B. Girshick

,

Proceedings of the Computer Vision - ECCV 2014, 2014

Simultaneous Detection and Segmentation.

[DOI]

Bharath Hariharan

,

Pablo Andrés Arbeláez

,

Ross B. Girshick

,

Proceedings of the Computer Vision - ECCV 2014, 2014

Learning Rich Features from RGB-D Images for Object Detection and Segmentation.

[DOI]

,

Ross B. Girshick

,

Pablo Andrés Arbeláez

,

Proceedings of the Computer Vision - ECCV 2014, 2014

Analyzing the Performance of Multilayer Neural Networks for Object Recognition.

[DOI]

,

Ross B. Girshick

,

Proceedings of the Computer Vision - ECCV 2014, 2014

Understanding Objects in Detail with Fine-Grained Attributes.

[DOI]

,

Siddharth Mahendran

,

Stavros Tsogkas

,

,

Ross B. Girshick

,

,

,

Iasonas Kokkinos

,

Matthew B. Blaschko

,

,

,

,

,

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Using k-Poselets for Detecting People and Localizing Their Keypoints.

[DOI]

Georgia Gkioxari

,

Bharath Hariharan

,

Ross B. Girshick

,

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.

[DOI]

Ross B. Girshick

,

,

,

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

Efficient Human Pose Estimation from Single Depth Images.

[DOI]

,

Ross B. Girshick

,

Andrew W. Fitzgibbon

,

,

,

,

,

,

Antonio Criminisi

,

,

IEEE Trans. Pattern Anal. Mach. Intell., 2013

Visual object detection with deformable part models.

[DOI]

Pedro F. Felzenszwalb

,

Ross B. Girshick

,

David A. McAllester

,

Commun. ACM, 2013

Discriminatively Activated Sparselets.

[DOI]

Ross B. Girshick

,

,

Proceedings of the 30th International Conference on Machine Learning, 2013

Training Deformable Part Models with Decorrelated Features.

[DOI]

Ross B. Girshick

,

Proceedings of the IEEE International Conference on Computer Vision, 2013

2012

Sparselet Models for Efficient Multiclass Object Detection.

[DOI]

,

,

,

Ross B. Girshick

,

,

Christopher Geyer

,

Pedro F. Felzenszwalb

,

Proceedings of the Computer Vision - ECCV 2012, 2012

2011

Object Detection with Grammar Models.

[DOI]

Ross B. Girshick

,

Pedro F. Felzenszwalb

,

David A. McAllester

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Efficient regression of general-activity human poses from depth images.

[DOI]

Ross B. Girshick

,

,

,

Antonio Criminisi

,

Andrew W. Fitzgibbon

Proceedings of the IEEE International Conference on Computer Vision, 2011

2010

Object Detection with Discriminatively Trained Part-Based Models.

[DOI]

Pedro F. Felzenszwalb

,

Ross B. Girshick

,

David A. McAllester

,

IEEE Trans. Pattern Anal. Mach. Intell., 2010

Discriminative Latent Variable Models for Object Detection.

[DOI]

Pedro F. Felzenszwalb

,

Ross B. Girshick

,

David A. McAllester

,

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Cascade object detection with deformable part models.

[DOI]

Pedro F. Felzenszwalb

,

Ross B. Girshick

,

David A. McAllester

Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009

Visibility constraints on features of 3D objects.

[DOI]

,

Pedro F. Felzenszwalb

,

Ross B. Girshick

,

David W. Jacobs

,

Caroline J. Klivans

Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

2004

Simulating Chinese brush painting: a geometric model.

[DOI]

Ross B. Girshick

J. Comput. Sci. Coll., 2004

Simulating Chinese brush painting: the parametric hairy brush.

[DOI]

Ross B. Girshick

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2004
