Antonio Torralba

ORCID: 0000-0003-4915-0256

Affiliations:
  • Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory


According to our database, Antonio Torralba authored at least 311 papers between 1999 and 2024.

Bibliography

2024
Follow Anything: Open-Set Detection, Tracking, and Following in Real-Time.
IEEE Robotics Autom. Lett., 2024

Efficient 3D Instance Mapping and Localization with Neural Fields.
CoRR, 2024

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis.
CoRR, 2024

GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment.
CoRR, 2024

MMToM-QA: Multimodal Theory of Mind Question Answering.
CoRR, 2024

A Vision Check-up for Language Models.
CoRR, 2024

2023
CAvatar: Real-time Human Activity Mesh Reconstruction via Tactile Carpets.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., December, 2023

Incidents1M: A Large-Scale Dataset of Images With Natural Disasters, Damage, and Incidents.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models.
CoRR, 2023

Customizing Motion in Text-to-Video Diffusion Models.
CoRR, 2023

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models.
CoRR, 2023

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning.
CoRR, 2023

A Function Interpretation Benchmark for Evaluating Interpretability Methods.
CoRR, 2023

Follow Anything: Open-set detection, tracking, and following in real-time.
CoRR, 2023

Multimodal Neurons in Pretrained Text-Only Transformers.
CoRR, 2023

Background Prompting for Improved Object Depth.
CoRR, 2023

Improving Factuality and Reasoning in Language Models through Multiagent Debate.
CoRR, 2023

ConceptFusion: Open-set Multimodal 3D Mapping.
CoRR, 2023

Debiasing Vision-Language Models via Biased Prompts.
CoRR, 2023


FIND: A Function Description Benchmark for Evaluating Interpretability Methods.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

NOPA: Neurally-guided Online Probabilistic Assistance for Building Socially Intelligent Home Assistants.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning.
Proceedings of the International Conference on Machine Learning, 2023

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Composing Ensembles of Pre-trained Models via Iterative Consensus.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Multimodal Neurons in Pretrained Text-Only Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BT^2: Backward-compatible Training with Basis Transformation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DreamTeacher: Pretraining Image Backbones with Deep Generative Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Open-vocabulary Panoptic Segmentation with Embedding Modulation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Generalizing Dataset Distillation via Deep Generative Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detecting Everything in the Open World: Towards Universal Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Aliasing is a Driver of Adversarial Attacks.
CoRR, 2022

Local Relighting of Real Scenes.
CoRR, 2022

BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations.
CoRR, 2022

Correcting Robot Plans with Natural Language Feedback.
Proceedings of the Robotics: Science and Systems XVIII, New York City, NY, USA, June 27, 2022

Learning Neural Acoustic Fields.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Pre-Trained Language Models for Interactive Decision-Making.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Procedural Image Programs for Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark Towards Physically Realistic Embodied AI.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Denoised MDPs: Learning World Models Better Than the World Itself.
Proceedings of the International Conference on Machine Learning, 2022

Natural Language Descriptions of Deep Visual Features.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos.
Proceedings of the Tenth International Conference on Learning Representations, 2022

MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Totems: Physical Objects for Verifying Visual Integrity.
Proceedings of the Computer Vision - ECCV 2022, 2022

Compositional Visual Generation with Composable Diffusion Models.
Proceedings of the Computer Vision - ECCV 2022, 2022

GAN-Supervised Dense Visual Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning Program Representations for Food Images and Cooking Recipes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Disentangling visual and written concepts in CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022


Finding Fallen Objects Via Asynchronous Audio-Visual Integration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Robust Contrastive Learning against Noisy Views.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Dataset Distillation by Matching Training Trajectories.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Skill Induction and Planning with Latent Language.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales.
CoRR, 2021

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI.
CoRR, 2021

Paint by Word.
CoRR, 2021

Editing a classifier by rewriting its prediction rules.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning to Compose Visual Relations.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

EditGAN: High-Precision Semantic Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning to See by Looking at Noise.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021


Measuring Generalization with Optimal Transport.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dynamic Modeling of Hand-Object Interactions via Tactile Sensing.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

OPEn: An Open-ended Physics Environment for Learning Without a Task.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering.
Proceedings of the 9th International Conference on Learning Representations, 2021

Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration.
Proceedings of the 9th International Conference on Learning Representations, 2021

What You Can Learn by Staring at a Blank Wall.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Toward a Visual Concept Vocabulary for GAN Latent Space.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Scaling up instance annotation via label propagation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

BARF: Bundle-Adjusting Neural Radiance Fields.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DatasetGAN: Efficient Labeled Data Factory With Minimal Human Effort.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

DriveGAN: Towards a Controllable High-Quality Neural Simulation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

3D Neural Scene Representations for Visuomotor Control.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020
Understanding the role of individual units in a deep neural network.
Proc. Natl. Acad. Sci. USA, 2020

Using AI and Social Media Multimodal Content for Disaster Response and Management: Opportunities, Challenges, and Future Directions.
Inf. Process. Manag., 2020

Guest Editorial: Generative Adversarial Networks for Computer Vision.
Int. J. Comput. Vis., 2020

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input.
Int. J. Comput. Vis., 2020

Energy-Based Models for Continual Learning.
CoRR, 2020

LID 2020: The Learning from Imperfect Data Challenge Results.
CoRR, 2020

Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration.
CoRR, 2020

Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space.
CoRR, 2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
CoRR, 2020

Experiences and Insights for Collaborative Industry-Academic Research in Artificial Intelligence.
AI Mag., 2020

Causal Discovery in Physical Systems from Videos.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Debiased Contrastive Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Visual Grounding of Learned Physical Models.
Proceedings of the 37th International Conference on Machine Learning, 2020

Estimating Generalization under Distribution Shifts via Domain-Invariant Representations.
Proceedings of the 37th International Conference on Machine Learning, 2020

Deep Audio Priors Emerge From Harmonic Convolutional Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

CLEVRER: Collision Events for Video Representation and Reasoning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Learning Compositional Koopman Operators for Model-Based Control.
Proceedings of the 8th International Conference on Learning Representations, 2020

Detecting Natural Disasters, Damage, and Incidents in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement.
Proceedings of the Computer Vision - ECCV 2020, 2020

Deep Feedback Inverse Problem Solver.
Proceedings of the Computer Vision - ECCV 2020, 2020

Foley Music: Learning to Generate Music from Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

Rewriting a Deep Generative Model.
Proceedings of the Computer Vision - ECCV 2020, 2020

Diverse Image Generation via Self-Conditioned GANs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning to Simulate Dynamic Environments With GameGAN.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Music Gesture for Visual Sound Separation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Height and Uprightness Invariance for 3D Prediction From a Single View.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Comparing the Interpretability of Deep Networks via Network Dissection.
Proceedings of the Explainable AI: Interpreting, 2019

Semantic photo manipulation with a generative image prior.
ACM Trans. Graph., 2019

Interpreting Deep Visual Representations via Network Dissection.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

What Do Different Evaluation Metrics Tell Us About Saliency Models?
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Learning the signatures of the human grasp using a scalable tactile glove.
Nat., 2019

Semantic Understanding of Scenes Through the ADE20K Dataset.
Int. J. Comput. Vis., 2019

The Role of Embedding Complexity in Domain-invariant Representations.
CoRR, 2019

Visualizing and Understanding Generative Adversarial Networks (Extended Abstract).
CoRR, 2019

Propagation Networks for Model-Based Control Under Partial Observation.
Proceedings of the International Conference on Robotics and Automation, 2019

Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids.
Proceedings of the 7th International Conference on Learning Representations, 2019

Visualizing and Understanding GANs.
Proceedings of the Deep Generative Models for Highly Structured Data, 2019

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Through-Wall Human Mesh Recovery Using Radio Signals.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

The Sound of Motions.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Gaze360: Physically Unconstrained Gaze Estimation in the Wild.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Meta-Sim: Learning to Generate Synthetic Datasets.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Moving Vehicle Tracking With Stereo Sound.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Neural Turtle Graphics for Modeling City Road Layouts.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Seeing What a GAN Cannot Generate.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-supervised Audio-visual Co-segmentation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Words by Drawing Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised Segmentation and Source Separation on Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

How to Make a Pizza: Learning a Compositional Layer-Based GAN Model.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Synthesizing Environment-Aware Activities via Activity Sketches.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Connecting Touch and Vision via Cross-Modal Prediction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Grounding Spoken Words in Unlabeled Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018
Exploiting Occlusion in Non-Line-of-Sight Active Imaging.
IEEE Trans. Computational Imaging, 2018

Places: A 10 Million Image Database for Scene Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Cross-Modal Scene Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

3D Interpreter Networks for Viewer-Centered Wireframe Modeling.
Int. J. Comput. Vis., 2018

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning.
Int. J. Comput. Vis., 2018

Visual Object Networks: Image Generation with Disentangled 3D Representation.
CoRR, 2018

Dataset Distillation.
CoRR, 2018

Revisiting the Importance of Individual Units in CNNs via Ablation.
CoRR, 2018

Using Computer Vision to Study the Effects of BMI on Online Popularity and Weight-Based Homophily.
Proceedings of the Social Informatics, 2018

RF-based 3D skeletons.
Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 2018

Visual Object Networks: Image Generation with Disentangled 3D Representations.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

3D-Aware Scene Manipulation via Inverse Graphics.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

ShadowCam: Real-Time Detection of Moving Obstacles Behind A Corner For Autonomous Vehicles.
Proceedings of the 21st International Conference on Intelligent Transportation Systems, 2018

Real-Time Object Pose Estimation with Pose Interpreter Networks.
Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Interpretable Basis Decomposition for Visual Explanation.
Proceedings of the Computer Vision - ECCV 2018, 2018

Temporal Relational Reasoning in Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018

The Sound of Pixels.
Proceedings of the Computer Vision - ECCV 2018, 2018

Learning to Zoom: A Saliency-Based Sampling Layer for Neural Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Single Image Intrinsic Decomposition Without a Single Intrinsic Image.
Proceedings of the Computer Vision - ECCV 2018, 2018

Through-Wall Human Pose Estimation Using Radio Signals.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

VirtualHome: Simulating Household Activities via Programs.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning to Act Properly: Predicting and Explaining Affordances From Images.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Inferring Light Fields From Shadows.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Guest Editorial: Best of CVPR 2015.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

SLAC: A Sparsely Labeled Dataset for Action Classification and Localization.
CoRR, 2017

Temporal Relational Reasoning in Videos.
CoRR, 2017

See, Hear, and Read: Deep Aligned Representations.
CoRR, 2017

Is Saki #delicious?: The Food Perception Gap on Instagram and Its Relation to Health.
Proceedings of the 26th International Conference on World Wide Web, 2017

SegICP: Integrated deep semantic segmentation and pose estimation.
Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017

Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media.
Proceedings of the Eleventh International Conference on Web and Social Media, 2017

Learning to See by Hearing.
Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods, 2017

A Compositional Object-Based Approach to Learning Physical Dynamics.
Proceedings of the 5th International Conference on Learning Representations, 2017

Open Vocabulary Scene Parsing.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Following Gaze in Video.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Turning Corners into Cameras: Principles and Methods.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Scene Parsing through ADE20K Dataset.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Generating the Future with Adversarial Transformers.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Learning Cross-Modal Embeddings for Cooking Recipes and Food Images.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Network Dissection: Quantifying Interpretability of Deep Visual Representations.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Benchmarking Convolutional Neural Networks for Object Segmentation and Pose Estimation.
Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop, 2017

2016
SUN Database: Exploring a Large Collection of Scene Categories.
Int. J. Comput. Vis., 2016

Visualizing Object Detection Features.
Int. J. Comput. Vis., 2016

Guest Editorial: Big Data.
Int. J. Comput. Vis., 2016

Semantic Understanding of Scenes through the ADE20K Dataset.
CoRR, 2016

Places: An Image Database for Deep Scene Understanding.
CoRR, 2016

Following Gaze Across Views.
CoRR, 2016

Who is Mistaken?
CoRR, 2016

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition.
CoRR, 2016

Generating Videos with Scene Dynamics.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Unsupervised Learning of Spoken Language with Visual Context.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

SoundNet: Learning Sound Representations from Unlabeled Video.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Ambient Sound Provides Supervision for Visual Learning.
Proceedings of the Computer Vision - ECCV 2016, 2016

Where Should Saliency Models Look Next?
Proceedings of the Computer Vision - ECCV 2016, 2016

Single Image 3D Interpreter Network.
Proceedings of the Computer Vision - ECCV 2016, 2016

Learning Deep Features for Discriminative Localization.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Anticipating Visual Representations from Unlabeled Video.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Predicting Motivations of Actions by Leveraging Text.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

MovieQA: Understanding Stories in Movies through Question-Answering.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Visually Indicated Sounds.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Eye Tracking for Everyone.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning Aligned Cross-Modal Representations from Weakly Aligned Data.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Object Detectors Emerge in Deep Scene CNNs.
Proceedings of the 3rd International Conference on Learning Representations, 2015

Understanding Intra-Class Knowledge Inside CNN.
CoRR, 2015

Anticipating the future by watching unlabeled video.
CoRR, 2015

Learning visual biases from human imagination.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Where are they looking?
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Skip-Thought Vectors.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Understanding and Predicting Image Memorability at a Large Scale.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

2014
What Makes a Photograph Memorable?
IEEE Trans. Pattern Anal. Mach. Intell., 2014

Accidental Pinhole and Pinspeck Cameras - Revealing the Scene Outside the Picture.
Int. J. Comput. Vis., 2014

Acquiring Visual Classifiers from Human Imagination.
CoRR, 2014

Inferring the Why in Images.
CoRR, 2014

Unsupervised Non-parametric Geospatial Modeling from Ground Imagery.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2014

Learning Deep Features for Scene Recognition using Places Database.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Exemplar Network: A Generalized Mixture Model.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Recognizing City Identity via Attribute Analysis of Geo-tagged Images.
Proceedings of the Computer Vision - ECCV 2014, 2014

Assessing the Quality of Actions.
Proceedings of the Computer Vision - ECCV 2014, 2014

FPM: Fine Pose Parts-Based Model with 3D CAD Models.
Proceedings of the Computer Vision - ECCV 2014, 2014

Looking Beyond the Visible Scene.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
A boosting approach for the simultaneous detection and segmentation of generic objects.
Pattern Recognit. Lett., 2013

Learning with Hierarchical-Deep Models.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

Are all training examples equally valuable?
CoRR, 2013

SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels.
Proceedings of the IEEE International Conference on Computer Vision, 2013

HOGgles: Visualizing Object Detection Features.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Shape Anchors for Data-Driven Multi-view Reconstruction.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Parsing IKEA Objects: Fine Pose Estimation.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Modifying the Memorability of Face Photographs.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
Context models and out-of-context objects.
Pattern Recognit. Lett., 2012

A Tree-Based Context Model for Object Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model.
Proceedings of the Unsupervised and Transfer Learning, 2012

Inverting and Visualizing Features for Object Detection.
CoRR, 2012

Notes on image annotation.
CoRR, 2012

Basic level scene understanding: from labels to structure and beyond.
Proceedings of the SIGGRAPH Asia 2012 Technical Briefs, Singapore, November 28, 2012

Image memorability and visual inception.
Proceedings of the SIGGRAPH Asia 2012 Technical Briefs, Singapore, November 28, 2012

Localizing 3D cuboids in single-view images.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Memorability of Image Regions.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

A Latent Variable Ranking Model for Content-Based Retrieval.
Proceedings of the Advances in Information Retrieval, 2012

Multidimensional Spectral Hashing.
Proceedings of the Computer Vision - ECCV 2012, 2012

Undoing the Damage of Dataset Bias.
Proceedings of the Computer Vision - ECCV 2012, 2012

Recognizing scene viewpoint using panoramic place representation.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Nonparametric Scene Parsing via Label Transfer.
IEEE Trans. Pattern Anal. Mach. Intell., 2011

SIFT Flow: Dense Correspondence across Scenes and Its Applications.
IEEE Trans. Pattern Anal. Mach. Intell., 2011

Learning to Learn with Compound HD Models.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Transfer Learning by Borrowing Examples for Multiclass Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Understanding the Intrinsic Memorability of Images.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Simultaneous detection and segmentation for generic objects.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Evaluation of image features using a photorealistic virtual world.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Unbiased look at dataset bias.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

Learning to share visual appearance for multiclass object detection.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

What makes an image memorable?
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

Estimating scene typicality from human ratings and image features.
Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, 2011

AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video.
Proceedings of the 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance, 2011

2010
LabelMe: Online Image Annotation and Applications.
Proc. IEEE, 2010

Infinite Images: Creating and Exploring a Large Photorealistic Virtual Space.
Proc. IEEE, 2010

Using the forest to see the trees: exploiting context for visual object detection and localization.
Commun. ACM, 2010

A Data-Driven Approach for Event Prediction.
Proceedings of the Computer Vision, 2010

Modeling and Analysis of Dynamic Behaviors of Web Image Collections.
Proceedings of the Computer Vision - ECCV 2010, 2010

Semantic Label Sharing for Learning with Many Categories.
Proceedings of the Computer Vision, 2010

Part and appearance sharing: Recursive Compositional Models for multi-view.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

SUN database: Large-scale scene recognition from abbey to zoo.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Exploiting hierarchical context on a large database of object categories.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Guest Editors' Introduction to the Special Section on Probabilistic Graphical Models.
IEEE Trans. Pattern Anal. Mach. Intell., 2009

Nonparametric Bayesian Texture Learning and Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Unsupervised Detection of Regions of Interest Using Iterative Link Analysis.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Semi-Supervised Learning in Gigantic Image Collections.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

LabelMe video: Building a video database with human annotations.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

Learning to predict where humans look.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

Building a database of 3D scenes from user annotations.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Recognizing indoor scenes.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Nonparametric scene parsing: Label transfer via dense scene alignment.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

2008
80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2008

Describing Visual Scenes Using Transformed Objects and Parts.
Int. J. Comput. Vis., 2008

LabelMe: A Database and Web-Based Tool for Image Annotation.
Int. J. Comput. Vis., 2008

Spectral Hashing.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

SIFT Flow: Dense Correspondence across Different Scenes.
Proceedings of the Computer Vision, 2008

Small codes and large image databases for recognition.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Creating and exploring a large photorealistic virtual space.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008

2007
Sharing Visual Features for Multiclass and Multiview Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2007

Object Recognition by Scene Alignment.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

2006
Hybrid images.
ACM Trans. Graph., 2006

Depth from Familiar Objects: A Hierarchical Model for 3D Scenes.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

Shared Features for Multiclass Object Detection.
Proceedings of the Toward Category-Level Object Recognition, 2006

Object Detection and Localization Using Local and Global Features.
Proceedings of the Toward Category-Level Object Recognition, 2006

2005
Motion magnification.
ACM Trans. Graph., 2005

Describing Visual Scenes using Transformed Dirichlet Processes.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Learning Hierarchical Models of Scenes, Objects, and Parts.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

An Ensemble Prior of Image Structure for Cross-Modal Inference.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Human Learning of Contextual Priors for Object Search: Where does the time go?
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005

2004
Contextual Models for Object Detection Using Boosted Random Fields.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Sharing Features: Efficient Boosting Procedures for Multiclass Object Detection.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

2003
Contextual Priming for Object Detection.
Int. J. Comput. Vis., 2003

Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Top-down control of visual attention in object detection.
Proceedings of the 2003 International Conference on Image Processing, 2003

Context-based vision system for place and object recognition.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

Properties and Applications of Shape Recipes.
Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 2003

2002
Depth Estimation from Image Structure.
IEEE Trans. Pattern Anal. Mach. Intell., 2002

Shape Recipes: Scene Representations that Refer to the Image.
Proceedings of the Advances in Neural Information Processing Systems 15, 2002

Scene-Centered Description from Spatial Envelope Properties.
Proceedings of the Biologically Motivated Computer Vision Second International Workshop, 2002

2001
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope.
Int. J. Comput. Vis., 2001

Contextual Modulation of Target Saliency.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, 2001

Statistical Context Priming for Object Detection.
Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001, 2001

1999
Semantic Organization of Scenes using Discriminant Structural Templates.
Proceedings of the International Conference on Computer Vision, 1999

