Trevor Darrell

ORCID: 0000-0001-5453-8533

Affiliations:
  • University of California, Berkeley, USA


According to our database, Trevor Darrell authored at least 540 papers between 1987 and 2024.

Bibliography

2024
Humanoid Locomotion as Next Token Prediction.
CoRR, 2024

Neural Network Diffusion.
CoRR, 2024

InstanceDiffusion: Instance-level Control for Image Generation.
CoRR, 2024

Rethinking Patch Dependence for Masked Autoencoders.
CoRR, 2024

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations.
CoRR, 2024

2023
QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Exploring Simple and Transferable Recognition-Aware Image Processing.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Monocular Quasi-Dense 3D Object Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Unsupervised Universal Image Segmentation.
CoRR, 2023

See, Say, and Segment: Teaching LMMs to Overcome False Premises.
CoRR, 2023

Describing Differences in Image Sets with Natural Language.
CoRR, 2023

Recursive Visual Programming.
CoRR, 2023

Readout Guidance: Learning Control from Diffusion Features.
CoRR, 2023

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks.
CoRR, 2023

Sequential Modeling Enables Scalable Learning for Large Vision Models.
CoRR, 2023

Initializing Models with Larger Ones.
CoRR, 2023

Object-based (yet Class-agnostic) Video Domain Adaptation.
CoRR, 2023

Compositional Chain-of-Thought Prompting for Large Multimodal Models.
CoRR, 2023

Self-correcting LLM-controlled Diffusion Models.
CoRR, 2023

Comparative Multi-View Language Grounding.
CoRR, 2023

A Coefficient Makes SVRG Effective.
CoRR, 2023

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game.
CoRR, 2023

LLM-grounded Video Diffusion Models.
CoRR, 2023

Aligning Large Multimodal Models with Factually Augmented RLHF.
CoRR, 2023

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation.
CoRR, 2023

Predicting masked tokens in stochastic locations improves masked image modeling.
CoRR, 2023

Neural Relighting with Subsurface Scattering by Learning the Radiance Transfer Gradient.
CoRR, 2023

Refocusing Is Key to Transfer Learning.
CoRR, 2023

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts.
CoRR, 2023

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models.
CoRR, 2023

Simple Token-Level Confidence Improves Caption Correctness.
CoRR, 2023

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models.
CoRR, 2023

Learning and Verification of Task Structure in Instructional Videos.
CoRR, 2023

Learning Humanoid Locomotion with Transformers.
CoRR, 2023

More Control for Free! Image Synthesis with Semantic Diffusion Guidance.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models are Visual Reasoning Coordinators.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hierarchical Open-vocabulary Universal Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023

Dropout Reduces Underfitting.
Proceedings of the International Conference on Machine Learning, 2023

Using Language to Extend to Unseen Domains.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Revisiting Generalizability in Deepfake Detection: Improving Metrics and Stabilizing Transfer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Can Language Models Learn to Listen?
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Vision-Language Models with Sparse Mixture of Experts.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CLAIR: Evaluating Image Captions with Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Top-Down Visual Attention from Analysis by Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Robot Learning with Sensorimotor Pre-training.
Proceedings of the Conference on Robot Learning, 2023

Modular Visual Question Answering via Code Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022
Does unsupervised grammar induction need pixels?
CoRR, 2022

PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data.
CoRR, 2022

Shape-Guided Diffusion with Inside-Outside Attention.
CoRR, 2022

G^3: Geolocation via Guidebook Grounding.
CoRR, 2022

Multitask Vision-Language Prompt Tuning.
CoRR, 2022

Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset.
CoRR, 2022

Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers.
CoRR, 2022

Refine and Represent: Region-to-Object Representation Learning.
CoRR, 2022

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency.
CoRR, 2022

Back to the Source: Diffusion-Driven Test-Time Adaptation.
CoRR, 2022

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022.
CoRR, 2022

Masked Visual Pre-training for Motor Control.
CoRR, 2022

Explaining Reinforcement Learning Policies through Counterfactual Trajectories.
CoRR, 2022

Self-Supervised Pretraining Improves Self-Supervised Pretraining.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Visual Prompting via Image Inpainting.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exposing the Limits of Video-Text Models through Contrast Sets.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Disentangled Action Recognition with Knowledge Bases.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Towards Learning to Play Piano with Dexterous Hands and Touch.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Visual Attention Emerges from Recurrent Sparse Reconstruction.
Proceedings of the International Conference on Machine Learning, 2022

Zero-Shot Reward Specification via Grounded Natural Language.
Proceedings of the International Conference on Machine Learning, 2022

Differentiable Gradient Sampling for Learning Implicit 3D Scene Reconstructions from a Single Image.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Anytime Dense Prediction with Confidence Adaptivity.
Proceedings of the Tenth International Conference on Learning Representations, 2022

G3: Geolocation via Guidebook Grounding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning to Detect Every Thing in an Open World.
Proceedings of the Computer Vision - ECCV 2022, 2022

TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency.
Proceedings of the Computer Vision - ECCV 2022, 2022

Studying Bias in GANs Through the Lens of Race.
Proceedings of the Computer Vision - ECCV 2022, 2022

On Guiding Visual Attention with Language Specification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Object-Region Video Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DETReg: Unsupervised Pretraining with Region Priors for Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A ConvNet for the 2020s.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Contrastive Test-Time Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Real-World Robot Learning with Masked Visual Pre-training.
Proceedings of the Conference on Robot Learning, 2022

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Voxel-informed Language Grounding.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022

Un-mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
On-target Adaptation.
CoRR, 2021

Region-level Active Learning for Cluttered Scenes.
CoRR, 2021

Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks.
CoRR, 2021

Confidence Adaptive Anytime Pixel-Level Recognition.
CoRR, 2021

Early Convolutions Help Transformers See Better.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Teachable Reinforcement Learning via Advice Distillation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Benchmark for Compositional Text-to-Image Synthesis.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

CLIP-It! Language-Guided Video Summarization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Modular Networks for Compositional Instruction Following.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data.
Proceedings of Machine Learning and Systems 2021, 2021

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

PyTouch: A Machine Learning Library for Touch Processing.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Auto-Tuned Sim-to-Real Transfer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Instance-Aware Predictive Navigation in Multi-Agent Environments.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Compositional Video Synthesis with Action Graphs.
Proceedings of the 38th International Conference on Machine Learning, 2021

What Should Not Be Contrastive in Contrastive Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Tent: Fully Test-Time Adaptation by Entropy Minimization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Discovering Non-monotonic Autoregressive Orderings with Variational Inference.
Proceedings of the 9th International Conference on Learning Representations, 2021

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting.
Proceedings of the 9th International Conference on Learning Representations, 2021

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control.
Proceedings of the 9th International Conference on Learning Representations, 2021

Region Similarity Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Temporal Action Detection with Multi-level Supervision.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Rethinking preventing class-collapsing in metric learning with margin-based losses.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Predicting with Confidence on Unseen Distributions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Robust Object Detection via Instance-Level Temporal Cycle Confusion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Quasi-Dense Similarity Learning for Multiple Object Tracking.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
The Whole Is More Than Its Parts? From Explicit to Implicit Pose Normalization.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Compositional GAN: Learning Image-Conditional Binary Composition.
Int. J. Comput. Vis., 2020

Minimax Active Learning.
CoRR, 2020

Modularity Improves Out-of-Domain Instruction Following.
CoRR, 2020

Evaluating Self-Supervised Pretraining Without Using Labels.
CoRR, 2020

Compositional Video Synthesis with Action Graphs.
CoRR, 2020

Fully Test-time Adaptation by Entropy Minimization.
CoRR, 2020

Quasi-Dense Instance Similarity Learning.
CoRR, 2020

Reducing Class Collapse in Metric Learning with Easy Positive Sampling.
CoRR, 2020

Contrastive Examples for Addressing the Tyranny of the Majority.
CoRR, 2020

Spatio-Temporal Action Detection with Multi-Object Interaction.
CoRR, 2020

Revisiting Few-shot Activity Detection with Class Similarity Control.
CoRR, 2020

Rethinking Image Mixture for Unsupervised Visual Representation Learning.
CoRR, 2020

A New Meta-Baseline for Few-Shot Learning.
CoRR, 2020

Fighting Copycat Agents in Behavioral Cloning from Observation Histories.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Auxiliary Task Reweighting for Minimum-data Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

ParkPredict: Motion and Intent Prediction of Vehicles in Parking Lots.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2020

Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Video Prediction via Example Guidance.
Proceedings of the 37th International Conference on Machine Learning, 2020

Frustratingly Simple Few-Shot Object Detection.
Proceedings of the 37th International Conference on Machine Learning, 2020

Uncertainty-guided Continual Learning with Bayesian Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

Hierarchical Style-Based Networks for Motion Synthesis.
Proceedings of the Computer Vision - ECCV 2020, 2020

Identity-Aware Multi-sentence Video Description.
Proceedings of the Computer Vision - ECCV 2020, 2020

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Canonical Representations for Scene Graph to Image Generation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Adversarial Continual Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Saliency Propagation for Semi-Supervised Instance Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Semantic Bottleneck Scene Generation.
CoRR, 2019

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control.
CoRR, 2019

Regularization Matters in Policy Optimization.
CoRR, 2019

Transferable Recognition-Aware Image Processing.
CoRR, 2019

Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization.
CoRR, 2019

Unsupervised Domain Adaptation through Self-Supervision.
CoRR, 2019

Dynamic Scale Inference by Entropy Minimization.
CoRR, 2019

Task-Aware Deep Sampling for Feature Generation.
CoRR, 2019

Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields.
CoRR, 2019

Viewpoint Invariant Change Captioning.
CoRR, 2019

LabelAR: A Spatial Guidance Interface for Fast Computer Vision Image Collection.
Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, 2019

Deep Mixture of Experts via Shallow Embedding.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Compositional Plan Vectors.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Monocular Plan View Networks for Autonomous Driving.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Deep Object-Centric Policies for Autonomous Driving.
Proceedings of the International Conference on Robotics and Automation, 2019

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees.
Proceedings of the 7th International Conference on Learning Representations, 2019

Rethinking the Value of Network Pruning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Large-Scale Study of Curiosity-Driven Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Compositional GAN (Extended Abstract): Learning Image-Conditional Binary Composition.
Proceedings of the Deep Generative Models for Highly Structured Data, 2019

Discriminator Rejection Sampling.
Proceedings of the 7th International Conference on Learning Representations, 2019

Spatio-Temporal Action Graph Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Variational Adversarial Active Learning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Semi-Supervised Domain Adaptation via Minimax Entropy.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Robust Change Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Few-Shot Object Detection via Feature Reweighting.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Language-Conditioned Graph Networks for Relational Reasoning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Joint Monocular 3D Vehicle Detection and Tracking.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Disentangling Propagation and Generation for Video Prediction.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Hierarchical Discrete Distribution Decomposition for Match Density Estimation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Generalized Zero-Shot Learning via Aligned Variational Autoencoders.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adversarial Inference for Multi-Sentence Video Description.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Uncertainty-Guided Continual Learning in Bayesian Neural Networks - Extended Abstract.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Accurate Visual Localization for Automotive Applications.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Guest Editors' Introduction to the Special Section on Learning with Shared Information for Computer Vision and Multimedia Analysis.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Similarity R-C3D for Few-shot Temporal Activity Detection.
CoRR, 2018

Classifying Collisions with Spatio-Temporal Action Graph Networks.
CoRR, 2018

SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection.
CoRR, 2018

Compositional GAN: Learning Conditional Image Composition.
CoRR, 2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees.
CoRR, 2018

Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract).
CoRR, 2018

Generating Counterfactual Explanations with Natural Language.
CoRR, 2018

Few-Shot Segmentation Propagation with Guided Networks.
CoRR, 2018

Deep Mixture of Experts via Shallow Embedding.
CoRR, 2018

BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling.
CoRR, 2018

Women also Snowboard: Overcoming Bias in Captioning Models.
CoRR, 2018

Speaker-Follower Models for Vision-and-Language Navigation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Deep Object-Centric Representations for Generalizable Robot Learning.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

CyCADA: Cycle-Consistent Adversarial Domain Adaptation.
Proceedings of the 35th International Conference on Machine Learning, 2018

Learning Rich Image Representation with Deep Layer Aggregation.
Proceedings of the 6th International Conference on Learning Representations, 2018

Conditional Networks for Few-Shot Semantic Segmentation.
Proceedings of the 6th International Conference on Learning Representations, 2018

Recasting Gradient-Based Meta-Learning as Hierarchical Bayes.
Proceedings of the 6th International Conference on Learning Representations, 2018

Reinforcement Learning from Imperfect Demonstrations.
Proceedings of the 6th International Conference on Learning Representations, 2018

Adapting to Continuously Shifting Domains.
Proceedings of the 6th International Conference on Learning Representations, 2018

Object Hallucination in Image Captioning.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Localizing Moments in Video with Temporal Language.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

SkipNet: Learning Dynamic Routing in Convolutional Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Textual Explanations for Self-Driving Vehicles.
Proceedings of the Computer Vision - ECCV 2018, 2018

Explainable Neural Computation via Stack Neural Module Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Grounding Visual Explanations.
Proceedings of the Computer Vision - ECCV 2018, 2018

Women Also Snowboard: Overcoming Bias in Captioning Models.
Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Layer Aggregation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Fooling Vision and Language Models Despite Localization and Attention Mechanism.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning Instance Segmentation by Interaction.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Zero-Shot Visual Imitation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning to Segment Every Thing.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Multi-Content GAN for Few-Shot Font Style Transfer.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Modular Architecture for StarCraft II with Deep Reinforcement Learning.
Proceedings of the Fourteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2018

2017
Simultaneous Deep Transfer Across Domains and Tasks.
Proceedings of the Domain Adaptation in Computer Vision Applications, 2017

Fully Convolutional Networks for Semantic Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract).
CoRR, 2017

Grounding Visual Explanations (Extended Abstract).
CoRR, 2017

Can you fool AI with adversarial examples on a visual Turing test?
CoRR, 2017

Deep Layer Aggregation.
CoRR, 2017

Visual Discovery at Pinterest.
Proceedings of the 26th International Conference on World Wide Web Companion, 2017

Toward Multimodal Image-to-Image Translation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Learning modular neural network policies for multi-task and multi-robot transfer.
Proceedings of the 2017 IEEE International Conference on Robotics and Automation, 2017

Adversarial Discriminative Domain Adaptation (workshop extended abstract).
Proceedings of the 5th International Conference on Learning Representations, 2017

Loss is its own Reward: Self-Supervision for Reinforcement Learning.
Proceedings of the 5th International Conference on Learning Representations, 2017

Adversarial Feature Learning.
Proceedings of the 5th International Conference on Learning Representations, 2017

Generalized Orderless Pooling Performs Implicit Salient Matching.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Localizing Moments in Video with Natural Language.
Proceedings of the IEEE International Conference on Computer Vision, 2017

End-to-End Learning of Driving Models from Large-Scale Video Datasets.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Captioning Images with Diverse Objects.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Adversarial Discriminative Domain Adaptation.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Learning Features by Watching Objects Move.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Curiosity-Driven Exploration by Self-Supervised Prediction.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

Modeling Relationships in Referential Expressions with Compositional Modular Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Learning Detection with Diverse Proposals.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Gradient-free Policy Architecture Search and Adaptation.
Proceedings of the 1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, 2017

2016
Learning to Detect Visual Grasp Affordance.
IEEE Trans. Autom. Sci. Eng., 2016

Corrigendum to "Robotic learning of haptic adjectives through physical interaction" [Robot. Auton. Syst. 63 (P3) (2015) 279-292].
Robotics Auton. Syst., 2016

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

End-to-End Training of Deep Visuomotor Policies.
J. Mach. Learn. Res., 2016

Large Scale Visual Recognition through Adaptation using Joint Representation and Multiple Instance Learning.
J. Mach. Learn. Res., 2016

Understanding object descriptions in robotics by open-vocabulary object retrieval and detection.
Int. J. Robotics Res., 2016

Attentive Explanations: Justifying Decisions and Pointing to the Evidence.
CoRR, 2016

Data-dependent Initializations of Convolutional Neural Networks.
Proceedings of the 4th International Conference on Learning Representations, 2016

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions.
CoRR, 2016

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation.
CoRR, 2016

Auxiliary Image Regularization for Deep CNNs with Noisy Labels.
Proceedings of the 4th International Conference on Learning Representations, 2016

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints.
Proceedings of the Algorithmic Foundations of Robotics XII, 2016

Learning to Compose Neural Networks for Question Answering.
Proceedings of the NAACL HLT 2016, 2016

Proton: A visuo-haptic data acquisition system for robotic learning of surface properties.
Proceedings of the 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2016

TSC-DL: Unsupervised trajectory segmentation of multi-modal surgical demonstrations with Deep Learning.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016

Cross-modal adaptation for RGB-D detection.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016

Deep learning for tactile understanding from visual and haptic data.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016

Deep spatial autoencoders for visuomotor learning.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Clockwork Convnets for Video Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Grounding of Textual Phrases in Images by Reconstruction.
Proceedings of the Computer Vision - ECCV 2016, 2016

Segmentation from Natural Language Expressions.
Proceedings of the Computer Vision - ECCV 2016, 2016

Generating Visual Explanations.
Proceedings of the Computer Vision - ECCV 2016, 2016

Best Practices for Fine-Tuning Visual Classifiers to New Domains.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Context Encoders: Feature Learning by Inpainting.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Natural Language Object Retrieval.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning with Side Information through Modality Hallucination.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Compact Bilinear Pooling.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Neural Module Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Robotic learning of haptic adjectives through physical interaction.
Robotics Auton. Syst., 2015

Generalized Sparselet Models for Real-Time Multiclass Object Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Machine Learning with Interdependent and Non-identically Distributed Data (Dagstuhl Seminar 15152).
Dagstuhl Reports, 2015

Introduction to the CVIU special issue on "Parts and Attributes: Mid-level representation for object recognition, scene classification and object detection".
Comput. Vis. Image Underst., 2015

Fine-grained pose prediction, normalization, and recognition.
CoRR, 2015

Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments.
CoRR, 2015

Fully Convolutional Multi-Class Multiple Instance Learning.
Proceedings of the 3rd International Conference on Learning Representations, 2015

Constrained Structured Regression with Convolutional Neural Networks.
CoRR, 2015

Mapping Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets.
CoRR, 2015

Learning Visual Feature Spaces for Robotic Manipulation with Deep Spatial Autoencoders.
CoRR, 2015

Learning Compact Convolutional Neural Networks with Nested Dropout.
Proceedings of the 3rd International Conference on Learning Representations, 2015

Quantification in-the-wild: data-sets and baselines.
CoRR, 2015

Deep Compositional Question Answering with Neural Module Networks.
CoRR, 2015

Scene Intrinsics and Depth from a Single Image.
Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

Sequence to Sequence - Video to Text.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Simultaneous Deep Transfer Across Domains and Tasks.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Spatial Semantic Regularisation for Large Scale Object Detection.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Learning the Structure of Deep Convolutional Networks.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Detector discovery in the wild: Joint multiple instance and representation learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deformable part models are convolutional neural networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Modeling Radiometric Uncertainty for Vision with Tone-Mapped Color Images.
IEEE Trans. Pattern Anal. Mach. Intell., 2014

Guest Editor's Introduction to the Special Issue on Domain Adaptation for Vision Applications.
Int. J. Comput. Vis., 2014

Deep Domain Confusion: Maximizing for Domain Invariance.
CoRR, 2014

One-Bit Object Detection: On learning to localize objects with minimal supervision.
CoRR, 2014

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids.
CoRR, 2014

One-Shot Adaptation of Supervised Deep Convolutional Models.
Proceedings of the 2nd International Conference on Learning Representations, 2014

LSDA: Large Scale Detection Through Adaptation.
CoRR, 2014

DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks.
CoRR, 2014

Open-vocabulary Object Retrieval.
Proceedings of the Robotics: Science and Systems X, 2014

Weakly-supervised Discovery of Visual Pattern Configurations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Do Convnets Learn Correspondence?
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

LSDA: Large Scale Detection through Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Caffe: Convolutional Architecture for Fast Feature Embedding.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Interactive adaptation of real-time object detectors.
Proceedings of the 2014 IEEE International Conference on Robotics and Automation, 2014

On learning to localize objects with minimal supervision.
Proceedings of the 31st International Conference on Machine Learning, 2014

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition.
Proceedings of the 31st International Conference on Machine Learning, 2014

Part-Based R-CNNs for Fine-Grained Category Detection.
Proceedings of the Computer Vision - ECCV 2014, 2014

Exemplar-Specific Patch Features for Fine-Grained Recognition.
Proceedings of the Pattern Recognition - 36th German Conference, 2014

PANDA: Pose Aligned Networks for Deep Attribute Modeling.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Anytime Recognition of Objects and Scenes.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Continuous Manifold Based Adaptation for Evolving Visual Domains.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Learning Scalable Discriminative Dictionary with Sample Relatedness.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Recognizing Image Style.
Proceedings of the British Machine Vision Conference, 2014

2013
A Category-Level 3D Object Dataset: Putting the Kinect to Work.
Proceedings of the Consumer Depth Cameras for Computer Vision, 2013

Pooling-Invariant Image Feature Learning.
CoRR, 2013

Why Size Matters: Feature Coding as Nystrom Sampling.
Proceedings of the 1st International Conference on Learning Representations, 2013

Efficient Learning of Domain-invariant Image Representations.
Proceedings of the 1st International Conference on Learning Representations, 2013

Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations.
CoRR, 2013

Recognizing Image Style.
CoRR, 2013

Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Grounding spatial relations for human-robot interaction.
Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013

Using robotic exploratory procedures to learn the meaning of haptic adjectives.
Proceedings of the 2013 IEEE International Conference on Robotics and Automation, 2013

On Compact Codes for Spatially Pooled Features.
Proceedings of the 30th International Conference on Machine Learning, 2013

Discriminatively Activated Sparselets.
Proceedings of the 30th International Conference on Machine Learning, 2013

Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Latent Task Adaptation with Large-Scale Hierarchies.
Proceedings of the IEEE International Conference on Computer Vision, 2013

YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Semi-supervised Domain Adaptation with Instance Constraints.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
A geometric approach to robotic laundry folding.
Int. J. Robotics Res., 2012

Multi-View Learning in the Presence of View Disagreement.
CoRR, 2012

Factorized Multi-Modal Topic Model.
Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

SRI-Sarnoff AURORA System at TRECVID 2012 Multimedia Event Detection and Recounting.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Learning with Recursive Perceptual Representations.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Timely Object Recognition.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Detection bank: an object detection based video representation for multimedia event recognition.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Sparselet Models for Efficient Multiclass Object Detection.
Proceedings of the Computer Vision - ECCV 2012, 2012

Discovering Latent Domains for Multisource Domain Adaptation.
Proceedings of the Computer Vision - ECCV 2012, 2012

Pose pooling kernels for sub-category recognition.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

From pixels to physics: Probabilistic color de-rendering.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Beyond spatial pyramids: Receptive field learning for pooled image features.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
Heavy-tailed Distances for Gradient Based Image Descriptors.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Perception for the manipulation of socks.
Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011

Practical 3-D object detection using category and instance-level appearance models.
Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011

Parametrized shape models for clothing.
Proceedings of the IEEE International Conference on Robotics and Automation, 2011

Visual grasp affordances from appearance-based cues.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

A category-level 3-D object dataset: Putting the Kinect to work.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

The NBNN kernel.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Learning cross-modality similarity for multinomial data.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Supervised hierarchical Pitman-Yor process for natural scene segmentation.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

Learning object color models from multi-view constraints.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

What you saw is not what you get: Domain adaptation using asymmetric kernel transforms.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

A probabilistic model for recursive factorized image features.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

2010
Toward Large-Scale Face Recognition Using Social Network Context.
Proc. IEEE, 2010

Factorized Orthogonal Latent Spaces.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Gaussian Processes for Object Categorization.
Int. J. Comput. Vis., 2010

Factorized Latent Spaces with Structured Sparsity.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Size Matters: Metric Visual Search Constraints from Monocular Metadata.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Multimodal location estimation.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Adapting Visual Category Models to New Domains.
Proceedings of the Computer Vision, 2010

Learning to Recognize Objects from Unseen Modalities.
Proceedings of the Computer Vision, 2010

2009
Multistream Articulatory Feature-Based Models for Visual Speech Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2009

Filtering Abstract Senses From Image Search Results.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Learning to Hash with Binary Reconstructive Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

An Additive Latent Feature Model for Transparent Object Recognition.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Invited talk: image recognition for intelligent interfaces.
Proceedings of the 14th International Conference on Intelligent User Interfaces, 2009

An efficient projection for l1,∞ regularization.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Multiple-view object recognition in band-limited distributed camera networks.
Proceedings of the Third ACM/IEEE International Conference on Distributed Smart Cameras, 2009

Who is "You"? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Fast concurrent object localization and recognition.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Rank priors for continuous non-linear dimensionality reduction.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Co-training with noisy perceptual observations.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

2008
Nearest-Neighbor Methods in Learning and Vision.
IEEE Trans. Neural Networks, 2008

Reducing drift in differential tracking.
Comput. Vis. Image Underst., 2008

Unsupervised Learning of Visual Sense Models for Polysemous Words.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Photo-based question answering.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Multimodal question answering for mobile devices.
Proceedings of the 13th International Conference on Intelligent User Interfaces, 2008

Topologically-constrained latent variable models.
Proceedings of the Machine Learning, 2008

Scalable classifiers for Internet vision tasks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008

Dynamic visual category learning.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Sparse probabilistic regression for activity-independent human pose inference.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Autotagging Facebook: Social network context improves photo annotation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008

Transfer learning for image classification with sparse prototype representations.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Unsupervised feature selection via distributed coding for multi-view object recognition.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Learning to Transform Time Series with a Few Examples.
IEEE Trans. Pattern Anal. Mach. Intell., 2007

Hidden Conditional Random Fields.
IEEE Trans. Pattern Anal. Mach. Intell., 2007

The Pyramid Match Kernel: Efficient Learning with Sets of Features.
J. Mach. Learn. Res., 2007

Combining object and feature dynamics in probabilistic tracking.
Comput. Vis. Image Underst., 2007

Head gestures for perceptual interfaces: The role of context in improving recognition.
Artif. Intell., 2007

Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers.
Proceedings of the Machine Learning for Multimodal Interaction, 2007

Conditional Sequence Model for Context-Based Recognition of Gaze Aversion.
Proceedings of the Machine Learning for Multimodal Interaction, 2007

Discriminative Gaussian process latent variable model for classification.
Proceedings of the Machine Learning, 2007

Detecting communication errors from visual cues during the system's conversational turn.
Proceedings of the 9th International Conference on Multimodal Interfaces, 2007

Multimodal communication error detection for driver-car interaction.
Proceedings of the ICINCO 2007, 2007

Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning.
Proceedings of the IEEE 11th International Conference on Computer Vision, 2007

Active Learning with Gaussian Processes for Object Categorization.
Proceedings of the IEEE 11th International Conference on Computer Vision, 2007

Learning Visual Representations using Images with Captions.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Latent-Dynamic Discriminative Models for Continuous Gesture Recognition.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Pyramid Match Hashing: Sub-Linear Time Indexing Over Partial Correspondences.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

2006
Learning a Precedence Effect-Like Weighting Function for the Generalized Cross-Correlation Framework.
IEEE Trans. Speech Audio Process., 2006

Non-parametric and light-field deformable models.
Comput. Vis. Image Underst., 2006

Approximate Correspondences in High Dimensions.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Head gesture recognition in intelligent interfaces: the role of context in improving recognition.
Proceedings of the 11th International Conference on Intelligent User Interfaces, 2006

Recognizing gaze aversion gestures in embodied conversational discourse.
Proceedings of the 8th International Conference on Multimodal Interfaces, 2006

Co-Adaptation of audio-visual speech and gesture classifiers.
Proceedings of the 8th International Conference on Multimodal Interfaces, 2006

Hidden Conditional Random Fields for Gesture Recognition.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

Conditional Random People: Tracking Humans with CRFs and Grid Filters.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

Unsupervised Learning of Categories from Sets of Partially Matching Image Features.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

The Role of Context in Head Gesture Recognition.
Proceedings of the Proceedings, 2006

2005
Untethered gesture acquisition and recognition for virtual world manipulation.
Virtual Real., 2005

Incorporating Object Tracking Feedback into Background Maintenance Framework.
Proceedings of the 7th IEEE Workshop on Applications of Computer Vision / IEEE Workshop on Motion and Video Computing (WACV/MOTION 2005), 2005

Doubleshot: an interactive user-aided segmentation tool.
Proceedings of the 10th International Conference on Intelligent User Interfaces, 2005

Contextual recognition of head gestures.
Proceedings of the 7th International Conference on Multimodal Interfaces, 2005

Visual Speech Recognition with Loosely Synchronized Feature Streams.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Avoiding the "Streetlight Effect": Tracking by Exploring Likelihood Modes.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Improving audio source localization by learning the precedence effect.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Production domain modeling of pronunciation for visual speech recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Learning Appearance Manifolds from Video.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

Efficient Image Matching with Distributions of Local Invariant Features.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

On Modelling Nonlinear Shape-and-Texture Appearance Manifolds.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

Face Recognition with Image Sets Using Manifold Density Divergence.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

A picture is worth a thousand keywords: image-based object search on a mobile platform.
Proceedings of the Extended Abstracts Proceedings of the 2005 Conference on Human Factors in Computing Systems, 2005

2004
Speaker association with signal-level audiovisual fusion.
IEEE Trans. Multim., 2004

Introduction.
Commun. ACM, 2004

Navigating in virtual environments using a vision-based interface.
Proceedings of the Third Nordic Conference on Human-Computer Interaction 2004, 2004

Conditional Random Fields for Object Recognition.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

IDeixis - Searching the Web with Mobile Images for Location-Based Information.
Proceedings of the Mobile Human-Computer Interaction, 2004

Articulatory features for robust visual speech recognition.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

From conversational tooltips to grounded discourse: head pose tracking in interactive dialog systems.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

Real-time audio-visual tracking for meeting analysis.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

Multiple person and speaker activity tracking with a particle filter.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Combining Simple Models to Approximate Complex Dynamics.
Proceedings of the Statistical Methods in Video Processing, 2004

Tracking People with a Sparse Network of Bearing Sensors.
Proceedings of the Computer Vision, 2004

Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes.
Proceedings of the Statistical Methods in Video Processing, 2004

Light Field Appearance Manifolds.
Proceedings of the Computer Vision, 2004

Searching the Web with Mobile Images for Location Recognition.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

Simultaneous Calibration and Tracking with a Network of Non-Overlapping Sensors.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

Fast Contour Matching Using Approximate Earth Mover's Distance.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

IDeixis: image-based Deixis for finding location-based information.
Proceedings of the Extended abstracts of the 2004 Conference on Human Factors in Computing Systems, 2004

Nodding in conversations with a robot.
Proceedings of the Extended abstracts of the 2004 Conference on Human Factors in Computing Systems, 2004

2003
Perceptive Presence.
IEEE Computer Graphics and Applications, 2003

A multi-modal approach for determining speaker location and focus.
Proceedings of the 5th International Conference on Multimodal Interfaces, 2003

Untethered gesture acquisition and recognition for a multimodal conversational system.
Proceedings of the 5th International Conference on Multimodal Interfaces, 2003

Learning cross-modal appearance models with application to tracking.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Fast Pose Estimation with Parameter-Sensitive Hashing.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

Inferring 3D Structure with a Statistical Image-Based Shape Model.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

Constraining Human Body Tracking.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

Activity Zones for Context-Aware Computing.
Proceedings of the UbiComp 2003: Ubiquitous Computing, 2003

Gesture + Play: Exploring Full-Body Navigation for Virtual Environments.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003

Adaptive View-Based Appearance Models.
Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 2003

A Bayesian Approach to Image-Based Visual Hull Reconstruction.
Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 2003

A Probabilistic Framework for Multi-modal Multi-Person Tracking.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003

Gesture + play: full-body interaction for virtual environments.
Proceedings of the Extended abstracts of the 2003 Conference on Human Factors in Computing Systems, 2003

Pose Estimation using 3D View-Based Eigenspaces.
Proceedings of the 2003 IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2003), 2003

2002
Range Segmentation Using Visibility Constraints.
Int. J. Comput. Vis., 2002

Using Multiple-Hypothesis Disparity Maps and Image Velocity for 3-D Motion Estimation.
Int. J. Comput. Vis., 2002

Activity maps for location-aware computing.
Proceedings of the 6th IEEE Workshop on Applications of Computer Vision (WACV 2002), 2002

Recovering Articulated Model Topology from Observed Rigid Motion.
Proceedings of the Advances in Neural Information Processing Systems 15, 2002

Location Estimation with a Differential Update Network.
Proceedings of the Advances in Neural Information Processing Systems 15, 2002

Bayesian network for online global pose estimation.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, September 30, 2002

Stereo Tracking Using ICP and Normal Flow Constraint.
Proceedings of the 16th International Conference on Pattern Recognition, 2002

Audiovisual Arrays for Untethered Spoken Interfaces.
Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002

3-D Articulated Pose Tracking for Untethered Deictic Reference.
Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002

Audio-video array source localization for intelligent environments.
Proceedings of the IEEE International Conference on Acoustics, 2002

Informative subspaces for audio-visual processing: High-level function from low-level fusion.
Proceedings of the IEEE International Conference on Acoustics, 2002

Face-Responsive Interfaces: From Direct Manipulation to Perceptive Presence.
Proceedings of the UbiComp 2002: Ubiquitous Computing, 4th International Conference, Göteborg, Sweden, September 29, 2002

On Probabilistic Combination of Face and Gait Cues for Identification.
Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2002), 2002

Fast Stereo-Based Head Tracking for Interactive Environments.
Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2002), 2002

Face Recognition from Long-Term Observations.
Proceedings of the Computer Vision, 2002

Probabilistic Models and Informative Subspaces for Audiovisual Correspondence.
Proceedings of the Computer Vision, 2002

Evaluating look-to-talk: a gaze-aware interface in a collaborative environment.
Proceedings of the Extended abstracts of the 2002 Conference on Human Factors in Computing Systems, 2002

Fast 3D Model Acquisition from Stereo Images.
Proceedings of the 1st International Symposium on 3D Data Processing Visualization and Transmission (3DPVT 2002), 2002

2001
Correspondence with Cumulative Similarity Transforms.
IEEE Trans. Pattern Anal. Mach. Intell., 2001

Privacy in Context.
Hum. Comput. Interact., 2001

Audio-video array source separation for perceptual user interfaces.
Proceedings of the 2001 workshop on Perceptive user interfaces, 2001

Signal level fusion for multimodal perceptual user interface.
Proceedings of the 2001 workshop on Perceptive user interfaces, 2001

Reducing Drift in Parametric Motion Tracking.
Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001

Motion Estimation from Disparity Images.
Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001

Plan-View Trajectory Estimation with Dense Stereo Background Models.
Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001

Integrated Face and Gait Recognition From Multiple Views.
Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001

2000
Integrated Person Tracking Using Stereo, Color, and Pattern Detection.
Int. J. Comput. Vis., 2000

Learning Joint Statistical Models for Audio-Visual Fusion and Segregation.
Proceedings of the Advances in Neural Information Processing Systems 13, 2000

Audio-visual Segmentation and "The Cocktail Party Effect".
Proceedings of the Advances in Multimodal Interfaces, 2000

Articulated-Pose Estimation Using Brightness and Depth-Constancy Constraints.
Proceedings of the 2000 Conference on Computer Vision and Pattern Recognition (CVPR 2000), 2000

1999
3D Pose Tracking with Linear Depth and Brightness Constraints.
Proceedings of the International Conference on Computer Vision, 1999

Background Estimation and Removal Based on Range and Color.
Proceedings of the 1999 Conference on Computer Vision and Pattern Recognition (CVPR '99), 1999

Dynamic Occluding Contours: A New External-Energy Term for Snakes.
Proceedings of the 1999 Conference on Computer Vision and Pattern Recognition (CVPR '99), 1999

1998
Mass hallucination.
Proceedings of the ACM SIGGRAPH 98 Conference Abstracts and Applications, 1998

Example-Based Image Synthesis of Articulated Figures.
Proceedings of the Advances in Neural Information Processing Systems 11, 1998

A Virtual Mirror Interface Using Real-Time Robust Face Tracking.
Proceedings of the 3rd International Conference on Face & Gesture Recognition (FG '98), 1998

Magic Morphin' Mirror: Person Detection and Tracking.
Proceedings of the 1998 Conference on Computer Vision and Pattern Recognition (CVPR '98), 1998

A Radial Cumulative Similarity Transform for Robust Image Correspondence.
Proceedings of the 1998 Conference on Computer Vision and Pattern Recognition (CVPR '98), 1998

1997
Pfinder: Real-Time Tracking of the Human Body.
IEEE Trans. Pattern Anal. Mach. Intell., 1997

The ALIVE System: Wireless, Full-Body Interaction with Autonomous Agents.
Multim. Syst., 1997

Perceptive Spaces for Performance and Entertainment: Untethered Interaction Using Computer Vision and Audition.
Appl. Artif. Intell., 1997

Magic morphin mirror: face-sensitive distortion and exaggeration.
Proceedings of the ACM SIGGRAPH 97 Visual Proceedings: The art and interdisciplinary programs of SIGGRAPH '97, 1997

1996
Task-Specific Gesture Analysis in Real-Time Using Interpolated Views.
IEEE Trans. Pattern Anal. Mach. Intell., 1996

Active gesture recognition using partially observable Markov decision processes.
Proceedings of the 13th International Conference on Pattern Recognition, 1996

Active Face Tracking and Pose Estimation in an Interactive Room.
Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96), 1996

Modeling, Tracking and Interactive Animation of Faces and Heads Using Input from Video.
Proceedings of the Computer Animation 1996, 1996

1995
Cooperative Robust Estimation Using Layers of Support.
IEEE Trans. Pattern Anal. Mach. Intell., 1995

Active Gesture Recognition using Learned Visual Attention.
Proceedings of the Advances in Neural Information Processing Systems 8, 1995

Modeling Interactive Agents in ALIVE.
Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1995

The ALIVE system: full-body interaction with autonomous agents.
Proceedings of the Computer Animation 1995, 1995

1994
Evolving Visual Routines.
Artif. Life, 1994

Correlation and Interpolation Networks for Real-time Expression Analysis/Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Visual perception of human bodies and faces for multi-modal interfaces.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Visually guided animation.
Proceedings of the Computer Animation 1994, 1994

ALIVE: Artificial Life Interactive Video Environment.
Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA, USA, July 31, 1994

1993
Classifying Hand Gestures with a View-Based Distributed Representation.
Proceedings of the Advances in Neural Information Processing Systems 6, 1993

'Nulling' filters and the separation of transparent motions.
Proceedings of the Conference on Computer Vision and Pattern Recognition, 1993

Space-time gestures.
Proceedings of the Conference on Computer Vision and Pattern Recognition, 1993

1991
Against Edges: Function Approximation with Multiple Support Maps.
Proceedings of the Advances in Neural Information Processing Systems 4, 1991

On the representation of occluded shapes.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1991

1990
Depth from focus using a pyramid architecture.
Pattern Recognit. Lett., 1990

Segmentation by minimal description.
Proceedings of the Third International Conference on Computer Vision, 1990

1989
A simple, real-time range camera.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1989

1988
Pyramid based depth from focus.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1988

1987
PSFIG - A DITROFF Preprocessor for Postscript Figures.
Proceedings of the USENIX Summer Conference, Phoenix, AZ, USA, June 1987

