Ivan Laptev

Orcid: 0000-0001-7072-3325

According to our database, Ivan Laptev authored at least 152 papers between 1998 and 2023.

Bibliography

2023
Image Compression with Product Quantized Masked Image Modeling.
Trans. Mach. Learn. Res., 2023

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos.
CoRR, 2023

Contact Models in Robotics: a Comparative Analysis.
CoRR, 2023

VidChapters-7M: Video Chapters at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Robust Visual Sim-to-Real Transfer for Robotic Manipulation.
IROS, 2023

Object Goal Navigation with Recursive Implicit Maps.
IROS, 2023

Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Learning Video-Conditioned Policies for Unseen Manipulation Tasks.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation.
Proceedings of the Conference on Robot Learning, 2023

Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos.
Int. J. Comput. Vis., 2022

Multi-Task Learning of Object State Changes from Uncurated Videos.
CoRR, 2022

Augmenting differentiable physics with randomized smoothing.
CoRR, 2022

Learning to Answer Visual Questions from Web Videos.
CoRR, 2022

Weakly-supervised segmentation of referring expressions.
CoRR, 2022

Leveraging Randomized Smoothing for Optimal Control of Nonsmooth Dynamical Systems.
CoRR, 2022

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2022, 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Instruction-driven history-aware policies for robotic manipulations.
Proceedings of the Conference on Robot Learning, 2022

2021
Differentiable Simulation for Physical System Identification.
IEEE Robotics Autom. Lett., 2021

Synthetic Humans for Action Recognition from Unseen Viewpoints.
Int. J. Comput. Vis., 2021

Long term spatio-temporal modeling for action detection.
Comput. Vis. Image Underst., 2021

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
CoRR, 2021

Reconstructing and grounding narrated instructional videos in 3D.
CoRR, 2021

XCiT: Cross-Covariance Image Transformers.
CoRR, 2021

Training Vision Transformers for Image Retrieval.
CoRR, 2021

Differentiable rendering with perturbed optimizers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

History Aware Multimodal Transformer for Vision-and-Language Navigation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

XCiT: Cross-Covariance Image Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Goal-Conditioned Reinforcement Learning with Imagined Subgoals.
Proceedings of the 38th International Conference on Machine Learning, 2021

Just Ask: Learning to Answer Questions from Millions of Narrated Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Segmenter: Transformer for Semantic Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Towards Unconstrained Joint Hand-Object Reconstruction From RGB Videos.
Proceedings of the International Conference on 3D Vision, 2021

2020
Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning.
IEEE Robotics Autom. Lett., 2020

RareAct: A video dataset of unusual interactions.
CoRR, 2020

The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020).
CoRR, 2020

Occlusion resistant learning of intuitive physics from videos.
CoRR, 2020

Learning visual policies for building 3D shape categories.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

Learning to combine primitive skills: A step towards versatile robotic manipulation.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Learning Actionness via Long-Range Temporal Order Verification.
Proceedings of the Computer Vision - ECCV 2020, 2020

End-to-End Learning of Visual Representations From Uncurated Instructional Videos.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Interactions and Relationships Between Movie Characters.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Action Modifiers: Learning From Adverbs in Instructional Videos.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Obstacle Representations for Neural Motion Planning.
Proceedings of the 4th Conference on Robot Learning, 2020

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos.
Proceedings of the 4th Conference on Robot Learning, 2020

2019
Combining learned skills and reinforcement learning for robotic manipulations.
CoRR, 2019

Learning to Augment Synthetic Images for Sim2Real Policy Transfer.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Margin based knowledge distillation for mobile face recognition.
Proceedings of the Twelfth International Conference on Machine Vision, 2019

Detecting Unseen Visual Relations Using Analogies.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Cross-Task Weakly Supervised Learning From Instructional Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Leveraging the Present to Anticipate the Future in Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Deep Metric Learning Beyond Binary Supervision.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning Joint Reconstruction of Hands and Manipulated Objects.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Long-Term Temporal Convolutions for Action Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Learning from Narrated Instruction Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Detecting rare visual relations using analogies.
CoRR, 2018

Tube-CNN: Modeling temporal evolution of appearance for object detection in video.
CoRR, 2018

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions.
CoRR, 2018

Modeling Spatio-Temporal Human Track Structure for Action Localization.
CoRR, 2018

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data.
CoRR, 2018

A flexible model for training action localization with varying levels of supervision.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

BodyNet: Volumetric Inference of 3D Human Body Shapes.
Proceedings of the Computer Vision - ECCV 2018, 2018

MobileFace: 3D Face Reconstruction with Efficient CNN Regression.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

2017
The THUMOS challenge on action recognition for videos "in the wild".
Comput. Vis. Image Underst., 2017

Editorial - Deep Learning for Computer Vision.
Comput. Vis. Image Underst., 2017

Learnable pooling with Context Gating for video classification.
CoRR, 2017

Joint Discovery of Object States and Manipulating Actions.
CoRR, 2017

Weakly-Supervised Learning of Visual Relations.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning from Video and Text via Large-Scale Discriminative Clustering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Joint Discovery of Object States and Manipulation Actions.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning from Synthetic Humans.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

The Analysis of High Density Crowds in Videos.
Proceedings of the Group and Crowd Behavior for Computer Vision, 1st Edition, 2017

2016
Guest Editorial: Video Recognition.
Int. J. Comput. Vis., 2016

Much Ado About Time: Exhaustive Annotation of Temporal Data.
Proceedings of the Fourth AAAI Conference on Human Computation and Crowdsourcing, 2016

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding.
Proceedings of the Computer Vision - ECCV 2016, 2016

ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization.
Proceedings of the Computer Vision - ECCV 2016, 2016

Instance-Level Video Segmentation from Object Tracks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Thin-Slicing for Pose: Learning to Understand Pose without Explicit Pose Estimation.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Unsupervised Learning from Narrated Instruction Videos.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Pose Estimation and Segmentation of Multiple People in Stereoscopic Movies.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Unsupervised object discovery and localization in images and videos.
Proceedings of the 12th International Conference on Ubiquitous Robots and Ambient Intelligence, 2015

Context-Aware CNNs for Person Head Detection.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Unsupervised Object Discovery and Tracking in Video Collections.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

P-CNN: Pose-Based CNN Features for Action Recognition.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Weakly-Supervised Alignment of Video with Text.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Is object localization for free? - Weakly-supervised learning with convolutional neural networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

On pairwise costs for network flow multi-object tracking.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
People Watching: Human Actions as a Cue for Single View Geometry.
Int. J. Comput. Vis., 2014

On Pairwise Cost for Multi-Object Network Flow Tracking.
CoRR, 2014

Predicting Actions from Static Scenes.
Proceedings of the Computer Vision - ECCV 2014, 2014

Weakly Supervised Action Labeling in Videos under Ordering Constraints.
Proceedings of the Computer Vision - ECCV 2014, 2014

Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Efficient Feature Extraction, Encoding, and Classification for Action Recognition.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Finding Actors and Actions in Movies.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Pose Estimation and Segmentation of People in 3D Movies.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Modeling and visual recognition of human actions and interactions.
2013

2012
Actlets: A novel local representation for human action recognition in video.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Scene Semantics from Long-Term Observation of People.
Proceedings of the Computer Vision - ECCV 2012, 2012

Object Detection Using Strongly-Supervised Deformable Part Models.
Proceedings of the Computer Vision - ECCV 2012, 2012

2011
View-Independent Action Recognition from Temporal Self-Similarities.
IEEE Trans. Pattern Anal. Mach. Intell., 2011

Learning person-object interactions for action recognition in still images.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Joint pose estimation and action recognition in image graphs.
Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Data-driven crowd analysis in videos.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Density-aware person detection and tracking in crowds.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Track to the future: Spatio-temporal video segmentation with long-range motion cues.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

2010
INRIA-WILLOW at TRECVID 2010 : Surveillance Event Detection.
Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

Recognizing Human Action in the Wild.
Proceedings of the Human Behavior Understanding, First International Workshop, 2010

Semi-supervised Learning of Facial Attributes in Video.
Proceedings of the Trends and Topics in Computer Vision, 2010

Improving bag-of-features action recognition with non-local cues.
Proceedings of the British Machine Vision Conference, 2010

Recognizing human actions in still images: a study of bag-of-features and part-based representations.
Proceedings of the British Machine Vision Conference, 2010

2009
Improving object detection with boosted histograms.
Image Vis. Comput., 2009

View-independent Video Synchronization from Temporal Self-similarities.
Proceedings of the VISAPP 2009 - Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, Lisboa, Portugal, February 5-8, 2009, 2009

Automatic annotation of human actions in video.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

Modeling Image Context Using Object Centered Grid.
Proceedings of the DICTA 2009, 2009

Actions in context.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Evaluation of Local Spatio-temporal Features for Action Recognition.
Proceedings of the British Machine Vision Conference, 2009

Multi-view Synchronization of Human Actions and Dynamic Scenes.
Proceedings of the British Machine Vision Conference, 2009

2008
Cross-View Action Recognition from Temporal Self-similarities.
Proceedings of the Computer Vision, 2008

Learning realistic human actions from movies.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Local velocity-adapted motion events for spatio-temporal recognition.
Comput. Vis. Image Underst., 2007

Retrieving actions in movies.
Proceedings of the IEEE 11th International Conference on Computer Vision, 2007

Video copy detection: a comparative study.
Proceedings of the 6th ACM International Conference on Image and Video Retrieval, 2007

2006
Improvements of Object Detection Using Boosted Histograms.
Proceedings of the British Machine Vision Conference 2006, 2006

2005
On Space-Time Interest Points.
Int. J. Comput. Vis., 2005

Periodic Motion Detection and Segmentation via Approximate Sequence Alignment.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

2004
Local spatio-temporal image features for motion interpretation.
PhD thesis, 2004

Velocity adaptation of spatio-temporal receptive fields for direct recognition of activities: an experimental study.
Image Vis. Comput., 2004

Local Descriptors for Spatio-temporal Recognition.
Proceedings of the Spatial Coherence for Visual Motion Analysis, 2004

Recognizing Human Actions: A Local SVM Approach.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

Galilean-Diagonalized Spatio-Temporal Interest Operators.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

Velocity Adaptation of Space-Time Interest Points.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

2003
A Distance Measure and a Feature Likelihood Map Concept for Scale-Invariant Model Matching.
Int. J. Comput. Vis., 2003

Interest Point Detection and Scale Selection in Space-Time.
Proceedings of the Scale Space Methods in Computer Vision, 4th International Conference, 2003

Space-time Interest Points.
Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), 2003

2002
Hand Gesture Recognition using Multi-Scale Colour Features, Hierarchical Models and Particle Filtering.
Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2002), 2002

2001
A Multi-scale Feature Likelihood Map for Direct Evaluation of Object Hypotheses.
Proceedings of the Scale-Space and Morphology in Computer Vision, 2001

Tracking of Multi-state Hand Models Using Particle Filtering and a Hierarchy of Multi-scale Image Features.
Proceedings of the Scale-Space and Morphology in Computer Vision, 2001

2000
Automatic extraction of roads from aerial images based on scale space and snakes.
Mach. Vis. Appl., 2000

1998
Agilo RoboCuppers: RoboCup Team Description.
Proceedings of the RoboCup-98: Robot Soccer World Cup II, 1998

Multi-scale and Snakes for Automatic Road Extraction.
Proceedings of the Computer Vision, 1998

