Limin Wang

Orcid: 0000-0002-3674-7718

Affiliations:
  • Nanjing University, State Key Laboratory for Novel Software Technology, China
  • ETH Zurich, Computer Vision Laboratory, Switzerland (former)
  • Chinese University of Hong Kong, Department of Information Engineering, China (former)
  • Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, China (former)


According to our database, Limin Wang authored at least 166 papers between 2010 and 2024.

Bibliography

2024
Learning Optical Flow and Scene Flow With Bidirectional Camera-LiDAR Fusion.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery.
IEEE Trans. Multim., 2024

Sparse Action Tube Detection.
IEEE Trans. Image Process., 2024

Dual Graph Networks for Pose Estimation in Crowded Scenes.
Int. J. Comput. Vis., 2024

Multiple Object Tracking as ID Prediction.
CoRR, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
CoRR, 2024

Spatiotemporal Predictive Pre-training for Robotic Motor Control.
CoRR, 2024

StableDrag: Stable Dragging for Point-based Image Editing.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

2023
Recovering 3D Human Mesh From Monocular Images: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Webly-supervised semantic segmentation via curriculum learning.
Comput. Vis. Image Underst., November, 2023

Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

BasicTAD: An astounding RGB-Only baseline for temporal action detection.
Comput. Vis. Image Underst., July, 2023

APP-Net: Auxiliary-Point-Based Push and Pull Operations for Efficient Point Cloud Recognition.
IEEE Trans. Image Process., 2023

LIP: Local Importance-Based Pooling.
Int. J. Comput. Vis., 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.
CoRR, 2023

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models.
CoRR, 2023

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos.
CoRR, 2023

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering.
CoRR, 2023

VBench: Comprehensive Benchmark Suite for Video Generative Models.
CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Asymmetric Masked Distillation for Pre-Training Small Foundation Models.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

Bridging The Gaps Between Token Pruning and Full Pre-training via Masked Fine-tuning.
CoRR, 2023

Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation.
CoRR, 2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding.
CoRR, 2023

DPL: Decoupled Prompt Learning for Vision-Language Models.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots.
CoRR, 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

Progressive Visual Prompt Learning with Contrastive Feature Re-formation.
CoRR, 2023

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens.
CoRR, 2023

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection.
CoRR, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
CoRR, 2023

Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MixFormerV2: Efficient Fully Transformer Tracking.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Discriminative Feature Representation for Open Set Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

RefineTAD: Learning Proposal-free Refinement for Temporal Action Detection.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Deep Equilibrium Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Memory-and-Anticipation Transformer for Online Action Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

StageInteractor: Query-based Object Detector with Cross-stage Interaction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Video Action Detection with Token Dropout and Context Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Graph Routes From Local and Global Entrances.
Proceedings of the 6th International Conference on Big Data Technologies, 2023

Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

STMixer: A One-Stage Sparse Action Detector.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LinK: Linear Kernel for LiDAR-based 3D Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.
IEEE Trans. Image Process., 2022

Cross-Domain Gated Learning for Domain Generalization.
Int. J. Comput. Vis., 2022

Fully convolutional online tracking.
Comput. Vis. Image Underst., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

VLG: General Video Recognition with Web Textual Knowledge.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022.
CoRR, 2022

Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach.
CoRR, 2022

APP-Net: Auxiliary-point-based Push and Pull Operations for Efficient Point Cloud Classification.
CoRR, 2022

Logit Normalization for Long-tail Object Detection.
CoRR, 2022

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SpotFormer: A Transformer-based Framework for Precise Soccer Action Spotting.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

The Tenth Visual Object Tracking VOT2022 Challenge Results.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing.
Proceedings of the Computer Vision - ECCV 2022, 2022

Task-specific Inconsistency Alignment for Domain Adaptive Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Structured Sparse R-CNN for Direct Scene Graph Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AdaMixer: A Fast-Converging Query-Based Object Detector.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MixFormer: End-to-End Tracking with Iterative Mixed Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Architecture Self-supervised Video Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DCAN: Improving Temporal Action Detection via Dual Context Aggregation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction.
IEEE Trans. Image Process., 2021

Cross-Modal Pyramid Translation for RGB-D Scene Recognition.
Int. J. Comput. Vis., 2021

End-to-End Dense Video Grounding via Parallel Regression.
CoRR, 2021

FineAction: A Fined Video Dataset for Temporal Action Localization.
CoRR, 2021

Target Transformed Regression for Accurate Tracking.
CoRR, 2021

3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop.
CoRR, 2021

NJU MCG - Sensetime Team Submission to Pre-training for Video Understanding Challenge Track II.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Cross-modal Pretraining and Matching for Video Understanding.
Proceedings of the MMPT@ICMR2021: Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding, 2021

The Ninth Visual Object Tracking VOT2021 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

MGSampler: An Explainable Sampling Strategy for Video Action Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Target Adaptive Context Aggregation for Video Scene Graph Generation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Relaxed Transformer Decoders for Direct Action Proposal Generation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

TAM: Temporal Adaptive Module for Video Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Self Supervision to Distillation for Long-Tailed Visual Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Mutual Supervision for Dense Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

TDN: Temporal Difference Networks for Efficient Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Dynamic Sampling Networks for Efficient Action Recognition in Videos.
IEEE Trans. Image Process., 2020

Temporal Action Detection with Structured Segment Networks.
Int. J. Comput. Vis., 2020

Learning Spatiotemporal Features via Video and Text Pair Discrimination.
CoRR, 2020

V4D: 4D Convolutional Neural Networks for Video-level Representation Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Context-Aware RCNN: A Baseline for Action Detection in Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

Boundary-Aware Cascade Networks for Temporal Action Segmentation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Actions as Moving Points.
Proceedings of the Computer Vision - ECCV 2020, 2020

TEA: Temporal Excitation and Aggregation for Action Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SketchyCOCO: Image Generation From Freehand Scene Sketches.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Knowledge Integration Networks for Action Recognition.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

TEINet: Towards an Efficient Architecture for Video Recognition.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Finding Action Tubes with a Sparse-to-Dense Framework.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Temporal Segment Networks for Action Recognition in Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Dynamically Visual Disambiguation of Keyword-based Image Search.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Learning Actor Relation Graphs for Group Activity Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Cross-Stream Selective Networks for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Translate-to-Recognize Networks for RGB-D Scene Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs.
IEEE Trans. Image Process., 2018

Transferring Deep Object and Scene Representations for Event Recognition in Still Images.
Int. J. Comput. Vis., 2018

Structured Triplet Learning with POS-Tag Guided Attention for Visual Question Answering.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model.
Proceedings of the Computer Vision - ECCV 2018, 2018

Appearance-and-Relation Networks for Video Classification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.
IEEE Trans. Image Process., 2017

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.
IEEE Trans. Image Process., 2017

Locally Supervised Deep Hybrid Model for Scene Recognition.
IEEE Trans. Image Process., 2017

WebVision Database: Visual Learning and Understanding from Web Data.
CoRR, 2017

A Pursuit of Temporal Accuracy in General Activity Detection.
CoRR, 2017

WebVision Challenge: Visual Learning and Understanding With Web Data.
CoRR, 2017

UntrimmedNets for Weakly Supervised Action Recognition and Detection.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
MoFAP: A Multi-level Representation for Action Recognition.
Int. J. Comput. Vis., 2016

Modeling spatial layout for scene image understanding via a novel multiscale sum-product network.
Expert Syst. Appl., 2016

Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice.
Comput. Vis. Image Underst., 2016

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016.
CoRR, 2016

Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images.
CoRR, 2016

Codebook enhancement of vlad representation for visual recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition.
Proceedings of the Computer Vision - ECCV 2016, 2016

Real-Time Action Recognition with Enhanced Motion Vector CNNs.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Actionness Estimation Using Hybrid Fully Convolutional Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Two-Stream SR-CNNs for Action Recognition in Videos.
Proceedings of the British Machine Vision Conference 2016, 2016

2015
Towards Good Practices for Very Deep Two-Stream ConvNets.
CoRR, 2015

Object-Scene Convolutional Neural Networks for Event Recognition in Images.
CoRR, 2015

Places205-VGGNet Models for Scene Recognition.
CoRR, 2015

Better Exploiting OS-CNNs for Better Event Recognition in Images.
Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

Object-Scene Convolutional Neural Networks for event recognition in images.
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Exploring Fisher vector and deep networks for action spotting.
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Action recognition with trajectory-pooled deep-convolutional descriptors.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Latent Hierarchical Model of Temporal Structure for Complex Activity Classification.
IEEE Trans. Image Process., 2014

A Joint Evaluation of Dictionary Learning and Feature Encoding for Action Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Video Action Detection with Relational Dynamic-Poselets.
Proceedings of the Computer Vision - ECCV 2014, 2014

Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics.
Proceedings of the Computer Vision - ECCV 2014, 2014

Action and Gesture Temporal Spotting with Super Vector Representation.
Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014

Multi-view Super Vector for Action Recognition.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Mining Motion Atoms and Phrases for Complex Action Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Motionlets: Mid-level 3D Parts for Human Motion Recognition.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition.
Proceedings of the Computer Vision - ACCV 2012, 2012

2011
Multiclass object detection by combining local appearances and context.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

2010
A Novel Approach for Robust Surveillance Video Content Abstraction.
Proceedings of the Advances in Multimedia Information Processing - PCM 2010, 2010

