Zheng Shou

IEEE Trans. Pattern Anal. Mach. Intell., April, 2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond.

CoRR, April, 2026

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents.

CoRR, April, 2026

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models.

CoRR, April, 2026

UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining.

CoRR, April, 2026

P-Flow: Prompting Visual Effects Generation.

Rui Zhao

CoRR, March, 2026

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance.

CoRR, March, 2026

Semantic-Contact Fields for Category-Level Generalizable Tactile Tool Manipulation.

CoRR, February, 2026

Olaf-World: Orienting Latent Actions for Video World Modeling.

CoRR, February, 2026

World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy.

CoRR, February, 2026

ShowUI-Aloha: Human-Taught GUI Agent.

CoRR, January, 2026

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection.

CoRR, January, 2026

A Survey on Foundations and Frontiers of Multimodal Agentic Frameworks: Techniques and Applications.

Trans. Mach. Learn. Res., 2026

Open-world Weakly-Supervised Object Localization.

Pattern Recognit., 2026

OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands.

Siyuan Hu

CoRR, December, 2025

Mitty: Diffusion-based Human-to-Robot Video Generation.

CoRR, December, 2025

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models.

Zechen Bai

Chen Gao

Santhosh Kumar Ramakrishnan

CoRR, December, 2025

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos.

CoRR, December, 2025

OmniPSD: Layered PSD Generation with Diffusion Transformer.

CoRR, December, 2025

X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale.

CoRR, December, 2025

Ego4D: Around the World in 3,600 Hours of Egocentric Video.

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video.

CoRR, November, 2025

WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation.

CoRR, November, 2025

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment.

CoRR, November, 2025

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation.

CoRR, November, 2025

DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection.

CoRR, November, 2025

Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers.

Yiqing Shi

CoRR, November, 2025

Computer-Use Agents as Judges for Generative User Interface.

CoRR, November, 2025

AUTO-Explorer: Automated Data Collection for GUI Agent.

Xiangwu Guo

Difei Gao

CoRR, November, 2025

Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents.

CoRR, October, 2025

Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning.

CoRR, October, 2025

Paper2Video: Automatic Video Generation from Scientific Papers.

Zeyu Zhu

CoRR, October, 2025

Code2Video: A Code-centric Paradigm for Educational Video Generation.

Yanzhe Chen

CoRR, October, 2025

PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer.

Zhiwei Yang

Chen Gao

CoRR, September, 2025

Personalized Vision via Visual In-Context Learning.

CoRR, September, 2025

CoFFT: Chain of Foresight-Focus Thought for Visual Language Models.

CoRR, September, 2025

Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing.

CoRR, September, 2025

CLIMS++: Cross Language Image Matching with Automatic Context Discovery for Weakly Supervised Semantic Segmentation.

Int. J. Comput. Vis., August, 2025

Paragraph-to-Image Generation with Information-Enriched Diffusion Model.

Int. J. Comput. Vis., August, 2025

Ego-centric Predictive Model Conditioned on Hand Trajectories.

Binjie Zhang

CoRR, August, 2025

Reinforcement Learning in Vision: A Survey.

CoRR, August, 2025

Multi-human Interactive Talking Dataset.

Zeyu Zhu

Weijia Wu

CoRR, August, 2025

VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback.

CoRR, July, 2025

MoonShot: Towards Controllable Video Generation and Editing with Motion-Aware Multimodal Conditions.

Int. J. Comput. Vis., June, 2025

FramePrompt: In-context Controllable Animation with Zero Structural Changes.

Guian Fang

Yuchao Gu

CoRR, June, 2025

Show-o2: Improved Native Unified Multimodal Models.

Jinheng Xie

Zhenheng Yang

CoRR, June, 2025

D-AR: Diffusion via Autoregressive Models.

Ziteng Gao

CoRR, May, 2025

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning.

Zhenheng Yang

Konstantinos N. Plataniotis

CoRR, May, 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models.

CoRR, May, 2025

DD-Ranking: Rethinking the Evaluation of Dataset Distillation.

Baharan Mirzasoleiman

Manolis Kellis

CoRR, May, 2025

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation.

Int. J. Comput. Vis., April, 2025

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model.

CoRR, April, 2025

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis.

CoRR, March, 2025

Long-Context Autoregressive Video Modeling with Next-Frame Prediction.

Yuchao Gu

CoRR, March, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning.

CoRR, March, 2025

Edit Transfer: Learning Image Editing via Vision In-Context Relations.

CoRR, March, 2025

TPDiff: Temporal Pyramid Video Diffusion Model.

Lingmin Ran

CoRR, March, 2025

In-Context Defense in Computer Agents: An Empirical Study.

Pei Yang

Hai Ci

CoRR, March, 2025

Automated Movie Generation via Multi-Agent CoT Planning.

Weijia Wu

Zeyu Zhu

CoRR, March, 2025

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback.

CoRR, February, 2025

PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data.

CoRR, February, 2025

WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation.

Henry Hengyuan Zhao

Difei Gao

CoRR, February, 2025

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths.

Zhenheng Yang

CoRR, February, 2025

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation.

Cheng Liu

CoRR, February, 2025

A Bilingual, Open World Video Text Dataset and Real-Time Video Text Spotting With Contrastive Learning.

IEEE Trans. Circuits Syst. Video Technol., January, 2025

Faster Diffusion Through Temporal Attention Decomposition.

Juan-Manuel Pérez-Rúa

Jürgen Schmidhuber

Trans. Mach. Learn. Res., 2025

A large cross-modal video retrieval dataset with reading comprehension.

Pattern Recognit., 2025

ColonNeRF: High-fidelity neural reconstruction of long colonoscopy.

Neurocomputing, 2025

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents.

Pei Yang

Hai Ci

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data.

Cheng Liu

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

DOTA: Distributional Test-time Adaptation of Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Sparse Image Synthesis via Joint Latent and RoI Flow.

Ziteng Gao

Jay Zhangjie Wu

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GUI-Narrator: Detecting and Captioning Computer GUI Actions.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Can I Trust You? Advancing GUI Task Automation with Action Trust Score.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

WMAdapter: Adding WaterMark Control to Latent Diffusion Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Impossible Videos.

Zechen Bai

Hai Ci

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Image Watermarks are Removable using Controllable Regeneration from Clean Noise.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Grounding Multimodal Large Language Model in GUI World.

Weixian Lei

Difei Gao

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Factorized Learning for Temporally Grounded Video-Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity.

Xiaokang Liu

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer.

Danze Chen

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Balanced Image Stylization with Style Matching Score.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost.

Haiyang Mei

Pengyu Zhang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ROICtrl: Boosting Instance Control for Visual Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles.

Rui Zhao

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting.

Muhammet Furkan Ilaslan

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Managing Metaverse Data Tsunami: Actionable Insights.

IEEE Trans. Knowl. Data Eng., December, 2024

Continual Learning for Image Segmentation With Dynamic Query.

IEEE Trans. Circuits Syst. Video Technol., June, 2024

Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition.

IEEE Trans. Multim., 2024

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels.

Int. J. Comput. Vis., 2024

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting.

Muhammet Furkan Ilaslan

CoRR, 2024

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation.

CoRR, 2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

CoRR, 2024

Factorized Visual Tokenization and Generation.

CoRR, 2024

FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data.

CoRR, 2024

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use.

CoRR, 2024

ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model.

CoRR, 2024

Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos.

CoRR, 2024

High Quality Human Image Animation using Regional Supervision and Motion Blur Condition.

CoRR, 2024

DOTA: Distributional Test-Time Adaptation of Vision-Language Models.

CoRR, 2024

GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval.

CoRR, 2024

GUI Action Narrator: Where and When Did That Action Take Place?

CoRR, 2024

Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?

CoRR, 2024

ProcessPainter: Learn Painting Process from Sequence Data.

CoRR, 2024

Multi-Modal Generative Embedding Model.

CoRR, 2024

LOVA3: Learning to Visual Question Answering, Asking and Assessment.

CoRR, 2024

Hallucination of Multimodal Large Language Models: A Survey.

CoRR, 2024

Learning Long-form Video Prior via Generative Pre-Training.

CoRR, 2024

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models.

CoRR, 2024

Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters.

CoRR, 2024

Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models.

CoRR, 2024

Towards A Better Metric for Text-to-Video Generation.

CoRR, 2024

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions.

CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.

CoRR, 2024

ProcessPainter: Learning to draw from sequence data.

Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LOVA3: Learning to Visual Question Answering, Asking and Assessment.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Skinned Motion Retargeting with Dense Geometric Interaction Perception.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Can Simple Averaging Defeat Modern Watermarks?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual Perception by Large Language Model's Weights.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Exocentric-to-Egocentric Video Generation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant.

Proceedings of the 5th International Workshop on Human-centric Multimedia Analysis, 2024

GENIXER: Empowering Multimodal Large Language Model as a Powerful Data Generator.

Henry Hengyuan Zhao

Pan Zhou

Proceedings of the Computer Vision - ECCV 2024, 2024

MotionDirector: Motion Customization of Text-to-Video Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images.

Proceedings of the Computer Vision - ECCV 2024, 2024

DragAnything: Motion Control for Anything Using Entity Representation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Video Context as Interleaved Multimodal Sequences.

Proceedings of the Computer Vision - ECCV 2024, 2024

Parrot Captions Teach CLIP to Spot Text.

Proceedings of the Computer Vision - ECCV 2024, 2024

RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-key Identification.

Proceedings of the Computer Vision - ECCV 2024, 2024

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Tune-an-Ellipse: CLIP Has Potential to Find what you Want.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

X-Adapter: Universal Compatibility of Plugins for Upgraded Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIT-LENS: Towards Omni-modal Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Bootstrapping SparseFormers from Vision Foundation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AssistGUI: Task-Oriented PC Graphical User Interface Automation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Magi-Net: Meta Negative Network for Early Activity Prediction.

IEEE Trans. Image Process., 2023

ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors.

CoRR, 2023

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation.

CoRR, 2023

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator.

Henry Hengyuan Zhao

Pan Zhou

Bardienus Pieter Duisterhof

CoRR, 2023

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model.

CoRR, 2023

ColonNeRF: Neural Radiance Fields for High-Fidelity Long-Sequence Colonoscopy Reconstruction.

CoRR, 2023

MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes.

CoRR, 2023

MLLMs-Augmented Visual-Language Representation Learning.

CoRR, 2023

ViT-Lens-2: Gateway to Omni-modal Intelligence.

CoRR, 2023

Paragraph-to-Image Generation with Information-Enriched Diffusion Model.

CoRR, 2023

CVPR 2023 Text Guided Video Editing Competition.

CoRR, 2023

Integrating View Conditions for Image Synthesis.

CoRR, 2023

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing.

CoRR, 2023

MotionDirector: Motion Customization of Text-to-Video Diffusion Models.

CoRR, 2023

Bridging Sensor Gaps via Single-Direction Tuning for Hyperspectral Image Classification.

CoRR, 2023

Dataset Condensation via Generative Model.

CoRR, 2023

ViT-Lens: Towards Omni-modal Representations.

CoRR, 2023

Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces.

CoRR, 2023

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks.

CoRR, 2023

GroundNLQ @ Ego4D Natural Language Queries Challenge 2023.

CoRR, 2023

TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter.

CoRR, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.

CoRR, 2023

VisorGPT: Learning Visual Prior via Generative Pre-Training.

CoRR, 2023

Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection.

CoRR, 2023

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video.

CoRR, 2023

Open-World Weakly-Supervised Object Localization.

CoRR, 2023

ICDAR 2023 Video Text Reading Competition for Dense and Small Text.

CoRR, 2023

Attack is Good Augmentation: Towards Skeleton-Contrastive Representation Learning.

CoRR, 2023

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm.

CoRR, 2023

DeepfakeMAE: Facial Part Consistency Aware Masked Autoencoder for Deepfake Video Detection.

CoRR, 2023

STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition.

CoRR, 2023

XAGen: 3D Expressive Human Avatars Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Visual Prior via Generative Pre-Training.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Object-centric Learning with Cyclic Walks between Parts and Whole.

Ziyu Wang

Mengmi Zhang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Large Generative Models Meet Multimodal Video Intelligence.

Carl-Johann Simon-Gabriel

Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications, 2023

PV3D: A 3D Generative Model for Portrait Video Generation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The Metaverse Data Deluge: What Can We Do About It?

Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

ICDAR 2023 Competition on Video Text Reading for Dense and Small Text.

Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Label-Efficient Online Continual Object Detection in Streaming Video.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Too Large; Data Reduction for Vision-Language Pre-Training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning to Learn: How to Continuously Teach Humans and Machines.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unsupervised Open-Vocabulary Object Localization in Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Revisiting Vision Transformer from the View of Path Ensemble.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations.

Muhammet Furkan Ilaslan

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Position-Guided Text Prompt for Vision-Language Pre-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

All in One: Exploring Unified Video-Language Pre-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Affordance Grounding from Demonstration Video to Target Image.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DOAD: Decoupled One Stage Action Detection Network.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Making Vision Transformers Efficient from A Token Sparsification View.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Darwinian Model Upgrades: Model Evolving with Selective Compatibility.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Text Pre-training with Learned Regions for Retrieval.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Deep Motion Prior for Weakly-Supervised Temporal Action Localization.

IEEE Trans. Image Process., 2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.

CoRR, 2022

Position-guided Text Prompt for Vision-Language Pre-training.

CoRR, 2022

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.

CoRR, 2022

Learning to Learn: How to Continuously Teach Humans and Machines.

CoRR, 2022

An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022.

CoRR, 2022

Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization.

CoRR, 2022

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.

CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.

CoRR, 2022

Sense The Physical, Walkthrough The Virtual, Manage The Metaverse: A Data-centric Perspective.

CoRR, 2022

Egocentric Video-Language Pretraining.

CoRR, 2022

Novel View Synthesis for High-fidelity Headshot Scenes.

CoRR, 2022

GEB+: A benchmark for generic event boundary captioning, grounding and text-based retrieval.

CoRR, 2022

Revitalize Region Feature for Democratizing Video-Language Pre-training.

CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.

CoRR, 2022

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Egocentric Video-Language Pretraining.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AVA-AVD: Audio-visual Speaker Diarization in the Wild.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

From Token to Word: OCR Token Evolution via Contrastive Learning and Semantic Matching for Text-VQA.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation.

Proceedings of the HCMA@MM 2022: Proceedings of the 3rd International Workshop on Human-Centric Multimedia Analysis, 2022

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant.

Proceedings of the Computer Vision - ECCV 2022, 2022

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval.

Proceedings of the Computer Vision - ECCV 2022, 2022

Object-aware Video-language Pre-training for Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unified Transformer Tracker for Object Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Video-Text Pre-training with Learned Regions.

CoRR, 2021

AssistSR: Affordance-centric Question-driven Video Segment Retrieval.

CoRR, 2021

AVA-AVD: Audio-visual Speaker Diarization in the Wild.

CoRR, 2021

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video.

CoRR, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

CoRR, 2021

Generic Event Boundary Detection: A Benchmark for Event Segmentation.

CoRR, 2021

Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Channel Augmented Joint Learning for Visible-Infrared Recognition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Generic Event Boundary Detection: A Benchmark for Event Segmentation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Searching for Two-Stream Models in Multivariate Space for Video Recognition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

On Pursuit of Designing Multi-modal Transformer for Video Grounding.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.

CoRR, 2020

SF-Net: Single-Frame Supervision for Temporal Action Localization.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Deep Learning for Action Understanding in Video.

PhD thesis, 2019

LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization.

Xudong Lin

Shih-Fu Chang

CoRR, 2019

CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation.

CoRR, 2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

AutoLoc: Weakly-supervised Temporal Action Localization.

CoRR, 2018

Online Action Detection in Untrimmed, Streaming Videos - Modeling and Evaluation.

CoRR, 2018

Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Online Detection of Action Start in Untrimmed, Streaming Videos.

Proceedings of the Computer Vision - ECCV 2018, 2018

AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos.

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

ConvNet Architecture Search for Spatiotemporal Feature Learning.

CoRR, 2017

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

EventNet Version 1.1 Technical Report.

CoRR, 2016

Action Temporal Localization in Untrimmed Videos via Multi-stage CNNs.

Dongang Wang

Shih-Fu Chang

CoRR, 2016

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs.
