Yu Qiao

AgiBot-World-Contributors

CoRR, August, 2025

CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models.

CoRR, August, 2025

LIA-X: Interpretable Latent Portrait Animator.

CoRR, August, 2025

LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation.

CoRR, August, 2025

Building intelligence identification system via large language model watermarking: a survey and beyond.

Artif. Intell. Rev., August, 2025

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback.

CoRR, July, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.

Victor Shea-Jay Huang

CoRR, July, 2025

Yume: An Interactive World Generation Model.

CoRR, July, 2025

Re:Form - Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.

CoRR, July, 2025

Exploring Scalable Unified Modeling for General Low-Level Vision.

CoRR, July, 2025

ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding.

CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.

CoRR, July, 2025

TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models.

IEEE Trans. Big Data, June, 2025

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models.

CoRR, June, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.

CoRR, June, 2025

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation.

CoRR, June, 2025

Sekai: A Video Dataset towards World Exploration.

CoRR, June, 2025

DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation.

CoRR, June, 2025

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation.

CoRR, June, 2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces.

CoRR, June, 2025

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models.

Int. J. Comput. Vis., May, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.

CoRR, May, 2025

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings.

CoRR, May, 2025

An Empirical Study of Federated Prompt Learning for Vision Language Model.

CoRR, May, 2025

O²-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering.

CoRR, May, 2025

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling.

CoRR, May, 2025

Demystify Transformers & Convolutions in Modern Image Deep Networks.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving.

CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.

CoRR, April, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.

CoRR, April, 2025

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision.

CoRR, April, 2025

Toward the unification of generative and discriminative visual foundation model: a survey.

Vis. Comput., March, 2025

LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

LEO: Generative Latent Image Animator for Human Video Synthesis.

Int. J. Comput. Vis., March, 2025

ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting.

CoRR, March, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.

CoRR, March, 2025

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness.

CoRR, March, 2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.

CoRR, March, 2025

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset.

CoRR, March, 2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning.

CoRR, March, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning.

CoRR, March, 2025

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems.

CoRR, March, 2025

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant.

CoRR, March, 2025

LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement.

CoRR, February, 2025

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT.

CoRR, February, 2025

Predicting Issue Resolution Time of OSS Using Multiple Features.

J. Softw. Evol. Process., January, 2025

WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages.

CoRR, January, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.

CoRR, January, 2025

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency.

CoRR, January, 2025

RepVideo: Rethinking Cross-Layer Representation for Video Generation.

CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.

CoRR, January, 2025

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback.

CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.

CoRR, January, 2025

Towards Efficient SDRTV-to-HDRTV by Learning From Image Formation.

IEEE Trans. Multim., 2025

LASP: Linear Attention Sequence Parallelism.

Trans. Mach. Learn. Res., 2025

Latte: Latent Diffusion Transformer for Video Generation.

Trans. Mach. Learn. Res., 2025

Fast 3D Room Layout Estimation Based on Compact High-Level Representation.

IEEE Trans. Image Process., 2025

B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions.

IEEE Trans. Inf. Forensics Secur., 2025

Learning Discriminative Representations in Videos via Active Embedding Distance Correlation.

IEEE Signal Process. Lett., 2025

Eye-SCAN: Eye-Movement-Attention-based Spatial Channel Adaptive Network for traffic accident prediction.

Pattern Recognit., 2025

Percept, Chat, Adapt: Knowledge transfer of foundation models for open-world video recognition.

Pattern Recognit., 2025

A-Eval: A benchmark for cross-dataset and cross-modality evaluation of abdominal multi-organ segmentation.

Medical Image Anal., 2025

Driver Cognitive Distraction Detection based on eye movement behavior and integration of multi-view space-channel feature.

Expert Syst. Appl., 2025

Exploring Contextual Priors for Real-World Image Super-Resolution.

Shixiang Wu

Comput. Vis. Media, 2025

VideoChat: chat-centric video understanding.

Sci. China Inf. Sci., 2025

Cut2Next: Generating Next Shot via In-Context Tuning.

Proceedings of the SIGGRAPH Asia 2025 Conference Papers, 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

DiffusionMat: Alpha Matting as Deterministic Sequential Refinement Learning.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

RH20T-P: A Primitive-Level Robotic Manipulation Dataset towards Composable Generalization Agents in Real-world Scenarios.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

An Empirical Study of Federated Prompt Learning for Vision Language Model.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

I-Lora: Iterative Merging of Routing-Tuned Low-Rank Adapters for Multi-Task Learning.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Localization Hints Exploration for Object Matting.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

An Intelligent Agentic System for Complex Image Restoration Problems.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

REEF: Representation Encoding Fingerprints for Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OS-ATLAS: Foundation Action Model for Generalist GUI Agents.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Learning Causal Alignment for Reliable Disease Diagnosis.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Lumina-Image 2.0: a Unified and Efficient Image Generative Framework.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Dual-Expert Consistency Model for Efficient and High-Quality Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SecRAG: A Graph-Enhanced RAG Framework with Dynamic Prompt for Cybersecurity Applications.

Proceedings of the 28th International Conference on Computer Supported Cooperative Work in Design, 2025

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Vision-Centric BEV Perception: A Survey.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching.

Int. J. Comput. Vis., December, 2024

Chinese CSUQ: Cross-Cultural Adaptation and Evaluation of Measurement Properties.

Int. J. Hum. Comput. Interact., November, 2024

Diff-Font: Diffusion Model for Robust One-Shot Font Generation.

Int. J. Comput. Vis., November, 2024

An ecology-oriented convergence evolution analysis method of crossover service ecosystems.

J. Softw. Evol. Process., July, 2024

F2S-Net: learning frame-to-segment prediction for online action detection.

Yi Liu

J. Real Time Image Process., May, 2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

Int. J. Comput. Vis., May, 2024

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Temporally consistent video colorization with deep feature propagation and self-regularization learning.

Comput. Vis. Media, April, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.

Int. J. Comput. Vis., February, 2024

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.

Vis. Intell., 2024

CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning.

IEEE Trans. Multim., 2024

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery.

IEEE Trans. Multim., 2024

Attentive Snippet Prompting for Video Retrieval.

IEEE Trans. Multim., 2024

Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection.

IEEE Trans. Image Process., 2024

AdaptBIR: Adaptive Blind Image Restoration with latent diffusion prior for higher fidelity.

Pattern Recognit., 2024

MixStyle Neural Networks for Domain Generalization and Adaptation.

Int. J. Comput. Vis., 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.

CoRR, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation.

CoRR, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.

CoRR, 2024

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models.

CoRR, 2024

OASIS: Open Agent Social Interaction Simulations with One Million Agents.

CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.

CoRR, 2024

DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes.

CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.

CoRR, 2024

Diffusion Transformer Policy.

CoRR, 2024

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding.

CoRR, 2024

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues.

CoRR, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.

CoRR, 2024

ToMiE: Towards Modular Growth in Enhanced SMPL Skeleton for 3D Human with Animatable Garments.

CoRR, 2024

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation.

CoRR, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.

CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.

CoRR, 2024

CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation.

CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction.

CoRR, 2024

A Preliminary Exploration Towards General Image Restoration.

CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.

CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.

CoRR, 2024

The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure.

CoRR, 2024

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models.

CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.

CoRR, 2024

The Better Angels of Machine Personality: How Personality Relates to LLM Safety.

CoRR, 2024

Navigating the Data Trading Crossroads: An Interdisciplinary Survey.

CoRR, 2024

Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond.

CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.

CoRR, 2024

GRUtopia: Dream General Robots in a City at Scale.

CoRR, 2024

VEnhancer: Generative Space-Time Enhancement for Video Generation.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.

CoRR, 2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model.

CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.

CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.

CoRR, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

CoRR, 2024

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion.

CoRR, 2024

FLoRA: Low-Rank Core Space for N-dimension.

CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

Causal Evaluation of Language Models.

CoRR, 2024

Linear Attention Sequence Parallelism.

CoRR, 2024

VideoDistill: Language-aware Vision Distillation for Video Question Answering.

CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.

CoRR, 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents.

CoRR, 2024

Assessment of Multimodal Large Language Models in Alignment with Human Values.

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

CoRR, 2024

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control.

CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.

CoRR, 2024

Exploring Safety Generalization Challenges of Large Language Models via Code.

CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.

CoRR, 2024

Efficient Action Counting with Dynamic Queries.

CoRR, 2024

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset.

CoRR, 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition.

CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.

CoRR, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

CoRR, 2024

Real-time Holistic Robot Pose Estimation with Unknown States.

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.

CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

CoRR, 2024

MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.

Sci. China Inf. Sci., 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.

Sci. China Inf. Sci., 2024

Code Reviewer Recommendation Based on a Hypergraph with Multiplex Relationships.

Proceedings of the IEEE International Conference on Software Analysis, 2024

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Learning Manipulation by Predicting Interaction.

Proceedings of the Robotics: Science and Systems XX, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Parameter-Inverted Image Pyramid Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SyncVIS: Synchronized Video Instance Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

4Diffusion: Multi-view Video Diffusion Model for 4D Generation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Needle In A Multimodal Haystack.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Fake Alignment: Are LLMs Really Aligned Well?

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Learning A Low-Level Vision Generalist via Visual Task Prompt.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving.

Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Safety of Multimodal Large Language Models on Images and Text.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Driver Cognitive Distraction Detection Based on Eye Movement Behavior and Spatio-Temporal Information Fusion.

Proceedings of the Neural Information Processing - 31st International Conference, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying Image Processing as Visual Prompting Question Answering.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Causal Discovery via Conditional Independence Testing with Proxy Variables.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

CO2: Efficient Distributed Training with Full Communication-Computation Overlap.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.

Proceedings of the IEEE International Conference on Acoustics, 2024

Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning.

Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling, 2024

LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Inference-Time Language Model Alignment via Integrated Value Guidance.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Embodied Understanding of Driving Scenarios.

Proceedings of the Computer Vision - ECCV 2024, 2024

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation.

Proceedings of the Computer Vision - ECCV 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Proceedings of the Computer Vision - ECCV 2024, 2024

Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation.

Yuchen Yang

Xiao Sun

Proceedings of the Computer Vision - ECCV 2024, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.

Proceedings of the Computer Vision - ECCV 2024, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

SAM-Med3D: Towards General-Purpose Segmentation Models for Volumetric Medical Images.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

ControlLLM: Augment Language Models with Tools by Searching on Graphs.

Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior.

Proceedings of the Computer Vision - ECCV 2024, 2024

Distilling Knowledge from Large-Scale Image Models for Object Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Within the Dynamic Context: Inertia-Aware 3D Human Modeling with Pose Sequence.

Proceedings of the Computer Vision - ECCV 2024, 2024

A Comparative Study of Image Restoration Networks for General Backbone Network Design.

Proceedings of the Computer Vision - ECCV 2024, 2024

GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity.

Proceedings of the Computer Vision - ECCV 2024, 2024

LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Language-aware Visual Semantic Distillation for Video Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Vlogger: Make Your Dream A Vlog.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Asymmetric Masked Distillation for Pre-Training Small Foundation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generalized Predictive Model for Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SinSR: Diffusion-Based Image Super-Resolution in a Single Step.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneLLM: One Framework to Align All Modalities with Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Real-world Video Face Restoration: A New Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoBooth: Diffusion-based Video Generation with Image Prompts.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Point Transformer V3: Simpler, Faster, Stronger.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation.

Yan Ma

Pengfei Liu

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Critic-Guided Decision Transformer for Offline Reinforcement Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

ConditionVideo: Training-Free Condition-Guided Video Generation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

M-BEV: Masked BEV Perception for Robust Autonomous Driving.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Evaluating the Generalization Ability of Super-Resolution Networks.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.

IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Hybrid token transformer for deep face recognition.

Pattern Recognit., July, 2023

Blind Image Super-Resolution: A Survey and Beyond.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Goal model convergence and conflict detection for crossover services.

J. Syst. Softw., May, 2023

COCAS+: Large-Scale Clothes-Changing Person Re-Identification With Clothes Templates.

IEEE Trans. Circuits Syst. Video Technol., April, 2023

Domain Generalization: A Survey.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

ActFloor-GAN: Activity-Guided Adversarial Networks for Human-Centric Floorplan Design.

IEEE Trans. Vis. Comput. Graph., March, 2023

Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset.

Comput. Vis. Media, February, 2023

Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments.

Briefings Bioinform., January, 2023

Hierarchical and Progressive Image Matting.

ACM Trans. Multim. Comput. Commun. Appl., 2023

Blind Image Restoration Based on Cycle-Consistent Network.

Shixiang Wu

IEEE Trans. Multim., 2023

Region-Aware Arbitrary-Shaped Text Detection With Progressive Fusion.

IEEE Trans. Multim., 2023

Very Lightweight Photo Retouching Network With Conditional Sequential Modulation.

IEEE Trans. Multim., 2023

Character-Aware Sampling and Rectification for Scene Text Recognition.

IEEE Trans. Multim., 2023

Dual Relation Network for Scene Text Recognition.

IEEE Trans. Multim., 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

CoRR, 2023

Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey.

CoRR, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.

CoRR, 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.

CoRR, 2023

Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies.

CoRR, 2023

Towards Knowledge-driven Autonomous Driving.

CoRR, 2023

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future.

CoRR, 2023

MLLMs-Augmented Visual-Language Representation Learning.

CoRR, 2023

Query-Relevant Images Jailbreak Large Multi-Modal Models.

CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

CoRR, 2023

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision.

CoRR, 2023

DiffusionMat: Alpha Matting as Sequential Refinement Learning.

CoRR, 2023

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks.

CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.

CoRR, 2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.

CoRR, 2023

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models.

CoRR, 2023

Octavius: Mitigating Task Interference in MLLMs via MoE.

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

CoRR, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs.

CoRR, 2023

SAM-Med3D.

CoRR, 2023

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm.

CoRR, 2023

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation.

CoRR, 2023

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models.

CoRR, 2023

REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets.

CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.

CoRR, 2023

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models.

CoRR, 2023

Exploring Counterfactual Alignment Loss towards Human-centered AI.

CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

CoRR, 2023

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.

CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.

CoRR, 2023

A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation.

CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.

CoRR, 2023

SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution.

CoRR, 2023

SAM-Med2D.

CoRR, 2023

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior.

CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.

CoRR, 2023

Scaling TransNormer to 175 Billion Parameters.

CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

CoRR, 2023

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.

CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.

CoRR, 2023

MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification.

CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.

CoRR, 2023

DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior.

CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.

CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.

CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.

CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

Causal Discovery with Unobserved Variables: A Proxy Variable Approach.

CoRR, 2023

LEO: Generative Latent Image Animator for Human Video Synthesis.

CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.

CoRR, 2023

Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving.

CoRR, 2023

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles.

CoRR, 2023

STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training.

CoRR, 2023

Topology Reasoning for Driving Scenes.

CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.

CoRR, 2023

Aleth-NeRF: Low-light Condition View Synthesis with Concealing Fields.

CoRR, 2023

FCN+: Global Receptive Convolution Makes FCN Great Again.

CoRR, 2023

Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling.

CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Real-World Image Super-Resolution as Multi-Task Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Networks are Slacking Off: Understanding Generalization Problem in Image Deraining.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Discriminative Feature Representation for Open Set Action Recognition.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Text-Guided Foundation Model Adaptation for Pathological Image Classification.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

LimSim: A Long-Term Interactive Multi-Scenario Traffic Simulator.

Proceedings of the 26th IEEE International Conference on Intelligent Transportation Systems, 2023

Parallelizable Simple Recurrent Units with Hierarchical Memory.

Proceedings of the Neural Information Processing - 30th International Conference, 2023

Long-Term Rhythmic Video Soundtracker.

Proceedings of the International Conference on Machine Learning, 2023

Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Vision Transformer Adapter for Dense Predictions.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Data Generation in Vision-and-Language Navigation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Multi-view Spectral Polarization Propagation for Video Glass Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Rethinking Range View Representation for LiDAR Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SCPNet: Semantic Scene Completion on Point Cloud.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Fine-grained Audible Video Description.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Planning-oriented Autonomous Driving.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Neural Transformation Fields for Arbitrary-Styled Font Generation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Activating More Pixels in Image Super-Resolution Transformer.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DegAE: A New Pretraining Paradigm for Low-Level Vision.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

OpenICL: An Open-Source Framework for In-context Learning.

[BibT_eX]

[DOI]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

A method of value model convergence and profit optimization for crossover services.

[BibT_eX]

[DOI]

J. King Saud Univ. Comput. Inf. Sci., November, 2022

Prior-Induced Information Alignment for Image Matting.

[BibT_eX]

[DOI]

IEEE Trans. Multim., 2022

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2022

Robust Image Forgery Detection Against Transmission Over Online Social Networks.

[BibT_eX]

[DOI]

IEEE Trans. Inf. Forensics Secur., 2022

Temporal Weighting Appearance-Aligned Network for Nighttime Video Retrieval.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2022

Unsupervised person re-identification with multi-label learning guided self-paced clustering.

[BibT_eX]

[DOI]

Pattern Recognit., 2022

RankSRGAN: Super Resolution Generative Adversarial Networks With Learning to Rank.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Interactive Multi-Dimension Modulation for Image Restoration.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Author Correction: Development and clinical deployment of a smartphone-based visual field deep learning system for glaucoma detection.

[BibT_eX]

[DOI]

npj Digit. Medicine, 2022

Joint 3D facial shape reconstruction and texture completion from a single image.

[BibT_eX]

[DOI]

Comput. Vis. Media, 2022

ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation.

[BibT_eX]

[DOI]

CoRR, 2022

Goal-oriented Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

[BibT_eX]

[DOI]

CoRR, 2022

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE.

[BibT_eX]

[DOI]

CoRR, 2022

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling.

[BibT_eX]

[DOI]

CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.

[BibT_eX]

[DOI]

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

[BibT_eX]

[DOI]

CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.

[BibT_eX]

[DOI]

CoRR, 2022

Hierarchical and Progressive Image Matting.

[BibT_eX]

[DOI]

CoRR, 2022

Low-Resolution Action Recognition for Tiny Actions Challenge.

[BibT_eX]

[DOI]

Boyu Chen

CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.

[BibT_eX]

[DOI]

CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.

[BibT_eX]

[DOI]

CoRR, 2022

Vision-Centric BEV Perception: A Survey.

[BibT_eX]

[DOI]

CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.

[BibT_eX]

[DOI]

CoRR, 2022

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.

[BibT_eX]

[DOI]

CoRR, 2022

Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot.

[BibT_eX]

[DOI]

CoRR, 2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2022

Illumination Adaptive Transformer.

[BibT_eX]

[DOI]

CoRR, 2022

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results.

[BibT_eX]

[DOI]

et al.

CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.

[BibT_eX]

[DOI]

CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.

[BibT_eX]

[DOI]

CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.

[BibT_eX]

[DOI]

CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2022

An empirical study on temporal modeling for online action detection.

[BibT_eX]

[DOI]

Complex Intell. Syst., 2022

Asynchronous feature regularization and cross-modal distillation for OCT based glaucoma diagnosis.

[BibT_eX]

[DOI]

Comput. Biol. Medicine, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Cycle-Consistent Learning for Weakly Supervised Semantic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the HCMA@MM 2022: Proceedings of the 3rd International Workshop on Human-Centric Multimedia Analysis, 2022

Visual Knowledge Graph for Human Action Reasoning in Videos.

[BibT_eX]

[DOI]

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection.

[BibT_eX]

[DOI]

Proceedings of the 26th International Conference on Pattern Recognition, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Self-slimmed Vision Transformer.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Image Super-Resolution Using Vast-Receptive-Field Attention.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Recurrent Bilinear Optimization for Binary Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision, 2022

PalGAN: Image Colorization with Palette Generative Adversarial Networks.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Blueprint Separable Residual Network for Efficient Image Super-Resolution.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Reflash Dropout in Image Super-Resolution.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach.

[BibT_eX]

[DOI]

Proceedings of the Conference on Robot Learning, 2022

Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting.

[BibT_eX]

[DOI]

Proceedings of the Advances in Computer Graphics, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.

[BibT_eX]

[DOI]

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.

[BibT_eX]

[DOI]

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Smart Scribbles for Image Matting.

[BibT_eX]

[DOI]

ACM Trans. Multim. Comput. Commun. Appl., 2021

Wildfish++: A Comprehensive Fish Benchmark for Multimedia Research.

[BibT_eX]

[DOI]

Peiqin Zhuang

Francisco Gómez Fernández

IEEE Trans. Multim., 2021

Deep Relation Transformer for Diagnosing Glaucoma With Optical Coherence Tomography and Visual Field Function.

[BibT_eX]

[DOI]

IEEE Trans. Medical Imaging, 2021

Domain Adaptive Ensemble Learning.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2021

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2021

Deep Learning-Based Chroma Prediction for Intra Versatile Video Coding.

[BibT_eX]

[DOI]

IEEE Trans. Circuits Syst. Video Technol., 2021

Multi-view self-supervised learning for 3D facial texture reconstruction from single image.

[BibT_eX]

[DOI]

Image Vis. Comput., 2021

TTPP: Temporal Transformer with Progressive Prediction for efficient action anticipation.

[BibT_eX]

[DOI]

Neurocomputing, 2021

A Comprehensive Review of Group Activity Recognition in Videos.

[BibT_eX]

[DOI]

Int. J. Autom. Comput., 2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.

[BibT_eX]

[DOI]

Qinlong Wang

Yang Yang

CoRR, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

[BibT_eX]

[DOI]

CoRR, 2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.

[BibT_eX]

[DOI]

CoRR, 2021

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video.

[BibT_eX]

[DOI]

CoRR, 2021

INTERN: A New Learning Paradigm Towards General Vision.

[BibT_eX]

[DOI]

CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.

[BibT_eX]

[DOI]

CoRR, 2021

Discovering "Semantics" in Super-Resolution Networks.

[BibT_eX]

[DOI]

CoRR, 2021

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021.

[BibT_eX]

[DOI]

CoRR, 2021

Scalable Transformers for Neural Machine Translation.

[BibT_eX]

[DOI]

CoRR, 2021

TSI: Temporal Saliency Integration for Video Action Recognition.

[BibT_eX]

[DOI]

CoRR, 2021

Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification.

[BibT_eX]

[DOI]

CoRR, 2021

FineAction: A Fined Video Dataset for Temporal Action Localization.

[BibT_eX]

[DOI]

CoRR, 2021

Neighbourhood-guided Feature Reconstruction for Occluded Person Re-Identification.

[BibT_eX]

[DOI]

CoRR, 2021

NTIRE 2021 Challenge on Perceptual Image Quality Assessment.

[BibT_eX]

[DOI]

Seyed Mehdi Ayyoubzadeh

CoRR, 2021

Smart Scribbles for Image Matting.

[BibT_eX]

[DOI]

CoRR, 2021

Self-speculation of clinical features based on knowledge distillation for accurate ocular disease classification.

[BibT_eX]

[DOI]

Biomed. Signal Process. Control., 2021

Multi-label ocular disease classification with a dense correlation deep neural network.

[BibT_eX]

[DOI]

Biomed. Signal Process. Control., 2021

Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

A Novel Hybrid Convolutional Neural Network for Accurate Organ Segmentation in 3D Head and Neck CT Images.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Collaborative Multi-View Convolutions With Gating For Accurate And Fast Volumetric Medical Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the 18th IEEE International Symposium on Biomedical Imaging, 2021

Domain Generalization with MixStyle.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

CT-Net: Channel Tensorization Network for Video Classification.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

Digging into Uncertainty in Self-supervised Multi-view Stereo.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Tripartite Information Mining and Integration for Image Matting.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A New Journey from SDRTV to HDRTV.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Temporal Context Aggregation Network for Temporal Action Proposal Refinement.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Detecting Human-Object Interaction via Fabricated Compositional Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Affordance Transfer Learning for Human-Object Interaction Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

NTIRE 2021 Challenge on Perceptual Image Quality Assessment.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

HDRUNet: Single Image HDR Reconstruction With Denoising and Dequantization.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Toward Interactive Modulation for Photo-Realistic Image Restoration.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

MP-Mono: Monocular 3D Detection Using Multiple Priors for Autonomous Driving.

[BibT_eX]

[DOI]

Proceedings of the International Conference on 3D Vision, 2021

2020

FeatherCNN: Fast Inference Computation with TensorGEMM on ARM Architectures.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2020

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2020

Progressive Object Transfer Detection.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2020

DID: Disentangling-Imprinting-Distilling for Continuous Low-Shot Detection.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2020

Learning label correlations for multi-label image recognition with graph networks.

[BibT_eX]

[DOI]

Pattern Recognit. Lett., 2020

Development and clinical deployment of a smartphone-based visual field deep learning system for glaucoma detection.

[BibT_eX]

[DOI]

npj Digit. Medicine, 2020

Finding hard faces with better proposals and classifier.

[BibT_eX]

[DOI]

Mach. Vis. Appl., 2020

A Value-Driven Modeling Approach for Crossover Services.

[BibT_eX]

[DOI]

Int. J. Web Serv. Res., 2020

Cascade multi-head attention networks for action recognition.

[BibT_eX]

[DOI]

Jiaze Wang

Pablo Navarrete Michelini

Comput. Vis. Image Underst., 2020

Product image recognition with guidance learning and noisy supervision.

[BibT_eX]

[DOI]

Comput. Vis. Image Underst., 2020

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition.

[BibT_eX]

[DOI]

CoRR, 2020

Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution.

[BibT_eX]

[DOI]

CoRR, 2020

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units.

[BibT_eX]

[DOI]

CoRR, 2020

A Comprehensive Study on Temporal Modeling for Online Action Detection.

[BibT_eX]

[DOI]

CoRR, 2020

Multi-scale Information Assembly for Image Matting.

[BibT_eX]

[DOI]

Comput. Graph. Forum, 2020

SIAT-3DFE: A High-Resolution 3D Facial Expression Dataset.

[BibT_eX]

[DOI]

IEEE Access, 2020

Dense Correlation Network for Automated Multi-Label Ocular Disease Detection with Paired Color Fundus Photographs.

[BibT_eX]

[DOI]

Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Classification of Ocular Diseases Employing Attention-Based Unilateral and Bilateral Feature Weighting and Fusion.

[BibT_eX]

[DOI]

Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Learning Discriminative Representation For Facial Expression Recognition From Uncertainties.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Image Processing, 2020

Efficient Image Super-Resolution Using Pixel Attention.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Attention-Driven Dynamic Graph Convolutional Network for Multi-label Image Recognition.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

AIM 2020 Challenge on Video Temporal Super-Resolution.

[BibT_eX]

[DOI]

Kazutoshi Akita

Norimichi Ukita

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Suppressing Mislabeled Data via Grouping and Self-attention.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Enhanced Quadratic Video Interpolation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Learning to Predict Context-Adaptive Convolution for Semantic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Visual Compositional Learning for Human-Object Interaction Detection.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Conditional Sequential Modulation for Efficient Global Image Retouching.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Interactive Multi-dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration.

[BibT_eX]

[DOI]

Jingwen He

Proceedings of the Computer Vision - ECCV 2020, 2020

Mining Inter-Video Proposal Relations for Video Object Detection.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Suppressing Uncertainties for Large-Scale Facial Expression Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Fast Texture Synthesis via Pseudo Optimizer.

[BibT_eX]

[DOI]

Wu Shi

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attention-Guided Hierarchical Structure Aggregation for Image Matting.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SmallBigNet: Integrating Core and Contextual Views for Video Classification.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Multiple Transfer Learning and Multi-label Balanced Training Strategies for Facial AU Detection In the Wild.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Adaptive Dilated Network With Self-Correction Supervision for Counting.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

aDMSCN: A Novel Perspective for User Intent Prediction in Customer Service Bots.

[BibT_eX]

[DOI]

Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Learning Attentive Pairwise Interaction for Fine-Grained Classification.

[BibT_eX]

[DOI]

Peiqin Zhuang

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Context-Transformer: Tackling Object Confusion for Few-Shot Detection.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Geometry Sharing Network for 3D Point Cloud Classification and Segmentation.

[BibT_eX]

[DOI]

Mingye Xu

Zhipeng Zhou

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Dynamic Sampling Network for Semantic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

A Process Convergence Approach for Crossover Services based on Message Flow Partition and Merging.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Services Computing, 2020

2019

Mutual Component Convolutional Neural Networks for Heterogeneous Face Recognition.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2019

A Literature Review: Geometric Methods and Their Applications in Human-Related Analysis.

[BibT_eX]

[DOI]

Sensors, 2019

Dual-supervised attention network for deep cross-modal hashing.

[BibT_eX]

[DOI]

Pattern Recognit. Lett., 2019

Temporal Segment Networks for Action Recognition in Videos.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2019

DeepDeblur: text image recovery from blur to sharp.

[BibT_eX]

[DOI]

Multim. Tools Appl., 2019

Pedestrian detection with unsupervised multispectral feature learning using deep neural networks.

[BibT_eX]

[DOI]

Inf. Fusion, 2019

A Comprehensive Study on Center Loss for Deep Face Recognition.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2019

Multi-Dimension Modulation for Image Restoration with Dynamic Controllable Residual Learning.

[BibT_eX]

[DOI]

Jingwen He

CoRR, 2019

Learning Category Correlations for Multi-label Image Recognition with Graph Networks.

[BibT_eX]

[DOI]

CoRR, 2019

Product Image Recognition with Guidance Learning and Noisy Supervision.

[BibT_eX]

[DOI]

CoRR, 2019

Correction to: Automatic differentiation of Glaucoma visual field from non-glaucoma visual field using deep convolutional neural network.

[BibT_eX]

[DOI]

BMC Medical Imaging, 2019

Robust Text Line Detection in Equipment Nameplate Images.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

The Equipment Nameplate Dataset for Scene Text Detection and Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

Orientation Robust Scene Text Recognition in Natural Scene.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics, 2019

AnoPCN: Video Anomaly Detection via Deep Predictive Coding Network.

[BibT_eX]

[DOI]

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Intelligent Glaucoma Diagnosis Via Active Learning And Adversarial Data Augmentation.

[BibT_eX]

[DOI]

Proceedings of the 16th IEEE International Symposium on Biomedical Imaging, 2019

Prostate Segmentation using 2D Bridged U-net.

[BibT_eX]

[DOI]

Proceedings of the International Joint Conference on Neural Networks, 2019

Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2019

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2019

Exploring Regularizations with Face, Body and Image Cues for Group Cohesion Prediction.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2019

Visual-Textual Sentiment Analysis in Product Reviews.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Frame Attention Networks for Facial Expression Recognition in Videos.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction.

[BibT_eX]

[DOI]

Xiaoxing Zeng

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Multi-Scale Filters for Semantic Segmentation.

[BibT_eX]

[DOI]

Junjun He

Zhongying Deng

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition.

[BibT_eX]

[DOI]

Weihe Zhang

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

P2SGrad: Refined Gradients for Optimizing Deep Face Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PA3D: Pose-Action 3D Machine for Video Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adaptive Pyramid Context Network for Semantic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers.

[BibT_eX]

[DOI]

Jingwen He

Pablo Navarrete Michelini

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Suppressing Model Overfitting for Image Super-Resolution Networks.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

NTIRE 2019 Challenge on Real Image Super-Resolution: Methods and Results.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Residual Compensation Networks for Heterogeneous Face Recognition.

[BibT_eX]

[DOI]

Zhongying Deng

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2018

Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.

[BibT_eX]

[DOI]

Wenbin Du

IEEE Trans. Image Process., 2018

Deep embedding convolutional neural network for synthesizing CT image from T1-Weighted MR image.

[BibT_eX]

[DOI]

Medical Image Anal., 2018

Transferring Deep Object and Scene Representations for Event Recognition in Still Images.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2018

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

[BibT_eX]

[DOI]

CoRR, 2018

W-net: Bridged U-net for 2D Medical Image Segmentation.

[BibT_eX]

[DOI]

CoRR, 2018

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering.

[BibT_eX]

[DOI]

CoRR, 2018

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward.

[BibT_eX]

[DOI]

Kaiyang Zhou

CoRR, 2018

Automatic differentiation of Glaucoma visual field from non-glaucoma visual filed using deep convolutional neural network.

[BibT_eX]

[DOI]

BMC Medical Imaging, 2018

Structured Triplet Learning with POS-Tag Guided Attention for Visual Question Answering.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

WildFish: A Large Benchmark for Fish Recognition in the Wild.

[BibT_eX]

[DOI]

Peiqin Zhuang

Pablo Navarrete Michelini

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

StripNet: Towards Topology Consistent Strip Structure Segmentation.

[BibT_eX]

[DOI]

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Visual Field Based Automatic Diagnosis of Glaucoma Using Deep Convolutional Neural Network.

[BibT_eX]

[DOI]

Proceedings of the Computational Pathology and Ophthalmic Medical Image Analysis, 2018

A Multi-task Learning Approach for Image Captioning.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction.

[BibT_eX]

[DOI]

Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues.

[BibT_eX]

[DOI]

Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Super-Identity Convolutional Neural Network for Face Hallucination.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018, 2018

SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018, 2018

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018, 2018

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Temporal Hallucinating for Action Recognition With Few Still Images.

[BibT_eX]

[DOI]

Lei Zhou

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

FOTS: Fast Oriented Text Spotting With a Unified Network.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

An End-to-End TextSpotter With Explicit Alignment and Attention.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

RDS-Denoiser: a Detail-preserving Convolutional Neural Network for Image Denoising.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Cyborg and Bionic Systems, 2018

Boosting up Scene Text Detectors with Guided CNN.

[BibT_eX]

[DOI]

Proceedings of the British Machine Vision Conference 2018, 2018

Deep Reinforcement Learning for Unsupervised Video Summarization With Diversity-Representativeness Reward.

[BibT_eX]

[DOI]

Kaiyang Zhou

Tao Xiang

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

LSTD: A Low-Shot Transfer Detector for Object Detection.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2017

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2017

Locally Supervised Deep Hybrid Model for Scene Recognition.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2017

Improving scale invariant feature transform with local color contrastive descriptor for image classification.

[BibT_eX]

[DOI]

Sheng Guo

J. Electronic Imaging, 2017

A robust coherent point drift approach based on rotation invariant shape context.

[BibT_eX]

[DOI]

Neurocomputing, 2017

Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI.

[BibT_eX]

[DOI]

Neurocomputing, 2017

Learning multiple local binary descriptors for image matching.

[BibT_eX]

[DOI]

Yongqiang Gao

Neurocomputing, 2017

Deep Embedding Convolutional Neural Network for Synthesizing CT Image from T1-Weighted MR Image.

[BibT_eX]

[DOI]

CoRR, 2017

Group emotion recognition with individual facial emotion CNNs and global image based CNNs.

[BibT_eX]

[DOI]

Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Depth driven people counting using deep region proposal network.

[BibT_eX]

[DOI]

Diping Song

Alessandro Corbetta

Proceedings of the IEEE International Conference on Information and Automation, 2017

Detecting Faces Using Inside Cascaded Contextual CNN.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2017

Range Loss for Deep Face Recognition with Long-Tailed Training Data.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2017

Single Shot Text Detector with Regional Attention.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2017

RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos.

[BibT_eX]

[DOI]

Wenbin Du

Proceedings of the IEEE International Conference on Computer Vision, 2017

NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

Marine Animal Detection and Recognition with Advanced Deep Learning Models.

[BibT_eX]

[DOI]

Proceedings of the Working Notes of CLEF 2017, 2017

Dual Learning for Cross-domain Image Captioning.

[BibT_eX]

[DOI]

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Orientation-Aware Text Proposals Network for Scene Text Detection.

[BibT_eX]

[DOI]

Proceedings of the Biometric Recognition - 12th Chinese Conference, 2017

Sparse Deep Transfer Learning for Convolutional Neural Network.

[BibT_eX]

[DOI]

Jiaming Liu

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Bridging Music and Image via Cross-Modal Ranking Analysis.

[BibT_eX]

[DOI]

IEEE Trans. Multim., 2016

Text-Attentional Convolutional Neural Network for Scene Text Detection.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2016

Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2016

Adaptive Part-Level Model Knowledge Transfer for Gender Classification.

[BibT_eX]

[DOI]

Yongqiang Gao

Zhifeng Li

IEEE Signal Process. Lett., 2016

MoFAP: A Multi-level Representation for Action Recognition.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2016

Reference-omitted affine soft correspondence algorithm.

[BibT_eX]

[DOI]

IET Image Process., 2016

Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice.

[BibT_eX]

[DOI]

Comput. Vis. Image Underst., 2016

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks.

[BibT_eX]

[DOI]

CoRR, 2016

Range Loss for Deep Face Recognition with Long-tail.

[BibT_eX]

[DOI]

CoRR, 2016

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016.

[BibT_eX]

[DOI]

CoRR, 2016

Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images.

[BibT_eX]

[DOI]

CoRR, 2016

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network.

[BibT_eX]

[DOI]

CoRR, 2016

Locally-Supervised Deep Hybrid Model for Scene Recognition.

[BibT_eX]

[DOI]

Sheng Guo

CoRR, 2016

Shenzhen Institutes of Advanced Technology, CAS, China at TRECVID INS 2016.

[BibT_eX]

[DOI]

Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Deep rehabilitation gait learning for modeling knee joints of lower-limb exoskeleton.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Robotics and Biomimetics, 2016

Deep face attributes recognition using spatial transformer network.

[BibT_eX]

[DOI]

Lianzhi Tan

Zhifeng Li

Proceedings of the IEEE International Conference on Information and Automation, 2016

DeepWriter: A Multi-stream Deep CNN for Text-Independent Writer Identification.

[BibT_eX]

[DOI]

Linjie Xing

Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Codebook enhancement of vlad representation for visual recognition.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Human action recognition with DeepAction Kernel Gaussian Process.

[BibT_eX]

[DOI]

Lin Li

Proceedings of the 2016 International Conference on Advanced Robotics and Mechatronics, 2016

A Discriminative Feature Learning Approach for Deep Face Recognition.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2016, 2016

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2016, 2016

Detecting Text in Natural Image with Connectionist Text Proposal Network.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2016, 2016

A Key Volume Mining Deep Framework for Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Real-Time Action Recognition with Enhanced Motion Vector CNNs.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Gender and Smile Classification Using Deep Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition.

[BibT_eX]

[DOI]

Yandong Wen

Zhifeng Li

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Actionness Estimation Using Hybrid Fully Convolutional Networks.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Reading Scene Text in Deep Convolutional Sequences.

[BibT_eX]

[DOI]

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Local Multi-Grouped Binary Descriptor With Ring-Based Pooling Configuration and Optimization.

[BibT_eX]

[DOI]

Yongqiang Gao

IEEE Trans. Image Process., 2015

On feature-specific parameter learning in conditional random field-based approach for interactive object segmentation.

[BibT_eX]

[DOI]

J. Electronic Imaging, 2015

Towards Good Practices for Very Deep Two-Stream ConvNets.

[BibT_eX]

[DOI]

CoRR, 2015

Object-Scene Convolutional Neural Networks for Event Recognition in Images.

[BibT_eX]

[DOI]

CoRR, 2015

Places205-VGGNet Models for Scene Recognition.

[BibT_eX]

[DOI]

CoRR, 2015

Text-Attentional Convolutional Neural Networks for Scene Text Detection.

[BibT_eX]

[DOI]

CoRR, 2015

Local Color Contrastive Descriptor for Image Classification.

[BibT_eX]

[DOI]

Sheng Guo

CoRR, 2015

Boosting Optical Character Recognition: A Super-Resolution Approach.

[BibT_eX]

[DOI]

CoRR, 2015

Deep classification of vehicle makers and models: The effectiveness of pre-training and data enhancement.

[BibT_eX]

[DOI]

Feiyun Zhang

Xiao Xu

Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics, 2015

Road segmentation via iterative deep analysis.

[BibT_eX]

[DOI]

Xiang Chen

Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics, 2015

Fast single image dehazing through Edge-Guided Interpolated Filter.

[BibT_eX]

[DOI]

Ximei Zhu

Ying Li

Proceedings of the 14th IAPR International Conference on Machine Vision Applications, 2015

MIL: Music Exploration and Visualization via Lyric and Image.

[BibT_eX]

[DOI]

Xixuan Wu

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Better Exploiting OS-CNNs for Better Event Recognition in Images.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

Object-Scene Convolutional Neural Networks for event recognition in images.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Exploring Fisher vector and deep networks for action spotting.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015

Action recognition with trajectory-pooled deep-convolutional descriptors.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Latent Hierarchical Model of Temporal Structure for Complex Activity Classification.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2014

Common Feature Discriminant Analysis for Matching Infrared Face Images to Optical Face Images.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2014

Large Margin Dimensionality Reduction for Action Similarity Labeling.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2014

Bayesian salient object detection based on saliency driven clustering.

[BibT_eX]

[DOI]

Signal Process. Image Commun., 2014

Pairwise Rotation Invariant Co-Occurrence Local Binary Pattern.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2014

Motion boundary based sampling and 3D co-occurrence descriptors for action recognition.

[BibT_eX]

[DOI]

Qiang Peng

Image Vis. Comput., 2014

Robust visual tracking based on local kernelized representation.

[BibT_eX]

[DOI]

Qiaozhe Li

Jie Yang

Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014

A Joint Evaluation of Dictionary Learning and Feature Encoding for Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Saliency detection via foreground rendering and background exclusion.

[BibT_eX]

[DOI]

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Saliency driven clustering for salient object detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Saliency detection based on extended boundary prior with foci of attention.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Video Action Detection with Relational Dynamic-Poselets.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2014, 2014

Action Recognition with Stacked Fisher Vectors.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2014, 2014

Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2014, 2014

Action and Gesture Temporal Spotting with Super Vector Representation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014

Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2014, 2014

Multi-view Super Vector for Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

One-class support vector machine-assisted robust tracking.

[BibT_eX]

[DOI]

J. Electronic Imaging, 2013

Unsupervised optimal phoneme segmentation: theory and experimental evaluation.

[BibT_eX]

[DOI]

Dean Luo

IET Signal Process., 2013

A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification.

[BibT_eX]

[DOI]

CoRR, 2013

Multi-feature canonical correlation analysis for face photo-sketch image retrieval.

[BibT_eX]

[DOI]

Proceedings of the ACM Multimedia Conference, 2013

Salient Object Segmentation Based on Automatic Labeling.

[BibT_eX]

[DOI]

Proceedings of the Neural Information Processing - 20th International Conference, 2013

An active contour model based on multiple boundary measures.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Image Processing, 2013

Affine SoftAssign with bidirectional distance for point matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Image Processing, 2013

A semantic model for video based face recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Information and Automation, 2013

LTD: Local Ternary Descriptor for image matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Information and Automation, 2013

Exploring dense trajectory feature and encoding methods for human interaction recognition.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

Mining Motion Atoms and Phrases for Complex Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2013

Motionlets: Mid-level 3D Parts for Human Motion Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Exploring Cross-Channel Texture Correlation for Color Texture Classification.

[BibT_eX]

[DOI]

Proceedings of the British Machine Vision Conference, 2013

Multi-scale Joint Encoding of Local Binary Patterns for Texture and Material Classification.

[BibT_eX]

[DOI]

Proceedings of the British Machine Vision Conference, 2013

Exploring Motion Boundary based Sampling and Spatial-Temporal Context Descriptors for Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the British Machine Vision Conference, 2013

2012

Automatic music video generation: cross matching of music and image.

[BibT_eX]

[DOI]

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Cross matching of music and image.

[BibT_eX]

[DOI]

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Voice conversion using Bayesian mixture of Probabilistic Linear Regressions and dynamic kernel features.

[BibT_eX]

[DOI]

Na Li

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion.

[BibT_eX]

[DOI]

Na Li

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Learning geodesic CRF model for image segmentation.

[BibT_eX]

[DOI]

Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Person re-identification across multi-camera system based on local descriptors.

[BibT_eX]

[DOI]

Qiao Huang

Jie Yang

Proceedings of the Sixth International Conference on Distributed Smart Cameras, 2012

One-Class SVM assisted accurate tracking.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Distributed Smart Cameras, 2012

A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition.

[BibT_eX]

[DOI]

Xingxing Wang

Proceedings of the Computer Vision - ACCV 2012, 2012

2011

Regularized Maximum Likelihood Linear Regression Adaptation for Computer-Assisted Language Learning Systems.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2011

A Study on Bag of Gaussian Model with Application to Voice Conversion.

[BibT_eX]

[DOI]

Tong Tong

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Gesture Design of Hand-to-Speech Converter Derived from Speech-to-Hand Converter Based on Probabilistic Integration Model.

[BibT_eX]

[DOI]

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Knowledge-Based Segmentation of Spine and Ribs from Bone Scintigraphy.

[BibT_eX]

[DOI]

Proceedings of the Neural Information Processing - 18th International Conference, 2011

Adaptive Region Growing Based on Boundary Measures.

[BibT_eX]

[DOI]

Jie Yang

Proceedings of the Neural Information Processing - 18th International Conference, 2011

Adaptive Detection of Hotspots in Thoracic Spine from Bone Scintigraphy.

[BibT_eX]

[DOI]

Proceedings of the Neural Information Processing - 18th International Conference, 2011

BioSecure Signature Evaluation Campaign (ESRA'2011): evaluating systems on quality-based categories of skilled forgeries.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE International Joint Conference on Biometrics, 2011

Structure-constrained distribution matching using quadratic programming and its application to pronunciation evaluation.

[BibT_eX]

[DOI]

Proceedings of the First Asian Conference on Pattern Recognition, 2011

2010

A study on invariance of f-divergence and its application to speech recognition.

[BibT_eX]

[DOI]

IEEE Trans. Signal Process., 2010

Speech Structure and Its Application to Robust Speech Processing.

[BibT_eX]

[DOI]

New Gener. Comput., 2010

Face recognition based on gradient gabor feature and Efficient Kernel Fisher analysis.

[BibT_eX]

[DOI]

Baochang Zhang

Neural Comput. Appl., 2010

Dialect-based speaker classification using speaker-invariant dialect features.

[BibT_eX]

[DOI]

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Integration of multilayer regression analysis with structure-based pronunciation assessment.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Regularized-MLLR speaker adaptation for computer-assisted language learning system.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based sequence-to-frame mapping for voice conversion.

[BibT_eX]

[DOI]

Daisuke Saito

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

A Theory of Phase Singularities for Image Representation and its Applications to Object Tracking and Image Matching.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2009

Optimal event search using a structural cost function - improvement of structure to speech conversion.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

On invariant structural representation for speech recognition: theoretical validation and experimental improvement.

[BibT_eX]

[DOI]

Keikichi Hirose

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speech generation from hand gestures based on space mapping.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Affine invariant features and their application to speech recognition.

[BibT_eX]

[DOI]

Masayuki Suzuki

Proceedings of the IEEE International Conference on Acoustics, 2009

Mixture of Probabilistic Linear Regressions: A unified view of GMM-based mapping techiques.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

Free hand sketch understanding using SVMs-chain modeling for spatial and temporal patterns.

[BibT_eX]

[DOI]

Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, 2009

A study on Hidden Structural Model and its application to labeling sequences.

[BibT_eX]

[DOI]

Masayuki Suzuki

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

f-divergence is a generalized invariant measure between distributions.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Metric learning for unsupervised phoneme segmentation.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Face recognition based on Gradient Gabor feature.

[BibT_eX]

[DOI]

Baochang Zhang

Yongsheng Gao

Proceedings of the International Conference on Image Processing, 2008

Phase singularities for image representation and matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons.

[BibT_eX]

[DOI]

Naoya Shimomura

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Optimal Euler Circuit of Maximum Contiguous Cost.

[BibT_eX]

[DOI]

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2007

Offline Signature Verification Using Online Handwriting Registration.

[BibT_eX]

[DOI]

Jianzhuang Liu

Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Random discriminant structure analysis for automatic recognition of connected vowels.

[BibT_eX]

[DOI]

Satoshi Asakawa

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

A Framework Toward Restoration of Writing Order from Single-Stroked Handwriting Image.

[BibT_eX]

[DOI]

Mikihiko Nishiara

IEEE Trans. Pattern Anal. Mach. Intell., 2006

Recover Writing Trajectory from Multiple Stroked Image Using Bidirectional Dynamic Search.

[BibT_eX]

[DOI]

Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Affine Invariant Dynamic Time Warping and its Application to Online Rotated Handwriting Recognition.

[BibT_eX]

[DOI]

Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Recovering Drawing Order from Offline Handwritten Image Using Direction Context and Optimal Euler Path.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

A Novel Approach to Recover Writing Order From Single Stroke Offline Handwritten Images.

[BibT_eX]

[DOI]

Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), 29 August, 2005

2004

Recovering dynamic information from static handwritten images.

[BibT_eX]

[DOI]