Hongsheng Li

CoRR, May, 2026

Edit-Based Refinement for Parallel Masked Diffusion Language Models.

CoRR, May, 2026

Lumina-mGPT: Flexible Photorealistic Autoregressive Text-to-Image Generation.

Int. J. Comput. Vis., April, 2026

Context Unrolling in Omni Models.

CoRR, April, 2026

LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving.

CoRR, April, 2026

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control.

CoRR, April, 2026

ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation.

CoRR, April, 2026

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework.

CoRR, March, 2026

AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization.

CoRR, March, 2026

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing.

CoRR, March, 2026

PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents.

CoRR, March, 2026

From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors.

CoRR, February, 2026

GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation.

CoRR, February, 2026

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents.

CoRR, February, 2026

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation.

CoRR, February, 2026

SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics.

CoRR, January, 2026

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving.

CoRR, January, 2026

SCALAR: Spatial-concept alignment for robust vision in harsh open world.

Pattern Recognit., 2026

MADCrowner: Margin Aware Dental Crown design with template deformation and refinement.

Medical Image Anal., 2026

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

EditThinker: Unlocking Iterative Reasoning for Any Image Editor.

CoRR, December, 2025

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation.

CoRR, December, 2025

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield.

CoRR, November, 2025

Architecture Decoupling Is Not All You Need For Unified Multimodal Model.

CoRR, November, 2025

Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models.

CoRR, November, 2025

RelightMaster: Precise Video Relighting with Multi-plane Light Images.

CoRR, November, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark.

CoRR, October, 2025

PICABench: How Far Are We from Physically Realistic Image Editing?

CoRR, October, 2025

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model.

CoRR, October, 2025

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images.

CoRR, October, 2025

ProteinAE: Protein Diffusion Autoencoders for Structure Encoding.

CoRR, October, 2025

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception.

CoRR, October, 2025

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding.

CoRR, October, 2025

Factuality Matters: When Image Generation and Editing Meet Structured Visuals.

CoRR, October, 2025

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing.

CoRR, September, 2025

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning.

CoRR, September, 2025

Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle.

CoRR, September, 2025

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark.

CoRR, September, 2025

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning.

CoRR, September, 2025

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding.

CoRR, September, 2025

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation.

CoRR, August, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.

CoRR, July, 2025

Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation.

CoRR, July, 2025

TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models.

IEEE Trans. Big Data, June, 2025

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning.

CoRR, June, 2025

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning.

CoRR, May, 2025

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space.

CoRR, May, 2025

Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations.

CoRR, May, 2025

EnerVerse-AC: Envisioning Embodied Environments with Action Condition.

CoRR, May, 2025

WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch.

CoRR, May, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.

CoRR, May, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.

CoRR, April, 2025

High-Fidelity Diffusion Face Swapping with ID-Constrained Facial Conditioning.

CoRR, March, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.

CoRR, March, 2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.

CoRR, March, 2025

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning.

CoRR, March, 2025

Empowering LLMs in Decision Games through Algorithmic Data Synthesis.

CoRR, March, 2025

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning.

CoRR, March, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.

CoRR, March, 2025

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation.

Alexander William Bergman

CoRR, March, 2025

FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering.

CoRR, February, 2025

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT.

CoRR, February, 2025

Segmentation and Vascular Vectorization for Coronary Artery by Geometry-Based Cascaded Neural Network.

IEEE Trans. Medical Imaging, January, 2025

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step.

CoRR, January, 2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models.

CoRR, January, 2025

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking.

CoRR, January, 2025

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation.

CoRR, January, 2025

A3: Android Agent Arena for Mobile GUI Agents.

CoRR, January, 2025

UniZero: Generalized and Efficient Planning with Scalable Latent World Models.

Trans. Mach. Learn. Res., 2025

Step-Controlled DPO: Leveraging Stepwise Errors for Enhancing Mathematical Reasoning of Language Models.

Trans. Mach. Learn. Res., 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects.

Trans. Mach. Learn. Res., 2025

Maize precision seeding scheme based on multi-sensor information fusion.

J. Ind. Inf. Integr., 2025

MM-instruct: Generated visual instructions for large multimodal model alignment.

Neurocomputing, 2025

Design and experiment of a soil organic matter sensor-based variable-rate seeding control system for maize.

Comput. Electron. Agric., 2025

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

VividFace: A Robust and High-Fidelity Video Face Swapping Framework.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

VBCD: A Voxel-Based Framework for Personalized Dental Crown Design.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Mixture Compressor for Mixture-of-Experts LLMs Gains More.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CameraCtrl: Enabling Camera Control for Video Diffusion Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Lumina-Image 2.0: a Unified and Efficient Image Generative Framework.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

GenieBlue: Integrating Both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

CameraCtrl II: Dynamic Scene Exploration via Camera-Controlled Video Diffusion Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Alignment with Fill-In-the-Middle for Enhancing Code Generation.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Let's Verify and Reinforce Image Generation Step by Step.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Probability-Consistent Preference Optimization for Enhanced LLM Reasoning.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

Int. J. Comput. Vis., May, 2024

CGOF++: Controllable 3D Face Synthesis With Conditional Generative Occupancy Fields.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.

Int. J. Comput. Vis., February, 2024

Structured Domain Adaptation With Online Relation Regularization for Unsupervised Person Re-ID.

IEEE Trans. Neural Networks Learn. Syst., January, 2024

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation.

IEEE Trans. Multim., 2024

Pyramid Fusion Transformer for Semantic Segmentation.

IEEE Trans. Multim., 2024

Enhancing Vision-Language Model with Unmasked Token Alignment.

Trans. Mach. Learn. Res., 2024

RNNPose: 6-DoF Object Pose Estimation via Recurrent Correspondence Field Estimation and Pose Optimization.

IEEE Trans. Pattern Anal. Mach. Intell., 2024

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping.

CoRR, 2024

StreamChat: Chatting with Streaming Video.

CoRR, 2024

TimeWalker: Personalized Neural Space for Lifelong Head Avatars.

CoRR, 2024

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective.

CoRR, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.

CoRR, 2024

A foundation model for generalizable disease diagnosis in chest X-ray images.

CoRR, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.

CoRR, 2024

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More.

CoRR, 2024

MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation.

CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.

CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.

CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.

CoRR, 2024

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents.

CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.

CoRR, 2024

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

CoRR, 2024

Trim 3D Gaussian Splatting for Accurate Geometry Representation.

CoRR, 2024

Phased Consistency Model.

Fu-Yun Wang

Zhaoyang Huang

CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior.

CoRR, 2024

CameraCtrl: Enabling Camera Control for Text-to-Video Generation.

CoRR, 2024

ECNet: Effective Controllable Text-to-Image Diffusion Models.

CoRR, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models.

CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

CoRR, 2024

Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

CoRR, 2024

AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning.

CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.

CoRR, 2024

NODI: Out-Of-Distribution Detection with Noise from Diffusion.

Jingqiu Zhou

Aojun Zhou

Alexander William Bergman

CoRR, 2024

Early prediction of maize resistance to nicosulfuron using hyperspectral imaging and deep learning: Method and mechanism.

Comput. Electron. Agric., 2024

Comparative investigation and evaluation of electric-drive seed-metering systems across diverse speed ranges for enhanced high-precision seeding applications.

Comput. Electron. Agric., 2024

Design and optimization of a high-speed maize seed guiding device based on DEM-CFD coupling method.

Comput. Electron. Agric., 2024

AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data.

Proceedings of the SIGGRAPH Asia 2024 Technical Communications, 2024

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Phased Consistency Models.

Fu-Yun Wang

Zhaoyang Huang

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding.

Benjin Zhu

Zhe Wang

Proceedings of the Computer Vision - ECCV 2024, 2024

Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks.

Proceedings of the Computer Vision - ECCV 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Proceedings of the Computer Vision - ECCV 2024, 2024

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Be-Your-Outpainter: Mastering Video Outpainting Through Input-Specific Adaptation.

Proceedings of the Computer Vision - ECCV 2024, 2024

GiT: Towards Generalist Vision Transformer Through Universal Language Interface.

[BibT_eX]

[DOI]

Muhammad Ferjad Naeem

Bernt Schiele

Liwei Wang

Proceedings of the Computer Vision - ECCV 2024, 2024

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Any2Point: Empowering Any-Modality Large Models for Efficient 3D Understanding.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation Using RGB Frames and Events.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Delving Deep into Engagement Prediction of Short Videos.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LMDrive: Closed-Loop End-to-End Driving with Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GLID: Pre-training a Generalist Encoder-Decoder Vision Model.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A3VLM: Actionable Articulation-Aware Vision Language Model.

[BibT_eX]

[DOI]

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Empowering Character-level Text Infilling by Eliminating Sub-Tokens.

[BibT_eX]

[DOI]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs.

[BibT_eX]

[DOI]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Teach-DETR: Better Training DETR With Teachers.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Hippocampus segmentation after brain tumor resection via postoperative region synthesis.

[BibT_eX]

[DOI]

BMC Medical Imaging, December, 2023

Predicting cancer outcomes from whole slide images via hybrid supervision learning.

[BibT_eX]

[DOI]

Neurocomputing, November, 2023

A Holistically-Guided Decoder for Deep Representation Learning With Applications to Semantic Segmentation and Object Detection.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

3D Object Detection for Autonomous Driving: A Comprehensive Survey.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., August, 2023

Development and testing of a motor drive and control unit based on the field-oriented control algorithm for the seed-metering device.

[BibT_eX]

[DOI]

Comput. Electron. Agric., August, 2023

ST3D++: Denoised Self-Training for Unsupervised Domain Adaptation on 3D Object Detection.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Refined probability distribution module for fine-grained visual categorization.

[BibT_eX]

[DOI]

Neurocomputing, 2023

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2023

Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos.

[BibT_eX]

[DOI]

CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.

[BibT_eX]

[DOI]

CoRR, 2023

LMDrive: Closed-Loop End-to-End Driving with Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2023

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation.

[BibT_eX]

[DOI]

CoRR, 2023

ViLaM: A Vision-Language Model with Enhanced Visual Grounding and Generalization Capability.

[BibT_eX]

[DOI]

CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2023

Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation.

[BibT_eX]

[DOI]

CoRR, 2023

Towards Large-scale Masked Face Recognition.

[BibT_eX]

[DOI]

CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.

[BibT_eX]

[DOI]

CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.

[BibT_eX]

[DOI]

CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.

[BibT_eX]

[DOI]

CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.

[BibT_eX]

[DOI]

CoRR, 2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis.

[BibT_eX]

[DOI]

CoRR, 2023

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow.

[BibT_eX]

[DOI]

CoRR, 2023

Context-TAP: Tracking Any Point Demands Spatial Context Features.

[BibT_eX]

[DOI]

CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.

[BibT_eX]

[DOI]

CoRR, 2023

DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior.

[BibT_eX]

[DOI]

CoRR, 2023

Voxel2Hemodynamics: An End-to-end Deep Learning Method for Predicting Coronary Artery Hemodynamics.

[BibT_eX]

[DOI]

CoRR, 2023

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising.

[BibT_eX]

[DOI]

CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.

[BibT_eX]

[DOI]

CoRR, 2023

Segmentation and Vascular Vectorization for Coronary Artery by Geometry-based Cascaded Neural Network.

[BibT_eX]

[DOI]

CoRR, 2023

Personalize Segment Anything Model with One Shot.

[BibT_eX]

[DOI]

CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.

[BibT_eX]

[DOI]

CoRR, 2023

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles.

[BibT_eX]

[DOI]

CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.

[BibT_eX]

[DOI]

CoRR, 2023

Better Aligning Text-to-Image Models with Human Preference.

[BibT_eX]

[DOI]

CoRR, 2023

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding.

[BibT_eX]

[DOI]

CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

[BibT_eX]

[DOI]

CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

[BibT_eX]

[DOI]

CoRR, 2023

KBNet: Kernel Basis Network for Image Restoration.

[BibT_eX]

[DOI]

CoRR, 2023

Geometry-Based End-to-End Segmentation of Coronary Artery in Computed Tomography Angiography.

[BibT_eX]

[DOI]

Proceedings of the Trustworthy Machine Learning for Healthcare, 2023

Voxel2Hemodynamics: An End-to-End Deep Learning Method for Predicting Coronary Artery Hemodynamics.

[BibT_eX]

[DOI]

Proceedings of the Statistical Atlases and Computational Models of the Heart. Regular and CMRxRecon Challenge Papers, 2023

A Unified Conditional Framework for Diffusion-based Image Restoration.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Context-PIPs: Persistent Independent Particles Demands Context Features.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.

[BibT_eX]

[DOI]

Proceedings of the 31st ACM International Conference on Multimedia, 2023

BlinkFlow: A Dataset to Push the Limits of Event-Based Optical Flow Estimation.

[BibT_eX]

[DOI]

IROS, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Human Preference Score: Better Aligning Text-to-image Models with Human Preference.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Urban Radiance Field Representation with Deformable Neural Mesh Primitives.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Simulating Fluids in Real-World Still Images.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ReasonNet: End-to-End Driving with Temporal and Global Reasoning.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PATS: Patch Area Transportation with Subdivision for Local Feature Matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Simple Baseline for Video Restoration with Grouped Spatial-Temporal Shift.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

NeuralMarker: A Framework for Learning General Marker Correspondence.

[BibT_eX]

[DOI]

ACM Trans. Graph., 2022

Multi-Modality Self-Distillation for Weakly Supervised Temporal Action Localization.

[BibT_eX]

[DOI]

Linjiang Huang

Liang Wang

Raja Muhammad Saad Bashir

IEEE Trans. Image Process., 2022

Robust Self-Supervised LiDAR Odometry Via Representative Structure Discovery and 3D Inherent Error Modeling.

[BibT_eX]

[DOI]

IEEE Robotics Autom. Lett., 2022

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2022

SymReg-GAN: Symmetric Image Registration With Generative Adversarial Networks.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2022

DigestPath: A benchmark dataset with challenge review for the pathological detection and segmentation of digestive-system.

[BibT_eX]

[DOI]

Ganapathy Krishnamurthi

Medical Image Anal., 2022

Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.

[BibT_eX]

[DOI]

CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.

[BibT_eX]

[DOI]

CoRR, 2022

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition.

[BibT_eX]

[DOI]

CoRR, 2022

No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers.

[BibT_eX]

[DOI]

CoRR, 2022

3D Object Detection for Autonomous Driving: A Review and New Outlooks.

[BibT_eX]

[DOI]

CoRR, 2022

MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.

[BibT_eX]

[DOI]

CoRR, 2022

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis.

[BibT_eX]

[DOI]

CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.

[BibT_eX]

[DOI]

CoRR, 2022

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network.

[BibT_eX]

[DOI]

CoRR, 2022

Meta Knowledge Distillation.

[BibT_eX]

[DOI]

CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2022

Corn variable-rate seeding decision based on gradient boosting decision tree model.

[BibT_eX]

[DOI]

Comput. Electron. Agric., 2022

Automatic segmentation of the clinical target volume and organs at risk for rectal cancer radiotherapy using structure-contextual representations based on 3D high-resolution network.

[BibT_eX]

[DOI]

Biomed. Signal Process. Control., 2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Robust Face Recognition with Comprehensive Search.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers.

[BibT_eX]

[DOI]

Georgios Tzimiropoulos

Brais Martínez

Proceedings of the Computer Vision - ECCV 2022, 2022

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Degradation Representations for Image Deblurring.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

FlowFormer: A Transformer Architecture for Optical Flow.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

MPPNet: Multi-frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

IDR: Self-Supervised Image Denoising via Iterative Data Refinement.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

PointCLIP: Point Cloud Understanding by CLIP.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RBGNet: Ray-based Grouping for 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation.

[BibT_eX]

[DOI]

Linjiang Huang

Liang Wang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning a Structured Latent Space for Unsupervised Point Cloud Completion.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer.

[BibT_eX]

[DOI]

Proceedings of the Conference on Robot Learning, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.

[BibT_eX]

[DOI]

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Task Generalizable Spatial and Texture Aware Image Downsizing Network.

[BibT_eX]

[DOI]

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Person Re-Identification With Deep Kronecker-Product Matching and Group-Shuffling Random Walk.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2021

FocusNetv2: Imbalanced large and small organ segmentation with adversarial shape constraint for head and neck CT images.

[BibT_eX]

[DOI]

Medical Image Anal., 2021

Guest editorial: Deep learning for medical image analysis.

[BibT_eX]

[DOI]

Shaoting Zhang

Dimitris N. Metaxas

Neurocomputing, 2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

[BibT_eX]

[DOI]

CoRR, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

[BibT_eX]

[DOI]

CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.

[BibT_eX]

[DOI]

CoRR, 2021

Mixed Supervision Learning for Whole Slide Image Classification.

[BibT_eX]

[DOI]

CoRR, 2021

Scalable Transformers for Neural Machine Translation.

[BibT_eX]

[DOI]

CoRR, 2021

Container: Context Aggregation Network.

[BibT_eX]

[DOI]

CoRR, 2021

FNAS: Uncertainty-Aware Fast Neural Architecture Search.

[BibT_eX]

[DOI]

CoRR, 2021

Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification.

[BibT_eX]

[DOI]

CoRR, 2021

Decoupled Spatial-Temporal Transformer for Video Inpainting.

[BibT_eX]

[DOI]

CoRR, 2021

LIFE: Lighting Invariant Flow Estimation.

[BibT_eX]

[DOI]

CoRR, 2021

Fixing the Teacher-Student Knowledge Discrepancy in Distillation.

[BibT_eX]

[DOI]

CoRR, 2021

Consensus-Guided Correspondence Denoising.

[BibT_eX]

[DOI]

CoRR, 2021

Efficient Attention: Attention with Linear Complexities.

[BibT_eX]

[DOI]

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Container: Context Aggregation Networks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Hybrid Supervision Learning for Pathology Whole Slide Image Classification.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

Progressive Correspondence Pruning by Consensus Learning.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Rethinking Noise Synthesis and Modeling in Raw Denoising.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization.

[BibT_eX]

[DOI]

Linjiang Huang

Liang Wang

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Inverting Generative Adversarial Renderer for Face Reconstruction.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VS-Net: Voting With Segmentation for Visual Localization.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Semantic Scene Completion via Integrating Instances and Scene In-the-Loop.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.

[BibT_eX]

[DOI]

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

REFINE: Prediction Fusion Network for Panoptic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A Unified Multi-Scenario Attacking Network for Visual Object Tracking.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2020

Guest Editorial: Generative Adversarial Networks for Computer Vision.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2020

Towards Overcoming False Positives in Visual Relationship Detection.

[BibT_eX]

[DOI]

CoRR, 2020

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection.

[BibT_eX]

[DOI]

CoRR, 2020

End-to-End Object Detection with Adaptive Clustering Transformer.

[BibT_eX]

[DOI]

CoRR, 2020

PV-RCNN: The Top-Performing LiDAR-only Solutions for 3D Detection / 3D Tracking / Domain Adaptation of Waymo Open Dataset Challenges.

[BibT_eX]

[DOI]

CoRR, 2020

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation.

[BibT_eX]

[DOI]

CoRR, 2020

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020.

[BibT_eX]

[DOI]

CoRR, 2020

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020.

[BibT_eX]

[DOI]

CoRR, 2020

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization.

[BibT_eX]

[DOI]

CoRR, 2020

Structured Domain Adaptation for Unsupervised Person Re-identification.

[BibT_eX]

[DOI]

CoRR, 2020

MagnifierNet: Towards Semantic Regularization and Fusion for Person Re-identification.

[BibT_eX]

[DOI]

CoRR, 2020

Balanced Meta-Softmax for Long-Tailed Visual Recognition.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Multi-organ Segmentation via Co-training Weight-Averaged Models from Few-Organ Datasets.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.

[BibT_eX]

[DOI]

Yixiao Ge

Dapeng Chen

Proceedings of the 8th International Conference on Learning Representations, 2020

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax.

Proceedings of the Computer Vision - ECCV 2020, 2020

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions.

Proceedings of the Computer Vision - ECCV 2020, 2020

EfficientFCN: Holistically-Guided Decoding for Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Learning to Predict Context-Adaptive Convolution for Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Self-supervising Fine-Grained Region Similarities for Large-Scale Image Localization.

Proceedings of the Computer Vision - ECCV 2020, 2020

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Robust Superpixel-Guided Attentional Adversarial Attack.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks.

Proceedings of the 4th Conference on Robot Learning, 2020

MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Deep Continuous Conditional Random Fields With Asymmetric Inter-Object Constraints for Online Multi-Object Tracking.

IEEE Trans. Circuits Syst. Video Technol., 2019

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks.

IEEE Trans. Pattern Anal. Mach. Intell., 2019

Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud.

CoRR, 2019

A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes.

Qiangfeng Cliff Zhang

CoRR, 2019

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

FocusNet: Imbalanced Large and Small Organ Segmentation with an End-to-End Deep Neural Network for Head and Neck CT Images.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Signet Ring Cell Detection with a Semi-supervised Learning Framework.

Proceedings of the Information Processing in Medical Imaging, 2019

Generalizing Monocular 3D Human Pose Estimation in the Wild.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer.

Jingtan Piao

Chen Qian

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Interpolated Convolutional Networks for 3D Point Cloud Understanding.

Jiageng Mao

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Multi-Modality Latent Interaction Network for Visual Question Answering.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

P2SGrad: Refined Gradients for Optimizing Deep Face Models.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud.

Shaoshuai Shi

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Conditional Adversarial Generative Flow for Controllable Image Synthesis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Group-Wise Correlation Stereo Network.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

A2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes.

Qiangfeng Cliff Zhang

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos.

IEEE Trans. Circuits Syst. Video Technol., 2018

Crafting GBD-Net for Object Detection.

IEEE Trans. Pattern Anal. Mach. Intell., 2018

Jointly Learning Deep Features, Deformable Parts, Occlusion and Classification for Pedestrian Detection.

IEEE Trans. Pattern Anal. Mach. Intell., 2018

Fast iteratively reweighted least squares algorithms for analysis-based sparse reconstruction.

Medical Image Anal., 2018

HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion.

CoRR, 2018

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association.

CoRR, 2018

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Person Re-identification with Deep Similarity-Guided Graph Neural Network.

Proceedings of the Computer Vision - ECCV 2018, 2018

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data.

Proceedings of the Computer Vision - ECCV 2018, 2018

Learning Monocular Depth by Distilling Cross-Domain Stereo Networks.

Proceedings of the Computer Vision - ECCV 2018, 2018

Question-Guided Hybrid Convolution for Visual Question Answering.

Proceedings of the Computer Vision - ECCV 2018, 2018

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association.

Proceedings of the Computer Vision - ECCV 2018, 2018

3D Human Pose Estimation in the Wild by Adversarial Learning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Eliminating Background-Bias for Robust Person Re-Identification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

End-to-End Deep Kronecker-Product Matching for Person Re-Identification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Deep Group-Shuffling Random Walk for Person Re-Identification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Single View Stereo Matching.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Co-Attending Free-Form Regions and Detections With Multi-Modal Multiplicative Feature Embedding for Visual Question Answering.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Generative Adversarial Frontal View to Bird View Synthesis.

Proceedings of the 2018 International Conference on 3D Vision, 2018

2017

L0 Regularized Stationary-Time Estimation for Crowd Analysis.

IEEE Trans. Pattern Anal. Mach. Intell., 2017

DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks.

IEEE Trans. Pattern Anal. Mach. Intell., 2017

Statistical Evaluation of No-Reference Image Quality Assessment Metrics for Remote Sensing Images.

Shuang Li

Zewei Yang

ISPRS Int. J. Geo Inf., 2017

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision.

CoRR, 2017

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2017, 2017

StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks.

Han Zhang

Tao Xu

Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Feature Pyramids for Human Pose Estimation.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Identity-Aware Textual-Visual Matching with Latent Co-attention.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Spatial Regularization with Image-Level Supervisions for Multi-label Image Classification.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Person Search with Natural Language Description.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Object Detection in Videos with Tubelet Proposal Networks.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Data-Driven Crowd Understanding: A Baseline for a Large-Scale Crowd Dataset.

IEEE Trans. Multim., 2016

Pedestrian Behavior Modeling From Stationary Crowds With Applications to Intelligent Surveillance.

IEEE Trans. Image Process., 2016

Magnetic Resonance Fingerprinting with compressed sensing and distance metric learning.

Neurocomputing, 2016

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks.

CoRR, 2016

CRF-CNN: Modeling Structured Information in Human Pose Estimation.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Crossing-Line Crowd Counting with Two-Phase Deep Neural Networks.

Proceedings of the Computer Vision - ECCV 2016, 2016

Pedestrian Behavior Understanding and Prediction with Deep Neural Networks.

Proceedings of the Computer Vision - ECCV 2016, 2016

Learnable Histogram: Statistical Context Features for Deep Neural Networks.

Proceedings of the Computer Vision - ECCV 2016, 2016

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Object Detection from Video Tubelets with Convolutional Neural Networks.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Structured Feature Learning for Pose Estimation.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Silhouette Analysis for Human Action Recognition Based on Supervised Temporal t-SNE and Incremental Learning.

IEEE Trans. Image Process., 2015

Computer-Aided Diagnosis of Mammographic Masses Using Scalable Image Retrieval.

IEEE Trans. Biomed. Eng., 2015

Pedestrian Travel Time Estimation in Crowded Scenes.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Saliency detection by multi-context deep learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Cross-scene crowd counting via deep convolutional neural networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Understanding pedestrian behaviors from stationary crowd groups.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

DeepID-Net: Deformable deep convolutional neural networks for object detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Solving a Special Type of Jigsaw Puzzles: Banknote Reconstruction From a Large Number of Fragments.

IEEE Trans. Multim., 2014

Feature Matching with Affine-Function Transformation Models.

IEEE Trans. Pattern Anal. Mach. Intell., 2014

Silhouette analysis for human action recognition based on maximum spatio-temporal dissimilarity embedding.

Jian Cheng

Haijun Liu

Mach. Vis. Appl., 2014

Landmark matching based retinal image alignment by enforcing sparsity in correspondence matrix.

Medical Image Anal., 2014

SAR target recognition based on improved joint sparse representation.

EURASIP J. Adv. Signal Process., 2014

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection.

CoRR, 2014

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification.

Rui Zhao

CoRR, 2014

Fast Iteratively Reweighted Least Squares Algorithms for Analysis-Based Sparsity Reconstruction.

CoRR, 2014

Preconditioning for Accelerated Iteratively Reweighted Least Squares in Structured Sparsity Reconstruction.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

Object Matching Using a Locally Affine Invariant and Linear Programming Techniques.

Lei He

IEEE Trans. Pattern Anal. Mach. Intell., 2013

2012

Automatic Image Annotation and Retrieval Using Group Sparsity.

IEEE Trans. Syst. Man Cybern. Part B, 2012

A hierarchical image clustering cosegmentation framework.

Edward Kim

Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011

Active Volume Models for Medical Image Segmentation.

IEEE Trans. Medical Imaging, 2011

Approximately Global Optimization for Robust Alignment of Generalized Shapes.

IEEE Trans. Pattern Anal. Mach. Intell., 2011

Composite splitting algorithms for convex optimization.

Comput. Vis. Image Underst., 2011

Extraction and analysis of actin networks based on Open Active Contour models.

Proceedings of the 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011

Actin Filament Segmentation Using Dynamic Programming.

Proceedings of the Information Processing in Medical Imaging, 2011

A 3D Laplacian-driven parametric deformable model.

Proceedings of the IEEE International Conference on Computer Vision, 2011

Optimal object matching via convexification and composition.

Proceedings of the IEEE International Conference on Computer Vision, 2011

2010

Actin Filament Segmentation Using Spatiotemporal Active-Surface and Active-Contour Models.

Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2010

Automatic image annotation using group sparsity.

Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Object matching with a locally affine-invariant constraint.

Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009

Actin Filament Tracking Based on Particle Filters and Stretching Open Active Contour Models.

Proceedings of the Medical Image Computing and Computer-Assisted Intervention, 2009

Automated Actin Filament Segmentation, Tracking and TIP Elongation Measurements Based on Open Active Contour Models.

Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, June 28, 2009

Active volume models for 3D medical image segmentation.

Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Global optimization for alignment of generalized shapes.
