Ping Luo

Orcid: 0000-0002-6685-7950

Affiliations:

University of Hong Kong, Shanghai AI Laboratory, Department of Computer Science, Hong Kong
Chinese University of Hong Kong, Department of Information Engineering, Hong Kong (PhD 2014)
Sun Yat-Sen University, School of Software, Guangzhou, China (former)
Lotus Hill Institute, China (former)

According to our database¹, Ping Luo authored at least 484 papers between 2009 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

GaussianDream: A Feed-Forward 3D Gaussian World Model for Robotic Manipulation.

CoRR, May, 2026

Invert Your Prompt: Editing-Aware Diffusion Inversion.

Int. J. Comput. Vis., April, 2026

A Systematic Post-Train Framework for Video Generation.

CoRR, April, 2026

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation.

CoRR, April, 2026

MM-Hand: A 21-DOF Multi-modal Modular Dexterous Robotic Hand with Remote Actuation.

CoRR, April, 2026

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System.

CoRR, April, 2026

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling.

CoRR, April, 2026

Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM.

CoRR, April, 2026

SMASH: Mastering Scalable Whole-Body Skills for Humanoid Ping-Pong with Egocentric Vision.

CoRR, April, 2026

ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K.

CoRR, March, 2026

ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction.

CoRR, March, 2026

VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory.

CoRR, March, 2026

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design.

CoRR, March, 2026

Video Understanding With Large Language Models: A Survey.

IEEE Trans. Circuits Syst. Video Technol., February, 2026

AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors.

CoRR, February, 2026

RISE: Self-Improving Robot Policy with Compositional World Model.

CoRR, February, 2026

EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration.

CoRR, February, 2026

UniVTAC: A Unified Simulation Platform for Visuo-Tactile Manipulation Data Generation, Learning, and Benchmarking.

CoRR, February, 2026

χ0: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies.

CoRR, February, 2026

HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control.

CoRR, February, 2026

LINA: Linear Autoregressive Image Generative Models with Continuous Tokens.

CoRR, January, 2026

Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge.

CoRR, January, 2026

Performance-guided Reinforced Active Learning for Object Detection.

CoRR, January, 2026

TVWorld: Foundations for Remote-Control TV Agents.

CoRR, January, 2026

Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation.

CoRR, January, 2026

Is Diversity All You Need for Scalable Robotic Manipulation?

IEEE Trans. Robotics, 2026

MM-Eureka: Toward Stable Multimodal Reasoning via Rule-based Reinforcement Learning with Policy Drift Control.

Trans. Mach. Learn. Res., 2026

Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning.

Trans. Mach. Learn. Res., 2026

Learning Coherent Portrait-to-Anime Translation via Latent Cyclic Transformation.

Comput. Vis. Media, 2026

Machine learning-aided optimal design and distributed model predictive control of reactive dividing wall column.

Comput. Chem. Eng., 2026

Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing.

CoRR, December, 2025

ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models.

CoRR, December, 2025

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models.

Juan-Manuel Pérez-Rúa

CoRR, December, 2025

MM-ACT: Learn from Multimodal Parallel Generation to Act.

CoRR, December, 2025

SPOT: Scalable 3D Pre-Training via Occupancy Prediction for Learning Transferable 3D Representations.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook.

ACM Comput. Surv., November, 2025

Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data.

CoRR, November, 2025

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats.

CoRR, October, 2025

FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation.

CoRR, October, 2025

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model.

CoRR, October, 2025

Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints.

CoRR, October, 2025

Object-AVEdit: An Object-level Audio-Visual Editing Model.

CoRR, October, 2025

Tooth Motion Monitoring in Orthodontic Treatment by Mobile Device-Based Multi-View Stereo.

IEEE Trans. Vis. Comput. Graph., September, 2025

Fast-dLLM v2: Efficient Block-Diffusion LLM.

CoRR, September, 2025

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer.

CoRR, September, 2025

DriveE2E: Closed-Loop Benchmark for End-to-End Autonomous Driving through Real-to-Simulation.

CoRR, September, 2025

A Generative Foundation Model for Chest Radiography.

CoRR, September, 2025

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies.

CoRR, August, 2025

SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control.

CoRR, August, 2025

LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation.

CoRR, August, 2025

WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception.

CoRR, August, 2025

AnalogCoder-Pro: Unifying Analog Circuit Generation and Optimization via Multi-modal LLMs.

CoRR, August, 2025

NeRFBuff: Fast Neural Rendering via Inter-Frame Feature Buffering.

IEEE Trans. Vis. Comput. Graph., July, 2025

Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition.

CoRR, July, 2025

TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models.

IEEE Trans. Big Data, June, 2025

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos and Beyond.

Int. J. Comput. Vis., June, 2025

Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop.

CoRR, June, 2025

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation.

CoRR, June, 2025

Aligning Latent Spaces with Flow Priors.

CoRR, June, 2025

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation.

CoRR, June, 2025

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding.

CoRR, May, 2025

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

CoRR, May, 2025

Scaling Law for Quantization-Aware Training.

CoRR, May, 2025

AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory.

CoRR, May, 2025

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision.

CoRR, May, 2025

DanceGRPO: Unleashing GRPO on Visual Generation.

CoRR, May, 2025

UniVLA: Learning to Act Anywhere with Task-centric Latent Actions.

CoRR, May, 2025

StyleAdapter: A Unified Stylized Image Generation Model.

Int. J. Comput. Vis., April, 2025

PixelFlow: Pixel-Space Generative Models with Flow.

CoRR, April, 2025

LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

Centaur: Robust End-to-End Autonomous Driving with Test-Time Training.

CoRR, March, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning.

CoRR, March, 2025

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems.

AgiBot-World-Contributors

CoRR, March, 2025

VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception.

CoRR, February, 2025

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation.

CoRR, February, 2025

LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation.

CoRR, January, 2025

Autoregressive Models in Vision: A Survey.

Trans. Mach. Learn. Res., 2025

B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions.

IEEE Trans. Inf. Forensics Secur., 2025

Adaptive Superpixel-Guided Non-Homogeneous Image Dehazing.

IEEE Signal Process. Lett., 2025

Sem-iNeRF: Camera Pose Refinement by Inverting Neural Radiance Fields with Semantic Feature Consistency.

Comput. Vis. Media, 2025

VideoChat: chat-centric video understanding.

Sci. China Inf. Sci., 2025

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

DiffusionMat: Alpha Matting as Deterministic Sequential Refinement Learning.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Learning Humanoid Locomotion with Perceptive Internal Model.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

BOOD: Boundary-based Out-Of-Distribution Data Generation.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-Time Open-Vocabulary Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Prompt-A-Video: Prompt your Video Diffusion Model via Preference-Aligned LLM.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

NADER: Neural Architecture Design via Multi-Agent Collaboration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Goku: Flow Based Video Generative Foundation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Text2World: Benchmarking Large Language Models for Symbolic World Model Generation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

End-to-End Autonomous Driving Through V2X Cooperation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

AnalogCoder: Analog Circuit Design via Training-Free Code Generation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

TCFormer: Visual Recognition via Token Clustering Transformer.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching.

Int. J. Comput. Vis., December, 2024

Enhance Sample Efficiency and Robustness of End-to-End Urban Autonomous Driving via Semantic Masked World Model.

IEEE Trans. Intell. Transp. Syst., October, 2024

Prototypical Context-Aware Dynamics for Generalization in Visual Control With Model-Based Reinforcement Learning.

IEEE Trans. Ind. Informatics, September, 2024

End-to-End Video Text Spotting with Transformer.

Int. J. Comput. Vis., September, 2024

Deeply Unsupervised Patch Re-Identification for Pre-Training Object Detectors.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

FAT: Frequency-Aware Transformation for Bridging Full-Precision and Low-Precision Deep Representations.

IEEE Trans. Neural Networks Learn. Syst., February, 2024

Context Autoencoder for Self-supervised Representation Learning.

Int. J. Comput. Vis., January, 2024

Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos.

IEEE Trans. Image Process., 2024

Rethinking Attentive Object Detection via Neural Attention Learning.

IEEE Trans. Image Process., 2024

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models.

CoRR, 2024

CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning.

CoRR, 2024

TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception.

CoRR, 2024

DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation.

CoRR, 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.

CoRR, 2024

DCP: Learning Accelerator Dataflow for Neural Network via Propagation.

CoRR, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.

CoRR, 2024

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs.

CoRR, 2024

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction.

CoRR, 2024

Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking.

CoRR, 2024

Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing.

CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

CoRR, 2024

AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation.

CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.

CoRR, 2024

TCFormer: Visual Recognition via Token Clustering Transformer.

CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.

CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.

CoRR, 2024

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning.

CoRR, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.

CoRR, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.

CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.

CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.

CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.

CoRR, 2024

FlashFace: Human Image Personalization with High-fidelity Identity Preservation.

CoRR, 2024

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving.

CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.

CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.

CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.

CoRR, 2024

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models.

CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

CoRR, 2024

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Part123: Part-aware 3D Reconstruction from a Single-view Image.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Learning Manipulation by Predicting Interaction.

Proceedings of the Robotics: Science and Systems XX, 2024

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Needle In A Multimodal Haystack.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PROGRAM: PROtotype GRAph Model based Pseudo-Label Learning for Test-Time Adaptation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

VDT: General-purpose Video Diffusion Transformers via Mask Modeling.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.

Proceedings of the IEEE International Conference on Acoustics, 2024

When Pedestrian Detection Meets Multi-modal Learning: Generalist Model and Benchmark Dataset.

Proceedings of the Computer Vision - ECCV 2024, 2024

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

GKGNet: Group K-Nearest Neighbor Based Graph Convolutional Network for Multi-label Image Recognition.

Proceedings of the Computer Vision - ECCV 2024, 2024

DriveLM: Driving with Graph Visual Question Answering.

Proceedings of the Computer Vision - ECCV 2024, 2024

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (Early Version).

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts.

Proceedings of the Computer Vision - ECCV 2024, 2024

UniFS: Universal Few-Shot Instance Perception with Point Representations.

Proceedings of the Computer Vision - ECCV 2024, 2024

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-person Multi-task Human-Centric Perception.

Proceedings of the Computer Vision - ECCV 2024, 2024

PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generalized Predictive Model for Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

RegionGPT: Towards Region Understanding Vision Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GenTron: Diffusion Transformers for Image and Video Generation.

Juan-Manuel Pérez-Rúa

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

KET-QA: A Dataset for Knowledge Enhanced Table Question Answering.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Cached Transformers: Improving Transformers with Differentiable Memory Cache.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

RestoreFormer++: Towards Real-World Blind Face Restoration From Undegraded Key-Value Pairs.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Sparse R-CNN: An End-to-End Framework for Object Detection.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

CycleMLP: A MLP-Like Architecture for Dense Visual Predictions.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

ZoomNAS: Searching for Whole-Body Human Pose Estimation in the Wild.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning.

[BibT_eX]

[DOI]

IEEE Trans. Neural Networks Learn. Syst., 2023

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning.

[BibT_eX]

[DOI]

Trans. Mach. Learn. Res., 2023

MGL: Mutual Graph Learning for Camouflaged Object Detection.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2023

Video Understanding with Large Language Models: A Survey.

[BibT_eX]

[DOI]

CoRR, 2023

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces.

[BibT_eX]

[DOI]

CoRR, 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

[BibT_eX]

[DOI]

CoRR, 2023

DriveLM: Driving with Graph Visual Question Answering.

[BibT_eX]

[DOI]

CoRR, 2023

Cached Transformers: Improving Transformers with Differentiable Memory Cache.

[BibT_eX]

[DOI]

CoRR, 2023

A Survey of Reasoning with Foundation Models.

[BibT_eX]

[DOI]

CoRR, 2023

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation.

[BibT_eX]

[DOI]

Juan-Manuel Pérez-Rúa

CoRR, 2023

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.

[BibT_eX]

[DOI]

CoRR, 2023

MLLMs-Augmented Visual-Language Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

[BibT_eX]

[DOI]

CoRR, 2023

Advancing Vision Transformers with Group-Mix Attention.

[BibT_eX]

[DOI]

CoRR, 2023

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.

[BibT_eX]

[DOI]

CoRR, 2023

DiffusionMat: Alpha Matting as Sequential Refinement Learning.

[BibT_eX]

[DOI]

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

[BibT_eX]

[DOI]

CoRR, 2023

MeanAP-Guided Reinforced Active Learning for Object Detection.

[BibT_eX]

[DOI]

CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.

[BibT_eX]

[DOI]

CoRR, 2023

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2023

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.

[BibT_eX]

[DOI]

CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2023

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation.

[BibT_eX]

[DOI]

CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.

[BibT_eX]

[DOI]

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

[BibT_eX]

[DOI]

CoRR, 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.

[BibT_eX]

[DOI]

CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.

[BibT_eX]

[DOI]

CoRR, 2023

SyNDock: N Rigid Protein Docking via Learnable Group Synchronization.

[BibT_eX]

[DOI]

CoRR, 2023

VDT: An Empirical Study on Video Diffusion with Transformers.

[BibT_eX]

[DOI]

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

[BibT_eX]

[DOI]

CoRR, 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.

[BibT_eX]

[DOI]

CoRR, 2023

Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2023

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation.

[BibT_eX]

[DOI]

CoRR, 2023

EC²: Emergent Communication for Embodied Control.

[BibT_eX]

[DOI]

CoRR, 2023

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer.

[BibT_eX]

[DOI]

CoRR, 2023

Topology Reasoning for Driving Scenes.

[BibT_eX]

[DOI]

CoRR, 2023

Multi-Level Contrastive Learning for Dense Prediction Task.

[BibT_eX]

[DOI]

CoRR, 2023

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction.

[BibT_eX]

[DOI]

CoRR, 2023

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos.

[BibT_eX]

[DOI]

CoRR, 2023

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception.

[BibT_eX]

[DOI]

CoRR, 2023

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neural MPC-Based Decision-Making Framework for Autonomous Driving in Multi-Lane Roundabout.

[BibT_eX]

[DOI]

Proceedings of the 26th IEEE International Conference on Intelligent Transportation Systems, 2023

π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2023

AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2023

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2023

Learning Object-Language Alignments for Open-Vocabulary Object Detection.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.

[BibT_eX]

[DOI]

Kannappan Palaniappan

Norbert Scherer-Negenborn

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Transformers for Open-world Instance Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Segment Every Reference Object in Spatial and Temporal Spaces.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scene as Occupancy.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Going Denser with Open-Vocabulary Part Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DDP: Diffusion Model for Dense Visual Prediction.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Beyond One-to-One: Rethinking the Referring Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffusionDet: Diffusion Model for Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Real-Time Controllable Denoising for Image and Video.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Accelerating Vision-Language Pretraining with Free Language Modeling.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EC2: Emergent Communication for Embodied Control.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Policy Adaptation from Foundation Model Feedback.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Universal Instance Perception as Object Discovery and Retrieval.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Structured Pruning for Efficient Generative Pre-trained Language Models.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery - a Focus on Affinity Prediction Problems with Noise Annotations.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2022

MetaCloth: Learning Unseen Tasks of Dense Fashion Landmark Detection From a Few Samples.

[BibT_eX]

[DOI]

Yuying Ge

Ruimao Zhang

Ping Luo

IEEE Trans. Image Process., 2022

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2022

PVT v2: Improved baselines with Pyramid Vision Transformer.

[BibT_eX]

[DOI]

Comput. Vis. Media, 2022

Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models.

[BibT_eX]

[DOI]

CoRR, 2022

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning.

[BibT_eX]

[DOI]

CoRR, 2022

Large-batch Optimization for Dense Visual Predictions.

[BibT_eX]

[DOI]

CoRR, 2022

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning.

[BibT_eX]

[DOI]

CoRR, 2022

Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model.

[BibT_eX]

[DOI]

CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.

[BibT_eX]

[DOI]

CoRR, 2022

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild.

[BibT_eX]

[DOI]

CoRR, 2022

Pose for Everything: Towards Category-Agnostic Pose Estimation.

[BibT_eX]

[DOI]

CoRR, 2022

Exploiting Context Information for Generic Event Boundary Captioning.

[BibT_eX]

[DOI]

CoRR, 2022

CO³: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.

[BibT_eX]

[DOI]

CoRR, 2022

Semantic-Aware Pretraining for Dense Video Captioning.

[BibT_eX]

[DOI]

CoRR, 2022

M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation.

[BibT_eX]

[DOI]

CoRR, 2022

WegFormer: Transformers for Weakly Supervised Semantic Segmentation.

[BibT_eX]

[DOI]

CoRR, 2022

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery - A Focus on Affinity Prediction Problems with Noise Annotations.

[BibT_eX]

[DOI]

CoRR, 2022

MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning.

[BibT_eX]

[DOI]

CoRR, 2022

BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions.

[BibT_eX]

[DOI]

CoRR, 2022

Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rethinking Resolution in the Context of Efficient Video Recognition.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning.

[BibT_eX]

[DOI]

Yao Lai

Yao Mu

Ping Luo

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

Flow-based Recurrent Belief State Learning for POMDPs.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

Objects in Semantic Topology.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Dynamic Token Normalization improves Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Learning Versatile Neural Architectures by Propagating Network Codes.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

CycleMLP: A MLP-like Architecture for Dense Prediction.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Polygon-Free: Unconstrained Scene Text Detection with Box Annotations.

[BibT_eX]

[DOI]

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

ByteTrack: Multi-object Tracking by Associating Every Detection Box.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Grand Unification of Object Tracking.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Pose for Everything: Towards Category-Agnostic Pose Estimation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

DaViT: Dual Attention Vision Transformers.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Language as Queries for Referring Video Object Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scale-Equivalent Distillation for Semi-Supervised Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Bridging Video-text Retrieval with Multiple Choice Questions.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following.

[BibT_eX]

[DOI]

Proceedings of the Conference on Robot Learning, 2022

Compression of Generative Pre-trained Language Models via Quantization.

[BibT_eX]

[DOI]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Switchable Normalization for Learning-to-Normalize Deep Representation.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Dynamic Token Normalization Improves Vision Transformer.

[BibT_eX]

[DOI]

CoRR, 2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.

[BibT_eX]

[DOI]

CoRR, 2021

ByteTrack: Multi-Object Tracking by Associating Every Detection Box.

[BibT_eX]

[DOI]

CoRR, 2021

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning.

[BibT_eX]

[DOI]

CoRR, 2021

Towards High-Quality Temporal Action Detection with Sparse Proposals.

[BibT_eX]

[DOI]

CoRR, 2021

Panoptic SegFormer.

[BibT_eX]

[DOI]

CoRR, 2021

PVTv2: Improved Baselines with Pyramid Vision Transformer.

[BibT_eX]

[DOI]

CoRR, 2021

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening.

[BibT_eX]

[DOI]

CoRR, 2021

Unsupervised Pretraining for Object Detection by Patch Reidentification.

[BibT_eX]

[DOI]

CoRR, 2021

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation.

[BibT_eX]

[DOI]

CoRR, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.

[BibT_eX]

[DOI]

CoRR, 2021

Trans2Seg: Transparent Object Segmentation with Transformer.

[BibT_eX]

[DOI]

CoRR, 2021

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Model-Based Reinforcement Learning via Imagination with Derived Memory.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Compressed Video Contrastive Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Rethinking the Pruning Criteria for Convolutional Neural Network.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Multi-compound Transformer for Accurate Biomedical Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Segmenting Transparent Objects in the Wild with Transformer.

[BibT_eX]

[DOI]

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution.

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

What Makes for End-to-End Object Detection?

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Dense Video Captioning with Parallel Decoding.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Bringing Events into Video Deblurring with Non-consecutively Blurry Frames.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Watch Only Once: An End-to-End Video Action Detection Framework.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Adversarial Robustness for Unsupervised Domain Adaptation.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Sparse R-CNN: End-to-End Object Detection With Learnable Proposals.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Parser-Free Virtual Try-On via Distilling Appearance Flows.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

A Unified Multi-Scenario Attacking Network for Visual Object Tracking.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

SSN: Learning Sparse Switchable Normalization via SparsestMax.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2020

TransTrack: Multiple-Object Tracking with Transformer.

[BibT_eX]

[DOI]

CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.

[BibT_eX]

[DOI]

CoRR, 2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training.

[BibT_eX]

[DOI]

CoRR, 2020

Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning.

[BibT_eX]

[DOI]

Zhongzhan Huang

Xinjiang Wang

Ping Luo

CoRR, 2020

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory.

[BibT_eX]

[DOI]

CoRR, 2020

Domain-Adaptive Few-Shot Learning.

[BibT_eX]

[DOI]

CoRR, 2020

How Does BN Increase Collapsed Neural Network Filters?

[BibT_eX]

[DOI]

CoRR, 2020

UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

Channel Equilibrium Networks for Learning Deep Representation.

[BibT_eX]

[DOI]

Proceedings of the 37th International Conference on Machine Learning, 2020

Webly Supervised Image Classification with Self-contained Confidence.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Segmenting Transparent Objects in the Wild.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Dynamic and Static Context-Aware LSTM for Multi-agent Motion Prediction.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Whole-Body Human Pose Estimation in the Wild.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Exemplar Normalization for Learning Deep Representation.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

3D Human Mesh Regression With Dense Correspondence.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PolarMask: Single Shot Instance Segmentation With Polar Representation.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Reinforced Agent for Flexible Exposure Bracketing Selection.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Online Knowledge Distillation via Collaborative Learning.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Depth-Guided Convolutions for Monocular 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Human Centric Visual Analysis with Deep Learning.

[BibT_eX]

[DOI]

Springer, ISBN: 978-981-13-2386-7, 2020

2019

SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2019

TextSR: Content-Aware Text Super-Resolution Guided by Recognition.

[BibT_eX]

[DOI]

CoRR, 2019

Towards Improving Generalization of Deep Networks via Consistent Normalization.

[BibT_eX]

[DOI]

CoRR, 2019

WIDER Face and Pedestrian Challenge 2018: Methods and Results.

[BibT_eX]

[DOI]

CoRR, 2019

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images.

[BibT_eX]

[DOI]

CoRR, 2019

Differentiable Dynamic Normalization for Learning Deep Representation.

[BibT_eX]

[DOI]

Proceedings of the 36th International Conference on Machine Learning, 2019

Towards Understanding Regularization in Batch Normalization.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Learning Representations, 2019

Differentiable Learning-to-Normalize via Switchable Normalization.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Learning Representations, 2019

Vision-Infused Deep Audio Inpainting.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Switchable Whitening for Deep Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deep Self-Learning From Noisy Labels.

[BibT_eX]

[DOI]

Jiangfan Han

Ping Luo

Xiaogang Wang

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

SSN: Learning Sparse Switchable Normalization via SparsestMax.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Faceness-Net: Face Detection through Deep Facial Part Responses.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2018

Deep Learning Markov Random Field for Semantic Segmentation.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2018

From Facial Expression Recognition to Interpersonal Relation Prediction.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., 2018

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis.

[BibT_eX]

[DOI]

CoRR, 2018

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?

[BibT_eX]

[DOI]

CoRR, 2018

Differentiable Learning-to-Normalize via Switchable Normalization.

[BibT_eX]

[DOI]

Ping Luo

Jiamin Ren

Zhanglin Peng

CoRR, 2018

Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches.

[BibT_eX]

[DOI]

CoRR, 2018

Kalman Normalization: Normalizing Internal Representations Across Network Layers.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos.

[BibT_eX]

[DOI]

Proceedings of the 2018 ACM Multimedia Conference, 2018

Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2018, 2018

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

CUImage: A Neverending Learning Platform on a Convolutional Knowledge Graph of Billion Web Images.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Scheduling Large-scale Distributed Training via Reinforcement Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Spatial as Deep: Spatial CNN for Traffic Scene Understanding.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2017

Video Object Segmentation with Re-identification.

[BibT_eX]

[DOI]

CoRR, 2017

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks.

[BibT_eX]

[DOI]

Proceedings of the 2017 ACM on Multimedia Conference, 2017

EigenNet: Towards Fast and Structural Learning of Deep Neural Networks.

[BibT_eX]

[DOI]

Ping Luo

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Learning Deep Architectures via Generalized Whitened Neural Networks.

[BibT_eX]

[DOI]

Ping Luo

Proceedings of the 34th International Conference on Machine Learning, 2017

Deep Dual Learning for Semantic Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Object Interactions and Descriptions for Semantic Image Segmentation.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection.

[BibT_eX]

[DOI]

Ping Luo

Liang Lin

Xiaobai Liu

IEEE Trans. Neural Networks Learn. Syst., 2016

Clothes Co-Parsing Via Joint Image Segmentation and Labeling With Application to Clothing Retrieval.

[BibT_eX]

[DOI]

IEEE Trans. Multim., 2016

Learning Deep Representation for Face Alignment with Auxiliary Attributes.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2016

Joint Face Representation Adaptation and Clustering in Videos.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2016, 2016

Fashion Landmark Detection in the Wild.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2016, 2016

WIDER FACE: A Face Detection Benchmark.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Face Model Compression by Distilling Knowledge from Neurons.

[BibT_eX]

[DOI]

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Learning to Recognize Pedestrian Attribute.

[BibT_eX]

[DOI]

CoRR, 2015

Learning Social Relation Traits from Face Images.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

From Facial Parts Responses to Face Detection: A Deep Learning Approach.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Deep Learning Strong Parts for Pedestrian Detection.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Deep Learning Face Attributes in the Wild.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Semantic Image Segmentation via Deep Parsing Network.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

A large-scale car dataset for fine-grained categorization and verification.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Pedestrian detection aided by deep learning semantic tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

DeepID-Net: Deformable deep convolutional neural networks for object detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deep Representation Learning with Target Coding.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Deep learning for attribute inference, parsing, and recognition of face.

[BibT_eX]

[DOI]

Ping Luo

PhD thesis, 2014

Deep Learning Multi-View Representation for Face Recognition.

[BibT_eX]

[DOI]

CoRR, 2014

Recover Canonical-View Faces in the Wild with Deep Neural Networks.

[BibT_eX]

[DOI]

CoRR, 2014

Learning and Transferring Multi-task Deep Representation for Face Alignment.

[BibT_eX]

[DOI]

CoRR, 2014

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection.

[BibT_eX]

[DOI]

CoRR, 2014

Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Pedestrian Attribute Recognition At Far Distance.

[BibT_eX]

[DOI]

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Facial Landmark Detection by Deep Multi-task Learning.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2014, 2014

Clothing Co-parsing by Joint Image Segmentation and Labeling.

[BibT_eX]

[DOI]

Wei Yang

Ping Luo

Liang Lin

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Switchable Deep Network for Pedestrian Detection.

[BibT_eX]

[DOI]

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

Deep Learning Identity-Preserving Face Space.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Computer Vision, 2013

A Deep Sum-Product Architecture for Robust Facial Attributes Analysis.

[BibT_eX]

[DOI]

Ping Luo

Xiaogang Wang

Xiaoou Tang

Proceedings of the IEEE International Conference on Computer Vision, 2013

Pedestrian Parsing via Deep Decompositional Network.

[BibT_eX]

[DOI]

Ping Luo

Xiaogang Wang

Xiaoou Tang

Proceedings of the IEEE International Conference on Computer Vision, 2013

2012

Representing and recognizing objects with massive local image patches.

[BibT_eX]

[DOI]

Pattern Recognit., 2012

Joint semantic segmentation by searching for compatible-competitive references.

[BibT_eX]

[DOI]

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Hierarchical face parsing via deep learning.

[BibT_eX]

[DOI]

Ping Luo

Xiaogang Wang

Xiaoou Tang

Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2010

A Discriminative Model for Object Representation and Detection via Sparse Features.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Pattern Recognition, 2010

Semantics-driven portrait cartoon stylization.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Image Processing, 2010

Learning Shape Detector by Quantizing Curve Segments with Multiple Distance Metrics.

[BibT_eX]

[DOI]

Ping Luo

Liang Lin

Hongyang Chao

Proceedings of the Computer Vision - ECCV 2010, 2010

2009

Hierarchical 3D perception from a single image.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Image Processing, 2009
