Ping Luo

Orcid: 0000-0002-6685-7950

Affiliations:
  • University of Hong Kong, Department of Computer Science, Hong Kong
  • Chinese University of Hong Kong, Department of Information Engineering, Hong Kong (PhD 2014)
  • Sun Yat-Sen University, School of Software, Guangzhou, China (former)
  • Lotus Hill Institute, China (former)


According to our database, Ping Luo authored at least 338 papers between 2009 and 2024.

Bibliography

2024
End-to-End Video Text Spotting with Transformer.
Int. J. Comput. Vis., September, 2024

Deeply Unsupervised Patch Re-Identification for Pre-Training Object Detectors.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

FAT: Frequency-Aware Transformation for Bridging Full-Precision and Low-Precision Deep Representations.
IEEE Trans. Neural Networks Learn. Syst., February, 2024

Context Autoencoder for Self-supervised Representation Learning.
Int. J. Comput. Vis., January, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.
CoRR, 2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies.
CoRR, 2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts.
CoRR, 2024

TCFormer: Visual Recognition via Token Clustering Transformer.
CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.
CoRR, 2024

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset.
CoRR, 2024

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model.
CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.
CoRR, 2024

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning.
CoRR, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.
CoRR, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.
CoRR, 2024

Needle In A Multimodal Haystack.
CoRR, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.
CoRR, 2024

Learning Manipulation by Predicting Interaction.
CoRR, 2024

AnalogCoder: Analog Circuit Design via Training-Free Code Generation.
CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.
CoRR, 2024

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots.
CoRR, 2024

Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs.
CoRR, 2024

UniFS: Universal Few-shot Instance Perception with Point Representations.
CoRR, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.
CoRR, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.
CoRR, 2024

End-to-End Autonomous Driving through V2X Cooperation.
CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.
CoRR, 2024

FlashFace: Human Image Personalization with High-fidelity Identity Preservation.
CoRR, 2024

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving.
CoRR, 2024

Generalized Predictive Model for Autonomous Driving.
CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.
CoRR, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.
CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.
CoRR, 2024

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks.
CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.
CoRR, 2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM.
CoRR, 2024

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models.
CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Part123: Part-aware 3D Reconstruction from a Single-view Image.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PROGRAM: PROtotype GRAph Model based Pseudo-Label Learning for Test-Time Adaptation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

VDT: General-purpose Video Diffusion Transformers via Mask Modeling.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

KET-QA: A Dataset for Knowledge Enhanced Table Question Answering.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
RestoreFormer++: Towards Real-World Blind Face Restoration From Undegraded Key-Value Pairs.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Sparse R-CNN: An End-to-End Framework for Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

CycleMLP: A MLP-Like Architecture for Dense Visual Predictions.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

ZoomNAS: Searching for Whole-Body Human Pose Estimation in the Wild.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning.
IEEE Trans. Neural Networks Learn. Syst., 2023

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning.
Trans. Mach. Learn. Res., 2023

MGL: Mutual Graph Learning for Camouflaged Object Detection.
IEEE Trans. Image Process., 2023

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces.
CoRR, 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

DriveLM: Driving with Graph Visual Question Answering.
CoRR, 2023

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution.
CoRR, 2023

A Survey of Reasoning with Foundation Models.
CoRR, 2023

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception.
CoRR, 2023

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation.
CoRR, 2023

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.
CoRR, 2023

MLLMs-Augmented Visual-Language Representation Learning.
CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.
CoRR, 2023

DiffusionMat: Alpha Matting as Sequential Refinement Learning.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

MeanAP-Guided Reinforced Active Learning for Object Detection.
CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.
CoRR, 2023

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving.
CoRR, 2023

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.
CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.
CoRR, 2023

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation.
CoRR, 2023

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.
CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths.
CoRR, 2023

SyNDock: N Rigid Protein Docking via Learnable Group Synchronization.
CoRR, 2023

VDT: An Empirical Study on Video Diffusion with Transformers.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.
CoRR, 2023

Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving.
CoRR, 2023

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation.
CoRR, 2023

EC^2: Emergent Communication for Embodied Control.
CoRR, 2023

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer.
CoRR, 2023

Topology Reasoning for Driving Scenes.
CoRR, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model.
CoRR, 2023

Multi-Level Contrastive Learning for Dense Prediction Task.
CoRR, 2023

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction.
CoRR, 2023

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception.
CoRR, 2023

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neural MPC-Based Decision-Making Framework for Autonomous Driving in Multi-Lane Roundabout.
Proceedings of the 25th IEEE International Conference on Intelligent Transportation Systems, 2023

AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners.
Proceedings of the International Conference on Machine Learning, 2023

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer.
Proceedings of the International Conference on Machine Learning, 2023

Learning Object-Language Alignments for Open-Vocabulary Object Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Transformers for Open-world Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Segment Every Reference Object in Spatial and Temporal Spaces.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scene as Occupancy.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Going Denser with Open-Vocabulary Part Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DDP: Diffusion Model for Dense Visual Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Beyond One-to-One: Rethinking the Referring Image Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffusionDet: Diffusion Model for Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EC^2: Emergent Communication for Embodied Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Policy Adaptation from Foundation Model Feedback.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Universal Instance Perception as Object Discovery and Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Structured Pruning for Efficient Generative Pre-trained Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery - a Focus on Affinity Prediction Problems with Noise Annotations.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training.
IEEE Trans. Parallel Distributed Syst., 2022

MetaCloth: Learning Unseen Tasks of Dense Fashion Landmark Detection From a Few Samples.
IEEE Trans. Image Process., 2022

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

AAFL: Asynchronous-Adaptive Federated Learning in Edge-Based Wireless Communication Systems for Countering Communicable Infectious Diseases.
IEEE J. Sel. Areas Commun., 2022

PVT v2: Improved baselines with Pyramid Vision Transformer.
Comput. Vis. Media, 2022

Cooperative Detection Method for DDoS Attacks Based on Blockchain.
Comput. Syst. Sci. Eng., 2022

Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models.
CoRR, 2022

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning.
CoRR, 2022

Large-batch Optimization for Dense Visual Predictions.
CoRR, 2022

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning.
CoRR, 2022

Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model.
CoRR, 2022

FedVeca: Federated Vectorized Averaging on Non-IID Data with Adaptive Bi-directional Global Objective.
CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild.
CoRR, 2022

Pose for Everything: Towards Category-Agnostic Pose Estimation.
CoRR, 2022

Exploiting Context Information for Generic Event Boundary Captioning.
CoRR, 2022

CO^3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.
CoRR, 2022

Semantic-Aware Pretraining for Dense Video Captioning.
CoRR, 2022

M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation.
CoRR, 2022

WegFormer: Transformers for Weakly Supervised Semantic Segmentation.
CoRR, 2022

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery - A Focus on Affinity Prediction Problems with Noise Annotations.
CoRR, 2022

MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning.
CoRR, 2022

BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions.
CoRR, 2022

Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rethinking Resolution in the Context of Efficient Video Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix.
Proceedings of the International Conference on Machine Learning, 2022

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer.
Proceedings of the International Conference on Machine Learning, 2022

Flow-based Recurrent Belief State Learning for POMDPs.
Proceedings of the International Conference on Machine Learning, 2022

Objects in Semantic Topology.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Dynamic Token Normalization improves Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Learning Versatile Neural Architectures by Propagating Network Codes.
Proceedings of the Tenth International Conference on Learning Representations, 2022

CycleMLP: A MLP-like Architecture for Dense Prediction.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Polygon-Free: Unconstrained Scene Text Detection with Box Annotations.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

ByteTrack: Multi-object Tracking by Associating Every Detection Box.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Grand Unification of Object Tracking.
Proceedings of the Computer Vision - ECCV 2022, 2022

Pose for Everything: Towards Category-Agnostic Pose Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal.
Proceedings of the Computer Vision - ECCV 2022, 2022

PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

DaViT: Dual Attention Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Language as Queries for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scale-Equivalent Distillation for Semi-Supervised Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Bridging Video-text Retrieval with Multiple Choice Questions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following.
Proceedings of the Conference on Robot Learning, 2022

Compression of Generative Pre-trained Language Models via Quantization.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Switchable Normalization for Learning-to-Normalize Deep Representation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Dynamic Token Normalization Improves Vision Transformer.
CoRR, 2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.
CoRR, 2021

ByteTrack: Multi-Object Tracking by Associating Every Detection Box.
CoRR, 2021

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning.
CoRR, 2021

Towards High-Quality Temporal Action Detection with Sparse Proposals.
CoRR, 2021

Panoptic SegFormer.
CoRR, 2021

CycleMLP: A MLP-like Architecture for Dense Prediction.
CoRR, 2021

PVTv2: Improved Baselines with Pyramid Vision Transformer.
CoRR, 2021

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening.
CoRR, 2021

Unsupervised Pretraining for Object Detection by Patch Reidentification.
CoRR, 2021

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation.
CoRR, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
CoRR, 2021

Trans2Seg: Transparent Object Segmentation with Transformer.
CoRR, 2021

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Model-Based Reinforcement Learning via Imagination with Derived Memory.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Compressed Video Contrastive Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Rethinking the Pruning Criteria for Convolutional Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Multi-compound Transformer for Accurate Biomedical Image Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Segmenting Transparent Objects in the Wild with Transformer.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution.
Proceedings of the 38th International Conference on Machine Learning, 2021

What Makes for End-to-End Object Detection?
Proceedings of the 38th International Conference on Machine Learning, 2021

Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs.
Proceedings of the 9th International Conference on Learning Representations, 2021

STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Dense Video Captioning with Parallel Decoding.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Bringing Events into Video Deblurring with Non-consecutively Blurry Frames.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Watch Only Once: An End-to-End Video Action Detection Framework.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Adversarial Robustness for Unsupervised Domain Adaptation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Sparse R-CNN: End-to-End Object Detection With Learnable Proposals.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Parser-Free Virtual Try-On via Distilling Appearance Flows.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
SSN: Learning Sparse Switchable Normalization via SparsestMax.
Int. J. Comput. Vis., 2020

TransTrack: Multiple-Object Tracking with Transformer.
CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.
CoRR, 2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training.
CoRR, 2020

Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning.
CoRR, 2020

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory.
CoRR, 2020

Domain-Adaptive Few-Shot Learning.
CoRR, 2020

How Does BN Increase Collapsed Neural Network Filters?
CoRR, 2020

UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

Channel Equilibrium Networks for Learning Deep Representation.
Proceedings of the 37th International Conference on Machine Learning, 2020

Webly Supervised Image Classification with Self-contained Confidence.
Proceedings of the Computer Vision - ECCV 2020, 2020

Segmenting Transparent Objects in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting.
Proceedings of the Computer Vision - ECCV 2020, 2020

Dynamic and Static Context-Aware LSTM for Multi-agent Motion Prediction.
Proceedings of the Computer Vision - ECCV 2020, 2020

Whole-Body Human Pose Estimation in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Exemplar Normalization for Learning Deep Representation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

3D Human Mesh Regression With Dense Correspondence.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PolarMask: Single Shot Instance Segmentation With Polar Representation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Reinforced Agent for Flexible Exposure Bracketing Selection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Online Knowledge Distillation via Collaborative Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Depth-Guided Convolutions for Monocular 3D Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Human Centric Visual Analysis with Deep Learning
Springer, ISBN: 978-981-13-2386-7, 2020

2019
SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification.
IEEE Trans. Image Process., 2019

TextSR: Content-Aware Text Super-Resolution Guided by Recognition.
CoRR, 2019

Towards Improving Generalization of Deep Networks via Consistent Normalization.
CoRR, 2019

WIDER Face and Pedestrian Challenge 2018: Methods and Results.
CoRR, 2019

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images.
CoRR, 2019

Differentiable Dynamic Normalization for Learning Deep Representation.
Proceedings of the 36th International Conference on Machine Learning, 2019

Towards Understanding Regularization in Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Differentiable Learning-to-Normalize via Switchable Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Vision-Infused Deep Audio Inpainting.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Switchable Whitening for Deep Representation Learning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deep Self-Learning From Noisy Labels.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

SSN: Learning Sparse Switchable Normalization via SparsestMax.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Faceness-Net: Face Detection through Deep Facial Part Responses.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Deep Learning Markov Random Field for Semantic Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

From Facial Expression Recognition to Interpersonal Relation Prediction.
Int. J. Comput. Vis., 2018

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis.
CoRR, 2018

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?
CoRR, 2018

Differentiable Learning-to-Normalize via Switchable Normalization.
CoRR, 2018

Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches.
CoRR, 2018

Kalman Normalization: Normalizing Internal Representations Across Network Layers.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos.
Proceedings of the 2018 ACM Multimedia Conference, 2018

Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net.
Proceedings of the Computer Vision - ECCV 2018, 2018

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

CUImage: A Neverending Learning Platform on a Convolutional Knowledge Graph of Billion Web Images.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Scheduling Large-scale Distributed Training via Reinforcement Learning.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Spatial as Deep: Spatial CNN for Traffic Scene Understanding.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Video Object Segmentation with Re-identification.
CoRR, 2017

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

EigenNet: Towards Fast and Structural Learning of Deep Neural Networks.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Learning Deep Architectures via Generalized Whitened Neural Networks.
Proceedings of the 34th International Conference on Machine Learning, 2017

Deep Dual Learning for Semantic Image Segmentation.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning Object Interactions and Descriptions for Semantic Image Segmentation.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection.
IEEE Trans. Neural Networks Learn. Syst., 2016

Clothes Co-Parsing Via Joint Image Segmentation and Labeling With Application to Clothing Retrieval.
IEEE Trans. Multim., 2016

Learning Deep Representation for Face Alignment with Auxiliary Attributes.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

Joint Face Representation Adaptation and Clustering in Videos.
Proceedings of the Computer Vision - ECCV 2016, 2016

Fashion Landmark Detection in the Wild.
Proceedings of the Computer Vision - ECCV 2016, 2016

WIDER FACE: A Face Detection Benchmark.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Face Model Compression by Distilling Knowledge from Neurons.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Learning to Recognize Pedestrian Attribute.
CoRR, 2015

Learning Social Relation Traits from Face Images.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

From Facial Parts Responses to Face Detection: A Deep Learning Approach.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Deep Learning Strong Parts for Pedestrian Detection.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Deep Learning Face Attributes in the Wild.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Semantic Image Segmentation via Deep Parsing Network.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

A large-scale car dataset for fine-grained categorization and verification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Pedestrian detection aided by deep learning semantic tasks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

DeepID-Net: Deformable deep convolutional neural networks for object detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deep Representation Learning with Target Coding.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Deep learning for attribute inference, parsing, and recognition of face.
PhD thesis, 2014

Deep Learning Multi-View Representation for Face Recognition.
CoRR, 2014

Recover Canonical-View Faces in the Wild with Deep Neural Networks.
CoRR, 2014

Learning and Transferring Multi-task Deep Representation for Face Alignment.
CoRR, 2014

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection.
CoRR, 2014

Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Pedestrian Attribute Recognition At Far Distance.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Facial Landmark Detection by Deep Multi-task Learning.
Proceedings of the Computer Vision - ECCV 2014, 2014

Clothing Co-parsing by Joint Image Segmentation and Labeling.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Switchable Deep Network for Pedestrian Detection.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Deep Learning Identity-Preserving Face Space.
Proceedings of the IEEE International Conference on Computer Vision, 2013

A Deep Sum-Product Architecture for Robust Facial Attributes Analysis.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Pedestrian Parsing via Deep Decompositional Network.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
Representing and recognizing objects with massive local image patches.
Pattern Recognit., 2012

Joint semantic segmentation by searching for compatible-competitive references.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Hierarchical face parsing via deep learning.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2010
A Discriminative Model for Object Representation and Detection via Sparse Features.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Semantics-driven portrait cartoon stylization.
Proceedings of the International Conference on Image Processing, 2010

Learning Shape Detector by Quantizing Curve Segments with Multiple Distance Metrics.
Proceedings of the Computer Vision, 2010

2009
Hierarchical 3D perception from a single image.
Proceedings of the International Conference on Image Processing, 2009

