Hao Tang

Orcid: 0000-0002-2077-1246

Affiliations:
  • Peking University, National Key Laboratory for Multimedia Information Processing, Beijing, China
  • Carnegie Mellon University, Robotics Institute, Pittsburgh, PA, USA (former)
  • ETH Zurich, Computer Vision Laboratory, Zürich, Switzerland (former)
  • University of Oxford, Department of Engineering Science, Oxford, UK (former)
  • University of Trento, Multimedia and Human Understanding Group, Italy (former, PhD 2021)
  • Peking University Shenzhen Graduate School, Key Laboratory for Machine Perception, Shenzhen, China (former)


According to our database, Hao Tang authored at least 293 papers between 2015 and 2026.

Bibliography

2026
All-in-One Transformer for Image Restoration Under Adverse Weather Degradations.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2026

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results.
CoRR, April, 2026

When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making.
CoRR, March, 2026

OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis.
CoRR, March, 2026

MMA: Multimodal Memory Agent.
CoRR, February, 2026

GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning.
CoRR, February, 2026

Hallucination Begins Where Saliency Drops.
CoRR, January, 2026

WebCryptoAgent: Agentic Crypto Trading with Web Informatics.
CoRR, January, 2026

MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing.
CoRR, January, 2026

Knowledge-Enhanced Dynamic Scene Graph Attention Network for Fake News Video Detection.
IEEE Trans. Multim., 2026

AAGFormer: A self-adaptive graph-transformer synergy with topological normalization for 3D human pose estimation.
Image Vis. Comput., 2026

TR-DQ: Time-Rotation Diffusion Quantization.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Dual Attention Guidance Network for Self-Supervised Monocular Depth Estimation.
IEEE Trans. Circuits Syst. Video Technol., December, 2025

TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation.
CoRR, December, 2025

DragMesh: Interactive 3D Generation Made Easy.
CoRR, December, 2025

EgoLCD: Egocentric Video Generation with Long Context Diffusion.
CoRR, December, 2025

ReactionMamba: Generating Short & Long Human Reaction Sequences.
CoRR, December, 2025

Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving.
CoRR, November, 2025

Alias-free 4D Gaussian Splatting.
CoRR, November, 2025

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots.
CoRR, November, 2025

EvoVLA: Self-Evolving Vision-Language-Action Model.
CoRR, November, 2025

Dual-Path Transformer-Based GAN for Co-speech Gesture Synthesis.
Int. J. Soc. Robotics, October, 2025

VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery.
CoRR, October, 2025

AutoViT: Achieving Real-Time Vision Transformers on Mobile via Latency-aware Coarse-to-Fine Search.
Int. J. Comput. Vis., September, 2025

Fidelity-Aware Data Composition for Robust Robot Generalization.
CoRR, September, 2025

UniVid: The Open-Source Unified Video Model.
CoRR, September, 2025

StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes.
CoRR, September, 2025

Nav-R1: Reasoning and Navigation in Embodied Scenes.
CoRR, September, 2025

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis.
CoRR, September, 2025

Multimodal Data Storage and Retrieval for Embodied AI: A Survey.
CoRR, August, 2025

RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory.
CoRR, August, 2025

ReMoMask: Retrieval-Augmented Masked Motion Generation.
CoRR, August, 2025

3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding.
CoRR, July, 2025

UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing.
CoRR, July, 2025

Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS) in Edge Iterative MRI Lesion Localization System (EdgeIMLocSys).
CoRR, July, 2025

ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models.
CoRR, July, 2025

Hierarchical Distribution-Based Exemplar Replay for Incremental SAR Automatic Target Recognition.
IEEE Trans. Aerosp. Electron. Syst., June, 2025

Style Transfer: A Decade Survey.
CoRR, June, 2025

Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration.
CoRR, June, 2025

Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts.
CoRR, June, 2025

FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution.
CoRR, June, 2025

Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment.
CoRR, June, 2025

Enhanced Multi-Scale Cross-Attention for Person Image Generation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation.
CoRR, May, 2025

SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams.
CoRR, May, 2025

SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation.
CoRR, May, 2025

CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation.
CoRR, May, 2025

Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image.
CoRR, May, 2025

Structured Agent Distillation for Large Language Model.
CoRR, May, 2025

TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2025

Multimodal Large Language Models for Medicine: A Comprehensive Survey.
CoRR, April, 2025

TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots.
CoRR, April, 2025

DMS-Net: Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification.
CoRR, April, 2025

Cabbage: A Differential Growth Framework for Open Surfaces.
CoRR, April, 2025

Multimodal Perception for Goal-oriented Navigation: A Survey.
CoRR, April, 2025

EventVAD: Training-Free Event-Aware Video Anomaly Detection.
CoRR, April, 2025

3D CoCa: Contrastive Learners are 3D Captioners.
CoRR, April, 2025

Wakeup-Darkness: When Multimodal Meets Unsupervised Low-Light Image Enhancement.
ACM Trans. Multim. Comput. Commun. Appl., March, 2025

Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance.
CoRR, March, 2025

Dynamic Scene Reconstruction: Recent Advance in Real-time Rendering and Streaming.
CoRR, March, 2025

TR-DQ: Time-Rotation Diffusion Quantization.
CoRR, March, 2025

When Continue Learning Meets Multimodal Large Language Model: A Survey.
CoRR, March, 2025

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface.
CoRR, March, 2025

Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation.
CoRR, February, 2025

FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation.
CoRR, February, 2025

RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2.
CoRR, February, 2025

Self-Prompt SAM: Medical Image Segmentation via Automatic Prompt SAM Adaptation.
CoRR, February, 2025

UDiTQC: U-Net-Style Diffusion Transformer for Quantum Circuit Synthesis.
CoRR, January, 2025

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation.
CoRR, January, 2025

Boosting Adversarial Transferability with Spatial Adversarial Alignment.
CoRR, January, 2025

Hierarchical Cross-Attention Network for Virtual Try-On.
IEEE Trans. Multim., 2025

A pure MLP-Mixer-based GAN framework for guided image translation.
Pattern Recognit., 2025

GraphMLP: A graph MLP-like architecture for 3D human pose estimation.
Pattern Recognit., 2025

BCDPose: Diffusion-based 3D Human Pose Estimation with bone-chain prior knowledge.
Image Vis. Comput., 2025

Generalization-preserving adaptation of vision-language models for open-vocabulary segmentation.
Comput. Vis. Image Underst., 2025

Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

EventVAD: Training-Free Event-Aware Video Anomaly Detection.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

AccidentBlip: Agent of Accident Warning Based on MA-Former.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2025

CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

FairSMOE: Mitigating Multi-Attribute Fairness Problem with Sparse Mixture-of-Experts.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

In-Context Meta LoRA Generation.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

MaskSAM: Auto-Prompt SAM with Mask Classification for Volumetric Medical Image Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MambaIC: State Space Models for High-Performance Learned Image Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DiffFNO: Diffusion Fourier Neural Operator.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Stable-Hair: Real-World Hair Transfer via Diffusion Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Physical Adversarial Attack Meets Computer Vision: A Decade Survey.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Shadclips: When Parameter-Efficient Fine-Tuning with Multimodal Meets Shadow Removal.
Int. J. Pattern Recognit. Artif. Intell., December, 2024

Graph Transformer GANs With Graph Masked Modeling for Architectural Layout Generation.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

Toward High-Quality HDR Deghosting With Conditional Diffusion Models.
IEEE Trans. Circuits Syst. Video Technol., May, 2024

Cloth Interactive Transformer for Virtual Try-On.
ACM Trans. Multim. Comput. Commun. Appl., April, 2024

ControlFace: Feature Disentangling for Controllable Face Swapping.
J. Imaging, January, 2024

Adapting Segment Anything Model for Change Detection in VHR Remote Sensing Images.
IEEE Trans. Geosci. Remote. Sens., 2024

PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model.
CoRR, 2024

Network Inversion and Its Applications.
CoRR, 2024

Multimodal Alignment and Fusion: A Survey.
CoRR, 2024

Text-to-Image Synthesis: A Decade Survey.
CoRR, 2024

AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations.
CoRR, 2024

KMM: Key Frame Mask Mamba for Extended Motion Generation.
CoRR, 2024

GWQ: Gradient-Aware Weight Quantization for Large Language Models.
CoRR, 2024

M²M: Learning controllable Multi of experts and multi-scale operators are the Partial Differential Equations need.
CoRR, 2024

Brain Tumor Classification on MRI in Light of Molecular Markers.
CoRR, 2024

Data-Free Class Incremental Gesture Recognition via Synthetic Feature Sampling.
CoRR, 2024

Barbie: Text to Barbie-Style 3D Avatars.
CoRR, 2024

InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation.
CoRR, 2024

From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models.
CoRR, 2024

A Survey on Multimodal Wearable Sensor-based Human Action Recognition.
CoRR, 2024

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation.
CoRR, 2024

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion.
CoRR, 2024

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior.
CoRR, 2024

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM.
CoRR, 2024

Machine learning and human-machine trust in healthcare: A systematic survey.
CAAI Trans. Intell. Technol., 2024

Edge-guided representation learning for underwater object detection.
CAAI Trans. Intell. Technol., 2024

Mining and Unifying Heterogeneous Contrastive Relations for Weakly-Supervised Actor-Action Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Bipartite Graph Diffusion Model for Human Interaction Generation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Monocular Expressive 3D Human Reconstruction of Multiple People.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Audio-Visual Navigation with Anti-Backtracking.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

Adaptive Cross-Architecture Mutual Knowledge Distillation.
Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

Motion Mamba: Efficient and Long Sequence Motion Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance.
Proceedings of the Computer Vision - ECCV 2024, 2024

GiT: Towards Generalist Vision Transformer Through Universal Language Interface.
Proceedings of the Computer Vision - ECCV 2024, 2024

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion.
Proceedings of the Computer Vision - ECCV 2024, 2024


InstructGIE: Towards Generalizable Image Editing.
Proceedings of the Computer Vision - ECCV 2024, 2024

SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis.
Proceedings of the Computer Vision - ECCV 2024, 2024

Versatile Navigation Under Partial Observability via Value-Guided Diffusion Policy.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

On the Faithfulness of Vision Transformer Explanations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Token Transformation Matters: Towards Faithful Post-Hoc Explanation for Vision Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Online Real-Time Memory-based Video Inpainting Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Distilling ODE Solvers of Diffusion Models into Smaller Steps.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Robust 3D Pose Transfer with Adversarial Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MS-UMLP: Medical Image Segmentation via Multi-Scale U-shape MLP-Mixer.
Proceedings of the Computer Vision - ACCV 2024, 2024

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Edge Guided GANs With Multi-Scale Contrastive Learning for Semantic Image Synthesis.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

STRFormer: Spatial-Temporal-ReTemporal Transformer for 3D human pose estimation.
Image Vis. Comput., December, 2023

Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis.
Mach. Intell. Res., December, 2023

On-device audio-visual multi-person wake word spotting.
CAAI Trans. Intell. Technol., December, 2023

Measuring the Consistency and Diversity of 3D Face Generation.
IEEE J. Sel. Top. Signal Process., November, 2023

Go Closer to See Better: Camouflaged Object Detection via Object Area Amplification and Figure-Ground Conversion.
IEEE Trans. Circuits Syst. Video Technol., October, 2023

Interactive Neural Painting.
Comput. Vis. Image Underst., October, 2023

Multi-hypothesis representation learning for transformer-based 3D human pose estimation.
Pattern Recognit., September, 2023

AO2-DETR: Arbitrary-Oriented Object Detection Transformer.
IEEE Trans. Circuits Syst. Video Technol., May, 2023

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

AttentionGAN: Unpaired Image-to-Image Translation Using Attention-Guided Generative Adversarial Networks.
IEEE Trans. Neural Networks Learn. Syst., April, 2023

Bipartite Graph Reasoning GANs for Person Pose and Facial Image Synthesis.
Int. J. Comput. Vis., March, 2023

Disentangle Saliency Detection into Cascaded Detail Modeling and Body Filling.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023

Bidirectional Transformer GAN for Long-term Human Motion Prediction.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Deep Unsupervised Key Frame Extraction for Efficient Video Classification.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation.
IEEE Trans. Multim., 2023

Cross-View Panorama Image Synthesis.
IEEE Trans. Multim., 2023

Interaction Transformer for Human Reaction Generation.
IEEE Trans. Multim., 2023

3D-Aware Video Generation.
Trans. Mach. Learn. Res., 2023

Adaptive Convolutional Subspace Reasoning Network for Few-Shot SAR Target Recognition.
IEEE Trans. Geosci. Remote. Sens., 2023

Transductive Prototypical Attention Reasoning Network for Few-Shot SAR Target Recognition.
IEEE Trans. Geosci. Remote. Sens., 2023

Local and Global GANs With Semantic-Aware Upsampling for Image Generation.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Towards High-quality HDR Deghosting with Conditional Diffusion Models.
CoRR, 2023

Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images.
CoRR, 2023

Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation.
CoRR, 2023

Few-shot Medical Image Segmentation with Cycle-resemblance Attention.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Does Graph Distillation See Like Vision Dataset Counterpart?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Data Level Lottery Ticket Hypothesis for Vision Transformers.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

RZCR: Zero-shot Character Recognition via Radical-based Reasoning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

SpeedDETR: Speed-aware Transformers for End-to-end Object Detection.
Proceedings of the International Conference on Machine Learning, 2023

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TINYCOD: Tiny and Effective Model for Camouflaged Object Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

MLP-GAN for Brain Vessel Image Segmentation.
Proceedings of the IEEE International Conference on Acoustics, 2023

PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LSDIR: A Large Scale Dataset for Image Restoration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Graph Transformer GANs for Graph-Constrained House Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

DE-net: Dynamic Text-Guided Image Editing Adversarial Networks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Towards Real-Time Segmentation on the Edge.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes.
IEEE Trans. Multim., 2022

Unsupervised High-Resolution Portrait Gaze Correction and Animation.
IEEE Trans. Image Process., 2022

Quasi-Equilibrium Feature Pyramid Network for Salient Object Detection.
IEEE Trans. Image Process., 2022

Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images.
IEEE Trans. Image Process., 2022

Supervised Multi-Scale Attention-Guided Ship Detection in Optical Remote Sensing Images.
IEEE Trans. Geosci. Remote. Sens., 2022

Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images.
IEEE Trans. Geosci. Remote. Sens., 2022

Facial Expression Translation Using Landmark Guided GANs.
IEEE Trans. Affect. Comput., 2022

Cross-view panorama image synthesis with progressive attention GANs.
Pattern Recognit., 2022

PB-GCN: Progressive binary graph convolutional networks for skeleton-based action recognition.
Neurocomputing, 2022

The Lottery Ticket Hypothesis for Vision Transformers.
CoRR, 2022

Physical Adversarial Attack meets Computer Vision: A Decade Survey.
CoRR, 2022

Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation.
CoRR, 2022

Vector Quantized Diffusion Model with CodeUnet for Text-to-Sign Pose Sequences Generation.
CoRR, 2022

REZCR: A Zero-shot Character Recognition Method via Radical Extraction.
CoRR, 2022

Contrastive Learning from Spatio-Temporal Mixed Skeleton Sequences for Self-Supervised Skeleton-Based Action Recognition.
CoRR, 2022

GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation.
CoRR, 2022

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis.
CoRR, 2022

RCRN: Real-world Character Image Restoration Network via Skeleton Extraction.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Real-Time Portrait Stylization on the Edge.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

A Cloth-Irrelevant Harmonious Attention Network for Cloth-Changing Person Re-identification.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Identity-Sensitive Knowledge Propagation for Cloth-Changing Person Re-Identification.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

Unsupervised Domain Adaptation Person Re-Identification by Camera-Aware Style Decoupling and Uncertainty Modeling.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

Graph-Based Generative Face Anonymisation with Pose Preservation.
Proceedings of the Image Analysis and Processing - ICIAP 2022, 2022

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

3D-Aware Semantic-Guided Generative Model for Human Synthesis.
Proceedings of the Computer Vision - ECCV 2022, 2022

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution.
Proceedings of the Computer Vision - ECCV 2022, 2022

Mining Relations Among Cross-Frame Affinities for Video Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Interpretable Video Super-Resolution via Alternating Optimization.
Proceedings of the Computer Vision - ECCV 2022, 2022

FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization: late breaking results.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning to Restore 3D Face from In-the-Wild Degraded Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Physically-guided Disentangled Implicit Rendering for 3D Face Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SiNeRF: Sinusoidal Neural Radiance Fields for Joint Pose Estimation and Scene Reconstruction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Geometry-Contrastive Transformer for Generalized 3D Pose Transfer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition With Limited Data.
IEEE Trans. Neural Networks Learn. Syst., 2021

Layout-to-Image Translation With Double Pooling Generative Adversarial Networks.
IEEE Trans. Image Process., 2021

LANet: Local Attention Embedding to Improve the Semantic Segmentation of Remote Sensing Images.
IEEE Trans. Geosci. Remote. Sens., 2021

Structured discriminative tensor dictionary learning for unsupervised domain adaptation.
Neurocomputing, 2021

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning.
CoRR, 2021

Global and Local Alignment Networks for Unpaired Image-to-Image Translation.
CoRR, 2021

Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation.
CoRR, 2021

Looking Outside the Window: Wider-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images.
CoRR, 2021

Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization.
CoRR, 2021

Transformer-Based Source-Free Domain Adaptation.
CoRR, 2021

Cloth Interactive Transformer for Virtual Try-On.
CoRR, 2021

Transformers Solve the Limited Receptive Field for Monocular Depth Prediction.
CoRR, 2021

Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images.
CoRR, 2021

Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Cross-View Exocentric to Egocentric Video Synthesis.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Highly Efficient Natural Image Matting.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

AniFormer: Data-driven 3D Animation with Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Unified Generative Adversarial Networks for Controllable Image-to-Image Translation.
IEEE Trans. Image Process., 2020

Relevant region prediction for crowd counting.
Neurocomputing, 2020

DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis.
CoRR, 2020

Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis.
CoRR, 2020

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation.
CoRR, 2020

Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism.
Proceedings of the Pattern Recognition and Computer Vision, Third Chinese Conference, 2020

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Dual Attention GANs for Semantic Image Synthesis.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Exocentric to Egocentric Image Generation Via Parallel Generative Adversarial Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

XingGAN for Person Image Generation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Bipartite Graph Reasoning GANs for Person Image Generation.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

2019
Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion.
Neurocomputing, 2019

Asymmetric Generative Adversarial Networks for Image-to-Image Translation.
CoRR, 2019

Improving Semantic Segmentation of Aerial Images Using Patch-based Attention.
CoRR, 2019

GazeCorrection: Self-Guided Eye Manipulation in the wild using Self-Supervised Generative Adversarial Networks.
CoRR, 2019

Structured Discriminative Tensor Dictionary Learning for Unsupervised Domain Adaptation.
CoRR, 2019

Deep Micro-Dictionary Learning and Coding Network.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation.
Proceedings of the International Joint Conference on Neural Networks, 2019

Joint Learning of Self-Representation and Indicator for Multi-View Image Clustering.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Expression Conditional Gan for Facial Expression-to-Expression Translation.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Attribute-Guided Sketch Generation.
Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 2019

Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
GestureGAN for Hand Gesture-to-Gesture Translation in the Wild.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Dual Generator Generative Adversarial Networks for Multi-domain Image-to-Image Translation.
Proceedings of the Computer Vision - ACCV 2018, 2018

2016
Sequential Bag-of-Words model for human action classification.
CAAI Trans. Intell. Technol., 2016

Adaptive Region Boosting method with biased entropy for path planning in changing environment.
CAAI Trans. Intell. Technol., 2016

A Novel Feature Matching Strategy for Large Scale Image Retrieval.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
Gender Classification Using Pyramid Segmentation for Unconstrained Back-facing Video Sequences.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

SDM-BSM: A fusing depth scheme for human action recognition.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Two-Layers Local Coordinate Coding.
Proceedings of the Computer Vision - CCF Chinese Conference, 2015

