Shanghang Zhang

ORCID: 0000-0003-4047-3526

According to our database, Shanghang Zhang authored at least 258 papers between 2012 and 2026.

Bibliography

2026
CoCoGesture: Towards coherent co-speech 3D gesture generation in the wild.
Inf. Fusion, 2026

2025
RepCaM++: Exploring Transparent Visual Prompt With Inference-Time Re-Parameterization for Neural Video Delivery.
IEEE Trans. Mob. Comput., September, 2025

EEG-Driven Classification of Driver Mental Workload in Diverse Environments: A Dual-Branch Network for Efficient In-Vehicle Applications.
IEEE Internet Things J., September, 2025

NavA³: Understanding Any Instruction, Navigating Anywhere, Finding Anything.
CoRR, August, 2025

UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying.
CoRR, August, 2025

FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning.
CoRR, July, 2025

Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition.
CoRR, July, 2025

RwoR: Generating Robot Demonstrations from Human Hand Collection for Policy Learning without Robot.
CoRR, July, 2025

RoboBrain 2.0 Technical Report.
CoRR, July, 2025

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation.
CoRR, July, 2025

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents.
CoRR, June, 2025

MinD: Unified Visual Imagination and Control via Hierarchical World Models.
CoRR, June, 2025

FastInit: Fast Noise Initialization for Temporally Consistent Video Generation.
CoRR, June, 2025

AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models.
CoRR, June, 2025

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
CoRR, June, 2025

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought.
CoRR, June, 2025

SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game.
CoRR, June, 2025

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics.
CoRR, June, 2025

Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning.
CoRR, June, 2025

BEVUDA++: Geometric-Aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection.
IEEE Trans. Circuits Syst. Video Technol., May, 2025

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control.
CoRR, May, 2025

OmniIndoor3D: Comprehensive Indoor 3D Reconstruction.
CoRR, May, 2025

SpikeGen: Generative Framework for Visual Spike Stream Processing.
CoRR, May, 2025

AFCL: Analytic Federated Continual Learning for Spatio-Temporal Invariance of Non-IID Data.
CoRR, May, 2025

ACU: Analytic Continual Unlearning for Efficient and Exact Forgetting with Privacy Preservation.
CoRR, May, 2025

H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos.
CoRR, May, 2025

FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers.
CoRR, May, 2025

RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration.
CoRR, May, 2025

CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation.
CoRR, May, 2025

Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
CoRR, May, 2025

ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance.
CoRR, April, 2025

EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler.
CoRR, April, 2025

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.
CoRR, March, 2025

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation.
CoRR, March, 2025

EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
CoRR, March, 2025

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model.
CoRR, March, 2025

AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter.
CoRR, March, 2025

Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network With Graph Representation Learning.
IEEE Trans. Neural Networks Learn. Syst., February, 2025

CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World.
CoRR, February, 2025

SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation.
CoRR, January, 2025

PINNsAgent: Automated PDE Surrogation with Large Language Models.
CoRR, January, 2025

Empowering Corner Case Detection in Autonomous Vehicles With Multimodal Large Language Models.
IEEE Signal Process. Lett., 2025

A diffusion-based feature enhancement approach for driving behavior classification with EEG data.
Adv. Eng. Informatics, 2025

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Co3Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient Quality Controllable Neural Image Compression based on QD-Model.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

GaussianEnhancer: A General Rendering Enhancer for Gaussian Splatting.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Three-Stage Progressive Pre-Analysis Framework for VMAF Controllable Image Coding.
Proceedings of the Data Compression Conference, 2025

Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Segment Any Motion in Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Subgraph Aggregation for Out-of-Distribution Generalization on Graphs.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments.
IEEE Robotics Autom. Lett., September, 2024

Exploring Generalizable Distillation for Efficient Medical Image Segmentation.
IEEE J. Biomed. Health Informatics, July, 2024

BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for Multi-View BEV 3D Object Detection.
IEEE Trans. Intell. Veh., January, 2024

DECOR: Dynamic Decoupling and Multiobjective Optimization for Long-Tailed Remote Sensing Image Classification.
IEEE Trans. Geosci. Remote. Sens., 2024

A lightweight multi-layer perceptron for efficient multivariate time series forecasting.
Knowl. Based Syst., 2024

The Emerging Issues in Bioimaging AI Publications and Research (Dagstuhl Seminar 24042).
Dagstuhl Reports, 2024

SCBench: A Sports Commentary Benchmark for Video LLMs.
CoRR, 2024

RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation.
CoRR, 2024

GaussianAD: Gaussian-Centric End-to-End Autonomous Driving.
CoRR, 2024

GPD-1: Generative Pre-training for Driving.
CoRR, 2024

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance.
CoRR, 2024

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model.
CoRR, 2024

[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
CoRR, 2024

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation.
CoRR, 2024

Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective.
CoRR, 2024

EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting.
CoRR, 2024

MC-LLaVA: Multi-Concept Personalized Vision-Language Model.
CoRR, 2024

Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation.
CoRR, 2024

Training-free Regional Prompting for Diffusion Transformers.
CoRR, 2024

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective.
CoRR, 2024

EVA: An Embodied World Model for Future Video Anticipation.
CoRR, 2024

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference.
CoRR, 2024

Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation.
CoRR, 2024

Discovering Long-Term Effects on Parameter Efficient Fine-tuning.
CoRR, 2024

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models.
CoRR, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions.
CoRR, 2024

Multimodal Large Language Models for Bioimage Analysis.
CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.
CoRR, 2024

Fisher-aware Quantization for DETR Detectors with Critical-category Objectives.
CoRR, 2024

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception.
CoRR, 2024

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.
CoRR, 2024

S³Gaussian: Self-Supervised Street Gaussians for Autonomous Driving.
CoRR, 2024

Implicit Neural Image Field for Biological Microscopy Image Compression.
CoRR, 2024

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation.
CoRR, 2024

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation.
CoRR, 2024

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.
CoRR, 2024

Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning.
CoRR, 2024

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding.
CoRR, 2024

SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera.
CoRR, 2024

Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection.
CoRR, 2024

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing.
CoRR, 2024

A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge - Multi-Task Robustness Track.
CoRR, 2024

Building Flexible Machine Learning Models for Scientific Computing at Scale.
CoRR, 2024

Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis.
CoRR, 2024

VeCAF: VLM-empowered Collaborative Active Finetuning with Training Objective Awareness.
CoRR, 2024

RustNeRF: Robust Neural Radiance Field with Low-Quality Images.
CoRR, 2024

TCP: Triplet Contrastive-relationship Preserving for Class-Incremental Learning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Unveiling the Tapestry of Consistency in Large Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Compositional Few-Shot Class-Incremental Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

VLUReID: Exploiting Vision-Language Knowledge for Unsupervised Person Re-Identification.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Enhanced Blind Watermarking Against Black-Box Noise: Leveraging CIN Framework.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

I-MedSAM: Implicit Medical Image Segmentation with Segment Anything.
Proceedings of the Computer Vision - ECCV 2024, 2024

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Gradient-based Parameter Selection for Efficient Fine-Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FreeKD: Knowledge Distillation via Semantic Frequency Prompt.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Cloud-Device Collaborative Learning for Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Frame-Recurrent Video Crowd Counting.
IEEE Trans. Circuits Syst. Video Technol., September, 2023

Learning Deep Features for Robotic Inference From Physical Interactions.
IEEE Trans. Cogn. Dev. Syst., September, 2023

Expanding the prediction capacity in long sequence time-series forecasting.
Artif. Intell., May, 2023

P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.
Remote. Sens., April, 2023

Caching in Dynamic Environments: A Near-Optimal Online Learning Approach.
IEEE Trans. Multim., 2023

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation.
CoRR, 2023

Cloud-Device Collaborative Learning for Multimodal Large Language Models.
CoRR, 2023

Iterative Prompt Relabeling for diffusion model with RLDF.
CoRR, 2023

FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection.
CoRR, 2023

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.
CoRR, 2023

Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
CoRR, 2023

Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior.
CoRR, 2023

Split & Merge: Unlocking the Potential of Visual Adapters via Sparse Training.
CoRR, 2023

MoEC: Mixture of Experts Implicit Neural Compression.
CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

COLE: A Hierarchical Generation Framework for Graphic Design.
CoRR, 2023

Heterogenous Memory Augmented Neural Networks.
CoRR, 2023

Distribution-Aware Continual Test Time Adaptation for Semantic Segmentation.
CoRR, 2023

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything.
CoRR, 2023

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
CoRR, 2023

PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers.
CoRR, 2023

DiffuseIR: Diffusion Models For Isotropic Reconstruction of 3D Microscopic Images.
CoRR, 2023

UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering.
CoRR, 2023

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

Chain of Thought Prompt Tuning in Vision Language Models.
CoRR, 2023

MoWE: Mixture of Weather Experts for Multiple Adverse Weather Removal.
CoRR, 2023

Exploring Sparse Visual Prompt for Cross-domain Semantic Segmentation.
CoRR, 2023

When Visible Light (Backscatter) Communication Meets Neuromorphic Cameras in V2X.
Proceedings of the 24th International Workshop on Mobile Computing Systems and Applications, 2023

RepCaM: Re-parameterization Content-aware Modulation for Neural Video Delivery.
Proceedings of the 33rd Workshop on Network and Operating System Support for Digital Audio and Video, 2023

PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffuseIR: Diffusion Models for Isotropic Reconstruction of 3D Microscopic Images.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Electroencephalogram-Based Driver Emotional State Detection with Manifold Learning.
Proceedings of the 26th IEEE International Conference on Intelligent Transportation Systems, 2023

A Text Prompt-Based Approach for Zero-Shot Corner Case Object Detection in Autonomous Driving.
Proceedings of the 26th IEEE International Conference on Intelligent Transportation Systems, 2023

Uncertainty-Aware Dynamic Learning for Cross-Domain Few-Shot Scene Classification from Remote Sensing Imagery.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks.
Proceedings of the International Conference on Machine Learning, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Q-Diffusion: Quantizing Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BadRes: Reveal the Backdoors Through Residual Connection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Improving Generalization of Meta-Learning with Inverted Regularization at Inner-Level.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Annealing-based Label-Transfer Learning for Open World Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
A Review of Single-Source Deep Unsupervised Visual Domain Adaptation.
IEEE Trans. Neural Networks Learn. Syst., 2022

Active Gradual Domain Adaptation: Dataset and Approach.
IEEE Trans. Multim., 2022

BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection.
CoRR, 2022

Multi-latent Space Alignments for Unsupervised Domain Adaptation in Multi-view 3D Object Detection.
CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

Uncertainty Guided Depth Fusion for Spike Camera.
CoRR, 2022

Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer.
CoRR, 2022

Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning.
CoRR, 2022

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data.
CoRR, 2022

UnrealNAS: Can We Search Neural Architectures with Unreal Data?
CoRR, 2022

Cross-Domain Object Detection with Mean-Teacher Transformer.
CoRR, 2022

Self-Supervised Pretraining Improves Self-Supervised Pretraining.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Jump Self-attention: Capturing High-order Statistics in Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Prototype-Voxel Contrastive Learning for LiDAR Point Cloud Panoptic Segmentation.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

DNA: Domain Generalization with Diversified Neural Averaging.
Proceedings of the International Conference on Machine Learning, 2022

Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting.
Proceedings of the Tenth International Conference on Learning Representations, 2022

MTTrans: Cross-domain Object Detection with Mean Teacher Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Meta-Tuning for Content-Aware Neural Video Delivery.
Proceedings of the Computer Vision - ECCV 2022, 2022

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Online Continual Adaptation with Active Self-Training.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Learning graph attention-aware knowledge graph embedding.
Neurocomputing, 2021

2nd Place Solution for VisDA 2021 Challenge - Universally Domain Adaptive Image Recognition.
CoRR, 2021

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.
CoRR, 2021

Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Annotation-Efficient Untrimmed Video Action Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Triplet Attention: Rethinking the Similarity in Transformers.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Decoupling Global and Local Representations via Invertible Generative Flows.
Proceedings of the 9th International Conference on Learning Representations, 2021

MERITS: Medication Recommendation for Chronic Disease with Irregular Time-Series.
Proceedings of the IEEE International Conference on Data Mining, 2021

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Contrastive Multimodal Fusion with TupleInfoNCE.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Modeling relation paths for knowledge base completion via joint adversarial training.
Knowl. Based Syst., 2020

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding.
CoRR, 2020

Cross-Domain Sentiment Classification with In-Domain Contrastive Learning.
CoRR, 2020

Revisiting Mid-Level Patterns for Distant-Domain Few-Shot Recognition.
CoRR, 2020

Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms.
CoRR, 2020

Rethinking Distributional Matching Based Domain Adaptation.
CoRR, 2020

Decoupling Global and Local Representations from/for Image Generation.
CoRR, 2020

Compositional Few-Shot Recognition with Primitive Discovery and Enhancing.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Generalized Zero-Shot Text Classification for ICD Coding.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Instance Adaptive Self-training for Unsupervised Domain Adaptation.
Proceedings of the Computer Vision - ECCV 2020, 2020

TCGM: An Information-Theoretic Framework for Semi-supervised Multi-modality Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Multi-Source Distilling Domain Adaptation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Generalized Zero-shot ICD Coding.
CoRR, 2019

Feature Fusion for Image Retrieval With Adaptive Bitrate Allocation and Hard Negative Mining.
IEEE Access, 2019

Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

MaCow: Masked Convolutional Generative Flow.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Deep Understanding of Urban Mobility from Cityscape Webcams.
PhD thesis, 2018

Hierarchical Attention Networks for Knowledge Base Completion via Joint Adversarial Training.
CoRR, 2018

Adversarial Multiple Source Domain Adaptation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Multiple Source Domain Adaptation with Adversarial Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

A Deep Learning Approach to IoT Authentication.
Proceedings of the 2018 IEEE International Conference on Communications, 2018

Learning to Understand Image Blur.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Topology adaptive graph convolutional networks.
CoRR, 2017

Multiple Source Domain Adaptation with Adversarial Training of Neural Networks.
CoRR, 2017

FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Understanding Traffic Density from Large-Scale Web Camera Data.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2015
Traffic flow from a low frame rate city camera.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

2014
Bayesian model fusion: Enabling test cost reduction of analog/RF circuits via wafer-level spatial variation modeling.
Proceedings of the 2014 International Test Conference, 2014

2013
On a Highly Efficient RDO-Based Mode Decision Pipeline Design for AVS.
IEEE Trans. Multim., 2013

A high-throughput low-latency arithmetic encoder design for HDTV.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

2012
An efficient foreground-based surveillance video coding scheme in low bit-rate compression.
Proceedings of the 2012 Visual Communications and Image Processing, 2012

A flexible and high-performance hardware video encoder architecture.
Proceedings of the 2012 Picture Coding Symposium, 2012

An Optimized Hardware Video Encoder for AVS with Level C+ Data Reuse Scheme for Motion Estimation.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012
