Shanghang Zhang
Orcid: 0000-0003-4047-3526
According to our database, Shanghang Zhang authored at least 258 papers between 2012 and 2026.
Bibliography
2026
Inf. Fusion, 2026
2025
RepCaM++: Exploring Transparent Visual Prompt With Inference-Time Re-Parameterization for Neural Video Delivery.
IEEE Trans. Mob. Comput., September, 2025
EEG-Driven Classification of Driver Mental Workload in Diverse Environments: A Dual-Branch Network for Efficient In-Vehicle Applications.
IEEE Internet Things J., September, 2025
NavA³: Understanding Any Instruction, Navigating Anywhere, Finding Anything.
CoRR, August, 2025
UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying.
CoRR, August, 2025
FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning.
CoRR, July, 2025
Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition.
CoRR, July, 2025
RwoR: Generating Robot Demonstrations from Human Hand Collection for Policy Learning without Robot.
CoRR, July, 2025
CoRR, July, 2025
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents.
CoRR, June, 2025
CoRR, June, 2025
CoRR, June, 2025
CoRR, June, 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
CoRR, June, 2025
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought.
CoRR, June, 2025
SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game.
CoRR, June, 2025
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics.
CoRR, June, 2025
Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning.
CoRR, June, 2025
BEVUDA++: Geometric-Aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection.
IEEE Trans. Circuits Syst. Video Technol., May, 2025
CoRR, May, 2025
AFCL: Analytic Federated Continual Learning for Spatio-Temporal Invariance of Non-IID Data.
CoRR, May, 2025
ACU: Analytic Continual Unlearning for Efficient and Exact Forgetting with Privacy Preservation.
CoRR, May, 2025
CoRR, May, 2025
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration.
CoRR, May, 2025
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation.
CoRR, May, 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
CoRR, May, 2025
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance.
CoRR, April, 2025
EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler.
CoRR, April, 2025
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation.
CoRR, March, 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model.
CoRR, March, 2025
AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter.
CoRR, March, 2025
Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network With Graph Representation Learning.
IEEE Trans. Neural Networks Learn. Syst., February, 2025
CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World.
CoRR, February, 2025
SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation.
CoRR, January, 2025
CoRR, January, 2025
Empowering Corner Case Detection in Autonomous Vehicles With Multimodal Large Language Models.
IEEE Signal Process. Lett., 2025
A diffusion-based feature enhancement approach for driving behavior classification with EEG data.
Adv. Eng. Informatics, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Co3Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the Data Compression Conference, 2025
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments.
IEEE Robotics Autom. Lett., September, 2024
IEEE J. Biomed. Health Informatics, July, 2024
BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for Multi-View BEV 3D Object Detection.
IEEE Trans. Intell. Veh., January, 2024
DECOR: Dynamic Decoupling and Multiobjective Optimization for Long-Tailed Remote Sensing Image Classification.
IEEE Trans. Geosci. Remote. Sens., 2024
A lightweight multi-layer perceptron for efficient multivariate time series forecasting.
Knowl. Based Syst., 2024
The Emerging Issues in Bioimaging AI Publications and Research (Dagstuhl Seminar 24042).
Dagstuhl Reports, 2024
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation.
CoRR, 2024
ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance.
CoRR, 2024
CoRR, 2024
[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
CoRR, 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation.
CoRR, 2024
Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective.
CoRR, 2024
Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation.
CoRR, 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective.
CoRR, 2024
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference.
CoRR, 2024
Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection.
CoRR, 2024
DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing.
CoRR, 2024
A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge - Multi-Task Robustness Track.
CoRR, 2024
CoRR, 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis.
CoRR, 2024
VeCAF: VLM-empowered Collaborative Active Finetuning with Training Objective Awareness.
CoRR, 2024
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
VLUReID: Exploiting Vision-Language Knowledge for Unsupervised Person Re-Identification.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE Trans. Circuits Syst. Video Technol., September, 2023
IEEE Trans. Cogn. Dev. Syst., September, 2023
Artif. Intell., May, 2023
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.
Remote. Sens., April, 2023
IEEE Trans. Multim., 2023
Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation.
CoRR, 2023
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection.
CoRR, 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.
CoRR, 2023
CoRR, 2023
Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior.
CoRR, 2023
CoRR, 2023
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering.
CoRR, 2023
CoRR, 2023
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023
Proceedings of the 24th International Workshop on Mobile Computing Systems and Applications, 2023
Proceedings of the 33rd Workshop on Network and Operating System Support for Digital Audio and Video, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023
Proceedings of the 26th IEEE International Conference on Intelligent Transportation Systems, 2023
A Text Prompt-Based Approach for Zero-Shot Corner Case Object Detection in Autonomous Driving.
Proceedings of the 26th IEEE International Conference on Intelligent Transportation Systems, 2023
Uncertainty-Aware Dynamic Learning for Cross-Domain Few-Shot Scene Classification from Remote Sensing Imagery.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Improving Generalization of Meta-Learning with Inverted Regularization at Inner-Level.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
IEEE Trans. Neural Networks Learn. Syst., 2022
BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection.
CoRR, 2022
Multi-latent Space Alignments for Unsupervised Domain Adaptation in Multi-view 3D Object Detection.
CoRR, 2022
Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer.
CoRR, 2022
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning.
CoRR, 2022
CoRR, 2022
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022
Proceedings of the 2022 International Conference on Robotics and Automation, 2022
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022
2021
2nd Place Solution for VisDA 2021 Challenge - Universally Domain Adaptive Image Recognition.
CoRR, 2021
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.
CoRR, 2021
Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the IEEE International Conference on Data Mining, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization.
Proceedings of the IEEE International Conference on Acoustics, 2021
Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Modeling relation paths for knowledge base completion via joint adversarial training.
Knowl. Based Syst., 2020
P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding.
CoRR, 2020
CoRR, 2020
CoRR, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Proceedings of the Computer Vision - ECCV 2020, 2020
TCGM: An Information-Theoretic Framework for Semi-supervised Multi-modality Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Feature Fusion for Image Retrieval With Adaptive Bitrate Allocation and Hard Negative Mining.
IEEE Access, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
2018
Hierarchical Attention Networks for Knowledge Base Completion via Joint Adversarial Training.
CoRR, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the 6th International Conference on Learning Representations, 2018
Proceedings of the 2018 IEEE International Conference on Communications, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
2017
CoRR, 2017
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras.
Proceedings of the IEEE International Conference on Computer Vision, 2017
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
2015
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015
2014
Bayesian model fusion: Enabling test cost reduction of analog/RF circuits via wafer-level spatial variation modeling.
Proceedings of the 2014 International Test Conference, 2014
2013
IEEE Trans. Multim., 2013
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
2012
An efficient foreground-based surveillance video coding scheme in low bit-rate compression.
Proceedings of the 2012 Visual Communications and Image Processing, 2012
Proceedings of the 2012 Picture Coding Symposium, 2012
An Optimized Hardware Video Encoder for AVS with Level C+ Data Reuse Scheme for Motion Estimation.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012