Peng Gao

Affiliations:
  • Shanghai Artificial Intelligence Laboratory, China
  • Chinese University of Hong Kong, Hong Kong (PhD 2021)


According to our database, Peng Gao authored at least 77 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.
CoRR, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Hybrid token transformer for deep face recognition.
Pattern Recognit., July, 2023

P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.
Remote. Sens., April, 2023

Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.
CoRR, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners.
CoRR, 2023

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation.
CoRR, 2023

Hybrid Transformer Network for Change Detection Under Self-Supervised Pretraining.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Resilient Binary Neural Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
Remote. Sens., 2022

Hierarchical Disentangling Network for Building Extraction from Very High Resolution Optical Remote Sensing Imagery.
Remote. Sens., 2022

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders.
CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
CoRR, 2022

Illumination Adaptive Transformer.
CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

TerViT: An Efficient Ternary Vision Transformer.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Adaptive Local Context Embedding for Small Vehicle Detection from Aerial Optical Remote Sensing Images.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors.
Proceedings of the Computer Vision - ECCV 2022, 2022

Recurrent Bilinear Optimization for Binary Neural Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
A Simple Long-Tailed Recognition Baseline via Vision-Language Model.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
CoRR, 2021

Oriented Object Detection with Transformer.
CoRR, 2021

Scalable Transformers for Neural Machine Translation.
CoRR, 2021

Container: Context Aggregation Network.
CoRR, 2021

Dual-stream Network for Visual Recognition.
CoRR, 2021

RomeBERT: Robust Training of Multi-Exit BERT.
CoRR, 2021

Dual-stream Network for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Container: Context Aggregation Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dense Contrastive Visual-Linguistic Pretraining.
Proceedings of the MM '21: ACM Multimedia Conference, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
End-to-End Object Detection with Adaptive Clustering Transformer.
CoRR, 2020

Contrastive Visual-Linguistic Pretraining.
CoRR, 2020

Gradient Regularized Contrastive Learning for Continual Domain Adaptation.
CoRR, 2020

Character Matters: Video Story Understanding with Character-Aware Relations.
CoRR, 2020

Learning Where to Focus for Efficient Video Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Multi-Modality Latent Interaction Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Video Object Detection with Locally-Weighted Deformable Neighbors.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Question-Guided Hybrid Convolution for Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2018, 2018

