Peng Gao

Affiliations:
  • Shanghai Artificial Intelligence Laboratory, China
  • Chinese University of Hong Kong, Hong Kong (PhD 2021)


According to our database, Peng Gao authored at least 77 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models.
CoRR, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Hybrid token transformer for deep face recognition.
Pattern Recognit., July, 2023

P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification.
Remote. Sens., April, 2023

Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model.
CoRR, 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners.
CoRR, 2023

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation.
CoRR, 2023

Hybrid Transformer Network for Change Detection Under Self-Supervised Pretraining.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Resilient Binary Neural Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
Remote. Sens., 2022

Hierarchical Disentangling Network for Building Extraction from Very High Resolution Optical Remote Sensing Imagery.
Remote. Sens., 2022

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders.
CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain.
CoRR, 2022

Illumination Adaptive Transformer.
CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

TerViT: An Efficient Ternary Vision Transformer.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Adaptive Local Context Embedding for Small Vehicle Detection from Aerial Optical Remote Sensing Images.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors.
Proceedings of the Computer Vision - ECCV 2022, 2022

Recurrent Bilinear Optimization for Binary Neural Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
A Simple Long-Tailed Recognition Baseline via Vision-Language Model.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
CoRR, 2021

Oriented Object Detection with Transformer.
CoRR, 2021

Scalable Transformers for Neural Machine Translation.
CoRR, 2021

Container: Context Aggregation Network.
CoRR, 2021

Dual-stream Network for Visual Recognition.
CoRR, 2021

RomeBERT: Robust Training of Multi-Exit BERT.
CoRR, 2021

Dual-stream Network for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Container: Context Aggregation Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dense Contrastive Visual-Linguistic Pretraining.
Proceedings of the MM '21: ACM Multimedia Conference, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
End-to-End Object Detection with Adaptive Clustering Transformer.
CoRR, 2020

Contrastive Visual-Linguistic Pretraining.
CoRR, 2020

Gradient Regularized Contrastive Learning for Continual Domain Adaptation.
CoRR, 2020

Character Matters: Video Story Understanding with Character-Aware Relations.
CoRR, 2020

Learning Where to Focus for Efficient Video Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Multi-Modality Latent Interaction Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Video Object Detection with Locally-Weighted Deformable Neighbors.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Question-Guided Hybrid Convolution for Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2018, 2018

