Yu Qiao

Affiliations:
  • Shanghai AI Laboratory, China


According to our database, Yu Qiao authored at least 77 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Latte: Latent Diffusion Transformer for Video Generation.
CoRR, 2024

2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

Towards Knowledge-driven Autonomous Driving.
CoRR, 2023

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation.
CoRR, 2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.
CoRR, 2023

Asymmetric Masked Distillation for Pre-Training Small Foundation Models.
CoRR, 2023

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models.
CoRR, 2023

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models.
CoRR, 2023

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.
CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.
CoRR, 2023

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation.
CoRR, 2023

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior.
CoRR, 2023

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
CoRR, 2023

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
CoRR, 2023

Scaling Data Generation in Vision-and-Language Navigation.
CoRR, 2023

Meta-Transformer: A Unified Framework for Multimodal Learning.
CoRR, 2023

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.
CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.
CoRR, 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset.
CoRR, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.
CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving.
CoRR, 2023

Topology Reasoning for Driving Scenes.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.
CoRR, 2023

Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling.
CoRR, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Long-Term Rhythmic Video Soundtracker.
Proceedings of the International Conference on Machine Learning, 2023

Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Vision Transformer Adapter for Dense Predictions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Stare at What You See: Masked Image Modeling without Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation.
CoRR, 2022

Goal-oriented Autonomous Driving.
CoRR, 2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.
CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot.
CoRR, 2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.
CoRR, 2022

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach.
Proceedings of the Conference on Robot Learning, 2022

2021
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
CoRR, 2021

INTERN: A New Learning Paradigm Towards General Vision.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

Scalable Transformers for Neural Machine Translation.
CoRR, 2021

2019
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

P2SGrad: Refined Gradients for Optimizing Deep Face Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2016
Bridging Music and Image via Cross-Modal Ranking Analysis.
IEEE Trans. Multim., 2016

2012
Cross matching of music and image.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

