Junnan Li

Pattern Recognit. Lett., April, 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.

CoRR, 2023

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM.

Akhilesh Deepak Gotmare

CoRR, 2023

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding.

Roberto Martín-Martín

CoRR, 2023

Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models.

Akhilesh Deepak Gotmare

Shafiq Joty

Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.

Wenliang Dai

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.

Proceedings of the International Conference on Machine Learning, 2023

Masked Unsupervised Self-training for Label-free Image Classification.

Silvio Savarese

Proceedings of the Eleventh International Conference on Learning Representations, 2023

CodeT5+: Open Code Large Language Models for Code Understanding and Generation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models.

Jiaxian Guo

Boyang Li

Dacheng Tao

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVIS: A One-stop Library for Language-Vision Intelligence.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

Tackling Data Heterogeneity in Federated Learning with Class Prototypes.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.

Jiaxian Guo

Boyang Li

Dacheng Tao

CoRR, 2022

BotSIM: An End-to-End Bot Simulation Toolkit for Commercial Task-Oriented Dialog Systems.

CoRR, 2022

LAVIS: A Library for Language-Vision Intelligence.

CoRR, 2022

Masked Unsupervised Self-training for Zero-shot Image Classification.

Silvio Savarese

CoRR, 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.

Proceedings of the International Conference on Machine Learning, 2022

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.

Proceedings of the Computer Vision - ECCV 2022, 2022

Align and Prompt: Video-and-Language Pre-training with Entity Prompts.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes.

CoRR, 2021

Cascaded Fast and Slow Models for Efficient Semantic Code Search.

Akhilesh Deepak Gotmare

Shafiq R. Joty

CoRR, 2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation.

Ramprasaath R. Selvaraju

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Prototypical Contrastive Learning of Unsupervised Representations.

Proceedings of the 9th International Conference on Learning Representations, 2021

MoPro: Webly Supervised Learning with Momentum Prototypes.

Caiming Xiong

Proceedings of the 9th International Conference on Learning Representations, 2021

Learning from Noisy Data with Robust Representation Learning.

Caiming Xiong

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization.

Caiming Xiong

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Interact as You Intend: Intention-Driven Human-Object Interaction Detection.

IEEE Trans. Multim., 2020

Video Storytelling: Textual Summaries for Events.

IEEE Trans. Multim., 2020

Visual Social Relationship Recognition.

Int. J. Comput. Vis., 2020

Prototypical Contrastive Learning of Unsupervised Representations.

CoRR, 2020

Improving out-of-distribution generalization via multi-task self-supervised pretraining.

Isabela Albuquerque

Nikhil Naik

Nitish Shirish Keskar

Richard Socher

CoRR, 2020

Towards Noise-resistant Object Detection with Noisy Annotations.

CoRR, 2020

GradMix: Multi-source Transfer across Domains and Tasks.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Weakly-Supervised Multi-Person Action Recognition in 360° Videos.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

DivideMix: Learning with Noisy Labels as Semi-supervised Learning.

Richard Socher

Proceedings of the 8th International Conference on Learning Representations, 2020

Learning on the Fly: An RNN-Based Online Throughput Prediction Framework for UAV Communications.

Proceedings of the 2020 IEEE International Conference on Communications Workshops, 2020

The Devil Is in Classification: A Simple Framework for Long-Tail Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

A Multi-sensor Framework for Personal Presentation Analytics.

ACM Trans. Multim. Comput. Commun. Appl., 2019

Deep Reinforcement Learning in Soft Viscoelastic Actuator of Dielectric Elastomer.

IEEE Robotics Autom. Lett., 2019

LSTM-based multi-label video event detection.

Multim. Tools Appl., 2019

Classification Calibration for Long-tail Instance Segmentation.

CoRR, 2019

Self-supervised Representation Learning Using 360° Data.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning to Detect Human-Object Interactions With Knowledge.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning to Learn From Noisy Labeled Data.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Video Storytelling.

CoRR, 2018

Unsupervised Learning of View-invariant Action Representations.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language.

Comput. Vis. Image Underst., 2017

Attention Transfer from Web Images for Video Recognition.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Dual-Glance Model for Deciphering Social Relationships.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Demo Paper: PreSense - An Assistive Presentation Self-Quantification System.

Yongkang Wong

Mohan S. Kankanhalli

Proceedings of the IEEE International Symposium on Multimedia, 2016

Multi-stream Deep Learning Framework for Automated Presentation Assessment.
