Longteng Guo

ORCID: 0000-0002-4340-4000

According to our database, Longteng Guo authored at least 26 papers between 2017 and 2024.

Bibliography

2024
VL-Mamba: Exploring State Space Models for Multimodal Learning.
CoRR, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.
CoRR, 2024

Knowledge Condensation and Reasoning for Knowledge-based VQA.
CoRR, 2024

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation.
CoRR, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.
CoRR, 2023

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset.
CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CSDNet: Contrastive Similarity Distillation Network for Multi-lingual Image-Text Retrieval.
Proceedings of the Image and Graphics - 12th International Conference, 2023

2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.
CoRR, 2022

2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.
CoRR, 2021

CPTR: Full Transformer Network for Image Captioning.
CoRR, 2021

Fast Sequence Generation with Multi-Agent Reinforcement Learning.
CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Keypoint Context Aggregation for Human Pose Estimation.
Proceedings of the Image and Graphics - 11th International Conference, 2021

Multi-scale Landmark Localization Network for 3D Facial Point Clouds.
Proceedings of the ICDSP 2021: 5th International Conference on Digital Signal Processing, 2021

2020
Show, Tell, and Polish: Ruminant Decoding for Image Captioning.
IEEE Trans. Multim., 2020

AutoCaption: Image Captioning with Neural Architecture Search.
CoRR, 2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Modeling Local and Global Contexts for Image Captioning.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Normalized and Geometry-Aware Self-Attention Network for Image Captioning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019.
CoRR, 2019

Aligning Linguistic Words and Visual Semantic Units for Image Captioning.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

MSCap: Multi-Style Image Captioning With Unpaired Stylized Text.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2017
Sketch-based Image Retrieval using Generative Adversarial Networks.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

