Yousong Zhu

ORCID: 0000-0001-8544-410X

According to our database¹, Yousong Zhu authored at least 42 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding.

CoRR, February, 2026

Seg-LLaVA: Empowering pixel-level understanding with large vision language model.

Pattern Recognit., 2026

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation.

CoRR, October, 2025

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation.

CoRR, June, 2025

VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?

CoRR, June, 2025

GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking.

CoRR, June, 2025

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models.

CoRR, May, 2025

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning.

CoRR, March, 2025

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

Efficient Masked Autoencoders With Self-Consistency.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Multi-Model Style-Aware Diffusion Learning for Semantic Image Synthesis.

ACM Trans. Multim. Comput. Commun. Appl., November, 2024

Relation-Associated Instructions & Hallucination Benchmark.

Dataset, July, 2024

Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models.

CoRR, 2024

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.

CoRR, 2024

The Devil is in Details: Delving Into Lite FFN Design for Vision Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Self-Supervised Representation Learning from Arbitrary Scenarios.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Mitigating Hallucination in Visual Language Models with Visual Supervision.

CoRR, 2023

Efficient Masked Autoencoders with Self-Consistency.

CoRR, 2023

Exploring Stochastic Autoregressive Image Modeling for Visual Representation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval.

CoRR, 2022

Part-Aware Self-Supervised Pre-Training for Person Re-Identification.

CoRR, 2022

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

PASS: Part-Aware Self-Supervised Pre-Training for Person Re-Identification.

Proceedings of the Computer Vision - ECCV 2022, 2022

C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Cross-Dataset Collaborative Learning for Semantic Segmentation.

CoRR, 2021

MST: Masked Self-Supervised Transformer for Visual Representation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DPT: Deformable Patch-based Transformer for Visual Recognition.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Attention-Guided Knowledge Distillation for Efficient Single-Stage Detector.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

Adaptive Class Suppression Loss for Long-Tail Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Food det: Detecting foods in refrigerator with supervised transformer network.

Neurocomputing, 2020

A novel data augmentation scheme for pedestrian detection with attribute preserving GAN.

Neurocomputing, 2020

Large Batch Optimization for Object Detection: Training COCO in 12 minutes.

Proceedings of the Computer Vision - ECCV 2020, 2020

Dual Super-Resolution Learning for Semantic Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Attention CoupleNet: Fully Convolutional Attention Coupling Network for Object Detection.

IEEE Trans. Image Process., 2019

Elite Loss for scene text detection.

Neurocomputing, 2019

Mask Guided Knowledge Distillation for Single Shot Detector.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

2018

Improved Single Shot Object Detector Using Enhanced Features and Predicting Heads.

Proceedings of the Fourth IEEE International Conference on Multimedia Big Data, 2018

2017

CoupleNet: Coupling Global Structure with Local Parts for Object Detection.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Scale-Adaptive Deconvolutional Regression Network for Pedestrian Detection.

Proceedings of the Computer Vision - ACCV 2016, 2016
