Yushi Hu

ORCID: 0000-0002-7540-2413

According to our database, Yushi Hu authored at least 28 papers between 2020 and 2026.

Bibliography

2026
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning.
CoRR, March, 2026

Unified Text-Image Generation with Weakness-Targeted Post-Training.
CoRR, January, 2026

2025
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image.
CoRR, December, 2025

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation.
CoRR, December, 2025

Self-Improving VLM Judges Without Human Annotations.
CoRR, December, 2025

TV2TV: A Unified Framework for Interleaved Language and Video Generation.
CoRR, December, 2025

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation.
CoRR, May, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset.
CoRR, May, 2025

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models.
CoRR, April, 2025

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Decoding-Time Language Model Alignment with Multiple Objectives.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

BLINK: Multimodal Large Language Models Can See but Not Perceive.
Proceedings of the Computer Vision - ECCV 2024, 2024

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation.
Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) - Full Archival Papers, October 21-23, 2024, San Jose, California, USA, 2024

Training Language Models to Generate Text with Citations via Fine-grained Rewards.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Binding Language Models in Symbolic Languages.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

One Embedder, Any Task: Instruction-Finetuned Text Embeddings.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
PromptCap: Prompt-Guided Task-Aware Image Captioning.
CoRR, 2022

Unsupervised Learning of Hierarchical Conversation Structure.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

In-Context Learning for Few-Shot Dialogue State Tracking.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021
Acoustic Span Embeddings for Multilingual Query-by-Example Search.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020
Multilingual Jointly Trained Acoustic and Written Word Embeddings.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

