We stand with Ukraine

Yushi Hu

ORCID: 0000-0002-7540-2413

According to our database, Yushi Hu authored at least 28 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning.

,

,

,

,

,

,

,

,

,

,

,

CoRR, March, 2026

Unified Text-Image Generation with Weakness-Targeted Post-Training.

,

Philippe Hansen-Estruch

,

,

,

,

,

Michal Drozdzal

,

Reyhane Askari Hemmat

,

Luke Zettlemoyer

,

Marjan Ghazvininejad

CoRR, January, 2026

2025

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image.

,

Reyhane Askari Hemmat

,

,

,

Luke Zettlemoyer

,

Marjan Ghazvininejad

CoRR, December, 2025

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation.

,

,

,

Luke Zettlemoyer

,

,

Marjan Ghazvininejad

CoRR, December, 2025

Self-Improving VLM Judges Without Human Annotations.

Inna Wanyin Lin

,

,

Shuyue Stella Li

,

,

,

Luke Zettlemoyer

,

,

Marjan Ghazvininejad

CoRR, December, 2025

TV2TV: A Unified Framework for Interleaved Language and Video Generation.

,

,

,

,

,

,

,

,

Michal Drozdzal

,

,

,

,

Sreya Dutta Roy

,

,

,

Marjan Ghazvininejad

,

Luke Zettlemoyer

,

CoRR, December, 2025

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation.

,

,

,

,

,

,

,

,

,

,

,

CoRR, May, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset.

,

,

,

,

,

,

,

,

,

Silvio Savarese

,

,

,

CoRR, May, 2025

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models.

,

,

,

,

,

,

,

,

,

,

,

Artsiom Sanakoyeu

,

Felix Juefei-Xu

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, April, 2025

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback.

,

,

,

,

,

,

,

Charles Herrmann

,

Sjoerd van Steenkiste

,

,

Cyrus Rashtchian

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation.

,

,

,

Aniruddha Kembhavi

,

William T. Freeman

,

,

,

Antonio Torralba

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Decoding-Time Language Model Alignment with Multiple Objectives.

,

,

,

,

Hanna Hajishirzi

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models.

,

,

,

,

,

Luke Zettlemoyer

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation.

,

,

Jason M. Baldridge

,

,

,

,

,

Jordi Pont-Tuset

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

BLINK: Multimodal Large Language Models Can See but Not Perceive.

,

,

,

,

,

,

,

,

,

Proceedings of the Computer Vision - ECCV 2024, 2024

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models.

,

,

,

Krishnamurthy Viswanathan

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation.

Katherine M. Collins

,

,

,

,

Shayegan Omidshafiei

,

,

,

,

,

,

,

,

,

,

,

,

Deepak Ramachandran

,

Krishnamurthy Dj Dvijotham

Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) - Full Archival Papers, October 21-23, 2024, San Jose, California, USA, 2024

Training Language Models to Generate Text with Citations via Fine-grained Rewards.

,

,

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training.

,

,

,

,

,

Prithviraj Ammanabrolu

,

,

,

Hannaneh Hajishirzi

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Binding Language Models in Symbolic Languages.

,

,

,

,

,

,

,

,

,

Luke Zettlemoyer

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering.

,

,

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3.

,

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

One Embedder, Any Task: Instruction-Finetuned Text Embeddings.

,

,

,

,

,

,

,

,

Luke Zettlemoyer

,

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

PromptCap: Prompt-Guided Task-Aware Image Captioning.

,

,

,

,

,

CoRR, 2022

Unsupervised Learning of Hierarchical Conversation Structure.

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

In-Context Learning for Few-Shot Dialogue State Tracking.

,

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021

Acoustic Span Embeddings for Multilingual Query-by-Example Search.

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020

Multilingual Jointly Trained Acoustic and Written Word Embeddings.

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
