Jing Bi

Orcid: 0009-0006-8235-2158

Affiliations:

University of Rochester, Rochester, NY, USA

According to our database¹, Jing Bi authored at least 32 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

TDMM-LM: Bridging Facial Understanding and Animation via Language Models.

CoRR, March, 2026

Video Understanding With Large Language Models: A Survey.

IEEE Trans. Circuits Syst. Video Technol., February, 2026

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?

CoRR, February, 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

VisualActBench: Can VLMs See and Act like a Human?

Christopher G. Brinton

Ehsan Hoque

Jiebo Luo

CoRR, December, 2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.

CoRR, November, 2025

When to Think and When to Look: Uncertainty-Guided Lookback.

CoRR, November, 2025

Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward.

CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.

CoRR, October, 2025

What to Do Next? Memorizing skills from Egocentric Instructional Video.

Jing Bi

Chenliang Xu

CoRR, July, 2025

ACTLLM: Action Consistency Tuned Large Language Model.

CoRR, June, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?

CoRR, June, 2025

I²G: Generating Instructional Illustrations via Text-Conditioned Diffusion.

CoRR, May, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).

CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.

CoRR, March, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ZeroSep: Separate Anything in Audio with Zero Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Generative AI for Cel-Animation: A Survey.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.

CoRR, 2023

2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.

Jing Bi

Jiebo Luo

Chenliang Xu

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.

CoRR, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2018

Navigation by Imitation in a Pedestrian-Rich Environment.

CoRR, 2018
