Jing Bi

Orcid: 0009-0006-8235-2158

Affiliations:

University of Rochester, Rochester, NY, USA

According to our database, Jing Bi authored at least 25 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2025

Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward.

CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.

CoRR, October, 2025

ACTLLM: Action Consistency Tuned Large Language Model.

CoRR, June, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?

CoRR, June, 2025

ZeroSep: Separate Anything in Audio with Zero Training.

CoRR, May, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

CoRR, May, 2025

I²G: Generating Instructional Illustrations via Text-Conditioned Diffusion.

CoRR, May, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).

CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.

CoRR, March, 2025

Generative AI for Cel-Animation: A Survey.

CoRR, January, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.

CoRR, 2023

2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.

Jing Bi

Jiebo Luo

Chenliang Xu

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.

CoRR, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2018

Navigation by Imitation in a Pedestrian-Rich Environment.

CoRR, 2018
