Jing Bi

Orcid: 0009-0006-8235-2158

Affiliations:
  • University of Rochester, Rochester, NY, USA


According to our database, Jing Bi authored at least 32 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
TDMM-LM: Bridging Facial Understanding and Animation via Language Models.
CoRR, March, 2026

Video Understanding With Large Language Models: A Survey.
IEEE Trans. Circuits Syst. Video Technol., February, 2026

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?
CoRR, February, 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
VisualActBench: Can VLMs See and Act like a Human?
CoRR, December, 2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.
CoRR, November, 2025

When to Think and When to Look: Uncertainty-Guided Lookback.
CoRR, November, 2025

Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward.
CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.
CoRR, October, 2025

What to Do Next? Memorizing skills from Egocentric Instructional Video.
CoRR, July, 2025

ACTLLM: Action Consistency Tuned Large Language Model.
CoRR, June, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?
CoRR, June, 2025

I²G: Generating Instructional Illustrations via Text-Conditioned Diffusion.
CoRR, May, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).
CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.
CoRR, March, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ZeroSep: Separate Anything in Audio with Zero Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Generative AI for Cel-Animation: A Survey.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.
CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.
CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.
CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

2023
Video Understanding with Large Language Models: A Survey.
CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.
CoRR, 2023

2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.
CoRR, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2018
Navigation by Imitation in a Pedestrian-Rich Environment.
CoRR, 2018

