Jing Bi

Orcid: 0009-0006-8235-2158

Affiliations:
  • University of Rochester, Rochester, NY, USA


According to our database, Jing Bi authored at least 23 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
ACTLLM: Action Consistency Tuned Large Language Model.
CoRR, June, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?
CoRR, June, 2025

ZeroSep: Separate Anything in Audio with Zero Training.
CoRR, May, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.
CoRR, May, 2025

I²G: Generating Instructional Illustrations via Text-Conditioned Diffusion.
CoRR, May, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).
CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.
CoRR, March, 2025

Generative AI for Cel-Animation: A Survey.
CoRR, January, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.
CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.
CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.
CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

2023
Video Understanding with Large Language Models: A Survey.
CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.
CoRR, 2023

2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.
CoRR, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2018
Navigation by Imitation in a Pedestrian-Rich Environment.
CoRR, 2018
