Jinxing Zhou

Orcid: 0000-0001-6402-7593

According to our database, Jinxing Zhou authored at least 43 papers between 2008 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation.

CoRR, May, 2026

Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2026

FreqPhys: Repurposing Implicit Physiological Frequency Prior for Robust Remote Photoplethysmography.

CoRR, April, 2026

Face-Guided Sentiment Boundary Enhancement for Weakly-Supervised Temporal Sentiment Localization.

CoRR, March, 2026

Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation.

Rao Muhammad Anwer, Hisham Cholakkal

CoRR, February, 2026

MTAVG-Bench: A Comprehensive Benchmark for Evaluating Multi-Talker Dialogue-Centric Audio-Video Generation.

CoRR, February, 2026

MTAVG-Bench: A Diagnostic Benchmark for Multi-Talker Dialogue-Centric Audio-Video Generation.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation.

Hisham Cholakkal, Rao Muhammad Anwer

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Learning Spatial Decay for Vision Transformers.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

A Closer Look at Knowledge Distillation in Spiking Neural Network Training.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos.

Mohammed Irfan Kurpath, Jaseel Muhammad Kaithakkodan, Sahal Shaji Mullappilly, Mohammad Almansoori, Beknur Kalmakhanbet, Fahad Shahbaz Khan, Rao Muhammad Anwer, Hisham Cholakkal

CoRR, December, 2025

User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation.

CoRR, December, 2025

Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection.

CoRR, September, 2025

SimToken: A Simple Baseline for Referring Audio-Visual Segmentation.

CoRR, September, 2025

Composed Object Retrieval: Object-level Retrieval via Composed Expressions.

Fahad Shahbaz Khan

CoRR, August, 2025

Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective.

CoRR, July, 2025

Towards Energy-efficient Audio-visual Classification via Multimodal Interactive Spiking Neural Network.

ACM Trans. Multim. Comput. Commun. Appl., May, 2025

Temporal Boundary Awareness Network for Repetitive Action Counting.

Zhenqiang Zhang

ACM Trans. Multim. Comput. Commun. Appl., April, 2025

Audio-Visual Segmentation with Semantics.

Stan Birchfield

Int. J. Comput. Vis., April, 2025

MAviS: A Multimodal Conversational Assistant For Avian Species.

Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Hisham Cholakkal

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Towards Open-Vocabulary Audio-Visual Event Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Audio-Visual Instance Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

PhysDiff: Physiology-based Dynamicity Disentangled Diffusion Model for Remote Physiological Measurement.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Patch-level Sounding Object Tracking for Audio-Visual Question Answering.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-Wise Pseudo Labeling.

Int. J. Comput. Vis., November, 2024

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights.

CoRR, 2024

Repetitive Action Counting with Feature Interaction Enhancement and Adaptive Gate Fusion.

Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

TAVGBench: Benchmarking Text to Audible-Video Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Label-Anticipated Event Disentanglement for Audio-Visual Video Parsing.

Proceedings of the Computer Vision - ECCV 2024, 2024

Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Contrastive Positive Sample Propagation Along the Audio-Visual Event Line.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Memorial GAN With Joint Semantic Optimization for Unpaired Image Captioning.

IEEE Trans. Cybern., 2023

Improving Audio-Visual Video Parsing with Pseudo Visual Labels.

CoRR, 2023

Fine-grained Audible Video Description.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Audio-Visual Segmentation.

Stan Birchfield

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Changes Detection and Object-Oriented Classification of Major Wetland Cover Types in Response to Driving Forces in Zoige County, Eastern Qinghai-Tibetan Plateau.

IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2021

Positive Sample Propagation Along the Audio-Visual Event Line.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2010

Study on simulation of exotic plant invasion based on a CA model.

Proceedings of the Sixth International Conference on Natural Computation, 2010

2008

Neural Network-Based Early Warning System for Debris Flow Disaster in the Three Gorges Reservoir Region.

Proceedings of the Fourth International Conference on Natural Computation, 2008