Shijian Deng

Orcid: 0009-0008-9560-702X

According to our database, Shijian Deng authored at least 21 papers between 2008 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Do Joint Audio-Video Generation Models Understand Physics?
CoRR, May, 2026

Audio-Visual Intelligence in Large Foundation Models.
CoRR, May, 2026

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text.
CoRR, April, 2026

Omni-MMSI: Toward Identity-attributed Social Interaction Understanding.
CoRR, April, 2026

ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation.
CoRR, February, 2026

Modality-Inconsistent Continual Learning of Multimodal Large Language Models.
Trans. Mach. Learn. Res., 2026

Towards Online Multimodal Social Interaction Understanding.
Trans. Mach. Learn. Res., 2026

Toward Gaze Target Detection of Young Autistic Children.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Explainable AI-Generated Image Detection RewardBench.
CoRR, November, 2025

Towards Online Multi-Modal Social Interaction Understanding.
CoRR, March, 2025

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition.
IEEE Trans. Multim., 2025

AV-DiT: Taming Image Diffusion Transformers for Efficient Joint Audio and Video Generation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Self-Improvement in Multimodal Large Language Models: A Survey.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

2024
Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA.
IEEE Trans. Multim., 2024

Modality-Inconsistent Continual Learning of Multimodal Large Language Models.
CoRR, 2024

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach.
CoRR, 2024

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation.
CoRR, 2024

Continual Audio-Visual Sound Separation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation.
CoRR, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.
CoRR, 2023

2008
Technology and Realization of Developing Monitoring Software Based on Multi-Application's Cooperation.
Proceedings of the International Conference on Computer Science and Software Engineering, 2008

