Sibo Song

Orcid: 0009-0000-0516-167X

According to our database, Sibo Song authored at least 17 papers between 2014 and 2025.

Bibliography

2025
CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making.
CoRR, June, 2025

OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding.
CoRR, April, 2025

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models.
CoRR, February, 2025

Qwen2.5-VL Technical Report.
CoRR, February, 2025

Generative compositor for few-shot visual information extraction.
Pattern Recognit., 2025

2024
OMNIPARSER: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
ICDAR 2023 Competition on Born Digital Video Text Question Answering.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Vision-Language Pre-Training for Boosting Scene Text Detectors.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Saak Transform-Based Machine Learning for Light-Sheet Imaging of Cardiac Trabeculation.
IEEE Trans. Biomed. Eng., 2021

2018
Defense Against Adversarial Attacks with Saak Transform.
CoRR, 2018

Deep Adaptive Temporal Pooling for Activity Recognition.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017
Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text.
CoRR, 2017

On classification of distorted images with deep convolutional neural networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
Egocentric activity recognition with multimodal fisher vector.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

2014
Activity Recognition in Egocentric Life-Logging Videos.
Proceedings of the Computer Vision - ACCV 2014 Workshops, 2014

