Jinguo Zhu
Orcid: 0000-0002-3616-4264
According to our database, Jinguo Zhu authored at least 26 papers between 2019 and 2025.
Bibliography
2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models.
CoRR, April, 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025
CoRR, March, 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
2024
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.
Vis. Intell., 2024
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding.
CoRR, 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.
CoRR, 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.
CoRR, 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.
CoRR, 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Power-Llava: Large Language and Vision Assistant for Power Transmission Line Inspection.
Proceedings of the IEEE International Conference on Image Processing, 2024
Intent Negotiation Empowers Advanced Operations for the Intent-Driven Autonomous Network.
Proceedings of the 27th Conference on Innovation in Clouds, Internet and Networks, 2024
2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation.
CoRR, 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
CoRR, 2021
Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification.
CoRR, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
A Deep Learning Method to Detect Foreign Objects for Inspecting Power Transmission Lines.
IEEE Access, 2020
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020
2019
Proceedings of the 16th International Conference on Machine Vision Applications, 2019