Sicong Leng

Orcid: 0000-0002-3084-5026

According to our database, Sicong Leng authored at least 20 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources.

CoRR, September, 2025

From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning.

CoRR, September, 2025

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation.

CoRR, September, 2025

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning.

CoRR, July, 2025

Two Is Better Than One: Rotations Scale LoRAs.

CoRR, May, 2025

Advancing Expert Specialization for Better MoE.

CoRR, May, 2025

Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions.

CoRR, February, 2025

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding.

CoRR, January, 2025

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss.

CoRR, 2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio.

CoRR, 2024

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention.

CoRR, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs.

CoRR, 2024

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Constrained Layout Generation with Factor Graphs.

Mohammed Haroon Dupty

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Tell2Design: A Dataset for Language-Guided Floor Plan Generation.

Sicong Leng

Yang Zhou

Mohammed Haroon Dupty

Wee Sun Lee

Sam Joyce

Wei Lu

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2021

Interventional Video Grounding With Dual Contrastive Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
