Shengbang Tong

According to our database, Shengbang Tong authored at least 38 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images.

CoRR, April, 2026

Beyond Language Modeling: An Exploration of Multimodal Pretraining.

CoRR, March, 2026

Asymmetric Idiosyncrasies in Multimodal Models.

CoRR, February, 2026

Reliable and Responsible Foundation Models: A Comprehensive Survey.

CoRR, February, 2026

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders.

CoRR, January, 2026

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images.

CoRR, November, 2025

Cambrian-S: Towards Spatial Supersensing in Video.

CoRR, November, 2025

Diffusion Transformers with Representation Autoencoders.

CoRR, October, 2025

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training.

CoRR, September, 2025

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models.

CoRR, June, 2025

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction.

CoRR, June, 2025

Reliable and Responsible Foundation Models.

Trans. Mach. Learn. Res., 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Scaling Language-Free Visual Representation Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

J. Mach. Learn. Res., 2024

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning.

Shentong Mo

Shengbang Tong

CoRR, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.

CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.

CoRR, 2024

Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription.

CoRR, 2024

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models.

Benjamin David Haeffele

René Vidal

Yi Ma

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Investigating the Catastrophic Forgetting in Multimodal Large Language Model Fine-Tuning.

Proceedings of the Conference on Parsimony and Learning, 2024

Emergence of Segmentation with Minimalistic White-Box Transformers.

Proceedings of the Conference on Parsimony and Learning, 2024

Unsupervised Learning of Structured Representation via Closed-Loop Transcription.

Proceedings of the Conference on Parsimony and Learning, 2024

Closed-Loop Transcription via Convolutional Sparse Coding.

Proceedings of the Conference on Parsimony and Learning, 2024

2023

Investigating the Catastrophic Forgetting in Multimodal Large Language Models.

CoRR, 2023

EMP-SSL: Towards Self-Supervised Learning in One Training Epoch.

CoRR, 2023

White-Box Transformers via Sparse Rate Reduction.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mass-Producing Failures of Multimodal Systems with Language Models.

Shengbang Tong

Erik Jones

Jacob Steinhardt

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Incremental Learning of Structured Memory via Closed-Loop Transcription.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Unsupervised Manifold Linearizing and Clustering.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction.

Entropy, 2022

Unsupervised Learning of Structured Representations via Closed-Loop Transcription.

CoRR, 2022

Revisiting Sparse Convolutional Model for Visual Recognition.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction.

CoRR, 2021
