Shengbang Tong

According to our database, Shengbang Tong authored at least 38 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images.
CoRR, April, 2026

Beyond Language Modeling: An Exploration of Multimodal Pretraining.
CoRR, March, 2026

Asymmetric Idiosyncrasies in Multimodal Models.
CoRR, February, 2026

Reliable and Responsible Foundation Models: A Comprehensive Survey.
CoRR, February, 2026

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders.
CoRR, January, 2026

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images.
CoRR, November, 2025

Cambrian-S: Towards Spatial Supersensing in Video.
CoRR, November, 2025

Diffusion Transformers with Representation Autoencoders.
CoRR, October, 2025

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training.
CoRR, September, 2025

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models.
CoRR, June, 2025

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction.
CoRR, June, 2025

Reliable and Responsible Foundation Models.
Trans. Mach. Learn. Res., 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Scaling Language-Free Visual Representation Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
J. Mach. Learn. Res., 2024

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning.
CoRR, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.
CoRR, 2024

Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription.
CoRR, 2024

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Investigating the Catastrophic Forgetting in Multimodal Large Language Model Fine-Tuning.
Proceedings of the Conference on Parsimony and Learning, 2024

Emergence of Segmentation with Minimalistic White-Box Transformers.
Proceedings of the Conference on Parsimony and Learning, 2024

Unsupervised Learning of Structured Representation via Closed-Loop Transcription.
Proceedings of the Conference on Parsimony and Learning, 2024

Closed-Loop Transcription via Convolutional Sparse Coding.
Proceedings of the Conference on Parsimony and Learning, 2024

2023
Investigating the Catastrophic Forgetting in Multimodal Large Language Models.
CoRR, 2023

EMP-SSL: Towards Self-Supervised Learning in One Training Epoch.
CoRR, 2023

White-Box Transformers via Sparse Rate Reduction.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mass-Producing Failures of Multimodal Systems with Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Incremental Learning of Structured Memory via Closed-Loop Transcription.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Unsupervised Manifold Linearizing and Clustering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction.
Entropy, 2022

Unsupervised Learning of Structured Representations via Closed-Loop Transcription.
CoRR, 2022

Revisiting Sparse Convolutional Model for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction.
CoRR, 2021

