Saining Xie

According to our database, Saining Xie authored at least 83 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting.

CoRR, November, 2025

Cambrian-S: Towards Spatial Supersensing in Video.

CoRR, November, 2025

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding.

CoRR, November, 2025

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts.

CoRR, November, 2025

AutoCode: LLMs as Problem Setters for Competitive Programming.

CoRR, October, 2025

Diffusion Transformers with Representation Autoencoders.

CoRR, October, 2025

GaussianLens: Localized High-Resolution Reconstruction via On-Demand Gaussian Densification.

CoRR, September, 2025

Meta CLIP 2: A Worldwide Scaling Recipe.

CoRR, July, 2025

Spatial Mental Modeling from Limited Views.

Keshigeyan Chandrasegaran

CoRR, June, 2025

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing.

CoRR, June, 2025

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

CoRR, June, 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis.

CoRR, May, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset.

CoRR, May, 2025

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers.

CoRR, April, 2025

Transfer between Modalities with MetaQueries.

CoRR, April, 2025

Scaling Language-Free Visual Representation Learning.

CoRR, April, 2025

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps.

CoRR, January, 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

On Scaling Up 3D Gaussian Splatting Training.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Deconstructing Denoising Diffusion Models for Self-Supervised Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark.

Christopher D. Manning

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Scaling Inference Time Compute for Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning.

CoRR, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.

CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.

CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

What Does a Visual Formal Analysis of the World's 500 Most Famous Paintings Tell Us About Multimodal LLMs?

Muzi Tao

Saining Xie

Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Demystifying CLIP Data.

Christoph Feichtenhofer

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Altogether: Image Captioning via Re-aligning Alt-text.

Christoph Feichtenhofer

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

V-IRL: Grounding Virtual Intelligence in Real Life.

Proceedings of the Computer Vision - ECCV 2024, 2024

SiT: Exploring Flow and Diffusion-Based Generative Models with Scalable Interpolant Transformers.

Proceedings of the Computer Vision - ECCV 2024, 2024

Fast Encoding and Decoding for Implicit Video Representation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Image Sculpting: Precise Object Editing with 3D Geometry Control.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs.

Penghao Wu

Saining Xie

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MoDE: CLIP Data Experts via Clustering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Going Denser with Open-Vocabulary Part Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scalable Diffusion Models with Transformers.

William Peebles

Saining Xie

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CiT: Curation in Training for Effective Vision-Language Data.

Christoph Feichtenhofer

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Exploring Long-Sequence Masked Autoencoders.

CoRR, 2022

SLIP: Self-supervision Meets Language-Image Pre-training.

Proceedings of the Computer Vision - ECCV 2022, 2022

Masked Autoencoders Are Scalable Vision Learners.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked Feature Prediction for Self-Supervised Visual Pre-Training.

Christoph Feichtenhofer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A ConvNet for the 2020s.

Zhuang Liu

Hanzi Mao

Chao-Yuan Wu

Christoph Feichtenhofer

Trevor Darrell

Saining Xie

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision.

CoRR, 2021

Benchmarking Detection Transfer Learning with Vision Transformers.

CoRR, 2021

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness.

Eric Mintun

Alexander Kirillov

Saining Xie

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pri3D: Can 3D Priors Help 2D Representation Learning?

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

An Empirical Study of Training Self-Supervised Vision Transformers.

Xinlei Chen

Saining Xie

Kaiming He

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploring Data-Efficient 3D Scene Understanding With Contrastive Scene Contexts.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Graph Structure of Neural Networks.

Proceedings of the 37th International Conference on Machine Learning, 2020

Decoupling Representation and Classifier for Long-Tailed Recognition.

Proceedings of the 8th International Conference on Learning Representations, 2020

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding.

Proceedings of the Computer Vision - ECCV 2020, 2020

Are Labels Necessary for Neural Architecture Search?

Proceedings of the Computer Vision - ECCV 2020, 2020

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Momentum Contrast for Unsupervised Visual Representation Learning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Sample-Efficient Neural Architecture Search by Learning Action Space.

CoRR, 2019

Exploring Randomly Wired Neural Networks for Image Recognition.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

On Network Design Spaces for Visual Recognition.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Order-Aware Generative Modeling Using the 3D-Craft Dataset.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018

Deep Representation Learning with Induced Structural Priors.

Saining Xie

PhD thesis, 2018

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification.

Proceedings of the Computer Vision - ECCV 2018, 2018

Attentional ShapeContextNet for Point Cloud Recognition.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Rethinking Spatiotemporal Feature Learning For Video Understanding.

CoRR, 2017

Aggregated Residual Transformations for Deep Neural Networks.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Top-Down Learning for Structured Labeling with Convolutional Pseudoprior.

Saining Xie

Xun Huang

Zhuowen Tu

Proceedings of the Computer Vision - ECCV 2016, 2016

2015

Convolutional Pseudo-Prior for Structured Labeling.

Saining Xie

Xun Huang

Zhuowen Tu

CoRR, 2015

Holistically-Nested Edge Detection.

Saining Xie

Zhuowen Tu

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Hyper-class augmented and regularized deep learning for fine-grained image classification.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deeply-Supervised Nets.

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014

Pairwise constrained concept factorization for data representation.

Neural Networks, 2014

Semi-supervised non-negative matrix factorization for image clustering with graph Laplacian.

Yangcheng He

Hongtao Lu

Saining Xie

Multim. Tools Appl., 2014

2013

Perception Preserving Projections.

Proceedings of the British Machine Vision Conference, 2013

2012

Multi-task co-clustering via nonnegative matrix factorization.

Saining Xie

Hongtao Lu

Yangcheng He

Proceedings of the 21st International Conference on Pattern Recognition, 2012
