Saining Xie

According to our database, Saining Xie authored at least 77 papers between 2012 and 2025.

Bibliography

2025
Meta CLIP 2: A Worldwide Scaling Recipe.
CoRR, July, 2025

Spatial Mental Modeling from Limited Views.
CoRR, June, 2025

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing.
CoRR, June, 2025

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
CoRR, June, 2025

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs.
CoRR, May, 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis.
CoRR, May, 2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset.
CoRR, May, 2025

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers.
CoRR, April, 2025

Transfer between Modalities with MetaQueries.
CoRR, April, 2025

Scaling Language-Free Visual Representation Learning.
CoRR, April, 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop.
CoRR, March, 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training.
CoRR, January, 2025

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps.
CoRR, January, 2025

On Scaling Up 3D Gaussian Splatting Training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Deconstructing Denoising Diffusion Models for Self-Supervised Learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Scaling Inference Time Compute for Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning.
CoRR, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.
CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

What Does a Visual Formal Analysis of the World's 500 Most Famous Paintings Tell Us About Multimodal LLMs?
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Demystifying CLIP Data.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Altogether: Image Captioning via Re-aligning Alt-text.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

V-IRL: Grounding Virtual Intelligence in Real Life.
Proceedings of the Computer Vision - ECCV 2024, 2024

SiT: Exploring Flow and Diffusion-Based Generative Models with Scalable Interpolant Transformers.
Proceedings of the Computer Vision - ECCV 2024, 2024

Fast Encoding and Decoding for Implicit Video Representation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Image Sculpting: Precise Object Editing with 3D Geometry Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MoDE: CLIP Data Experts via Clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Going Denser with Open-Vocabulary Part Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scalable Diffusion Models with Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CiT: Curation in Training for Effective Vision-Language Data.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Exploring Long-Sequence Masked Autoencoders.
CoRR, 2022

SLIP: Self-supervision Meets Language-Image Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

Masked Autoencoders Are Scalable Vision Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked Feature Prediction for Self-Supervised Visual Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A ConvNet for the 2020s.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision.
CoRR, 2021

Benchmarking Detection Transfer Learning with Vision Transformers.
CoRR, 2021

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pri3D: Can 3D Priors Help 2D Representation Learning?
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

An Empirical Study of Training Self-Supervised Vision Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploring Data-Efficient 3D Scene Understanding With Contrastive Scene Contexts.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Graph Structure of Neural Networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

Decoupling Representation and Classifier for Long-Tailed Recognition.
Proceedings of the 8th International Conference on Learning Representations, 2020

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding.
Proceedings of the Computer Vision - ECCV 2020, 2020

Are Labels Necessary for Neural Architecture Search?
Proceedings of the Computer Vision - ECCV 2020, 2020

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Momentum Contrast for Unsupervised Visual Representation Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Sample-Efficient Neural Architecture Search by Learning Action Space.
CoRR, 2019

Exploring Randomly Wired Neural Networks for Image Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

On Network Design Spaces for Visual Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Order-Aware Generative Modeling Using the 3D-Craft Dataset.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Deep Representation Learning with Induced Structural Priors.
PhD thesis, 2018

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification.
Proceedings of the Computer Vision - ECCV 2018, 2018

Attentional ShapeContextNet for Point Cloud Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Holistically-Nested Edge Detection.
Int. J. Comput. Vis., 2017

Rethinking Spatiotemporal Feature Learning For Video Understanding.
CoRR, 2017

Aggregated Residual Transformations for Deep Neural Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior.
Proceedings of the Computer Vision - ECCV 2016, 2016

2015
Convolutional Pseudo-Prior for Structured Labeling.
CoRR, 2015

Hyper-class augmented and regularized deep learning for fine-grained image classification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deeply-Supervised Nets.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Pairwise constrained concept factorization for data representation.
Neural Networks, 2014

Semi-supervised non-negative matrix factorization for image clustering with graph Laplacian.
Multim. Tools Appl., 2014

2013
Perception Preserving Projections.
Proceedings of the British Machine Vision Conference, 2013

2012
Multi-task co-clustering via nonnegative matrix factorization.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

