Yi Jiang

ORCID: 0000-0002-2133-8719

Affiliations:

ByteDance Inc., Beijing, China

According to our database, Yi Jiang authored at least 54 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Veda: Scalable Video Diffusion via Distilled Sparse Attention.

CoRR, May, 2026

Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens.

CoRR, March, 2026

Liquid: Language Models are Scalable and Unified Multi-Modal Generators.

Int. J. Comput. Vis., January, 2026

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation.

CoRR, January, 2026

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation.

CoRR, January, 2026

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Waver: Wave Your Way to Lifelike Video Generation.

CoRR, August, 2025

DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction.

CoRR, May, 2025

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation.

CoRR, February, 2025

UniTok: a Unified Tokenizer for Visual Generation and Understanding.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Goku: Flow Based Video Generative Foundation Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Liquid: Language Models are Scalable Multi-modal Generators.

CoRR, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.

CoRR, 2024

Recognize Any Regions.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

General Object Foundation Model for Images and Videos at Scale.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generative Region-Language Pretraining for Open-Ended Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders.

Proceedings of the 35th British Machine Vision Conference, 2024

2023

Sparse R-CNN: An End-to-End Framework for Object Detection.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces.

CoRR, 2023

Multi-Level Contrastive Learning for Dense Prediction Task.

CoRR, 2023

CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Simple Baseline for Open-World Tracking via Self-training.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Learning Object-Language Alignments for Open-Vocabulary Object Detection.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.

Kannappan Palaniappan

Norbert Scherer-Negenborn

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Transformers for Open-world Instance Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Segment Every Reference Object in Spatial and Temporal Spaces.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

InstMove: Instance Motion for Object-centric Video Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Universal Instance Perception as Object Discovery and Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

The Runner-up Solution for YouTube-VIS Long Video Challenge 2022.

CoRR, 2022

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders.

CoRR, 2022

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition.

CoRR, 2022

Rethinking Resolution in the Context of Efficient Video Recognition.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Objects in Semantic Topology.

Proceedings of the Tenth International Conference on Learning Representations, 2022

ByteTrack: Multi-object Tracking by Associating Every Detection Box.

Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Grand Unification of Object Tracking.

Proceedings of the Computer Vision - ECCV 2022, 2022

In Defense of Online Models for Video Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

SeqFormer: Sequential Transformer for Video Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Language as Queries for Referring Video Object Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation.

CoRR, 2021

ByteTrack: Multi-Object Tracking by Associating Every Detection Box.

CoRR, 2021

What Makes for End-to-End Object Detection?

Proceedings of the 38th International Conference on Machine Learning, 2021

Sparse R-CNN: End-to-End Object Detection With Learnable Proposals.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

TransTrack: Multiple-Object Tracking with Transformer.

CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.

CoRR, 2020