Yi Jiang

Orcid: 0000-0002-2133-8719

Affiliations:
  • Bytedance Inc., Beijing, China


According to our database, Yi Jiang authored at least 41 papers between 2020 and 2026.

Bibliography

2026
Liquid: Language Models are Scalable and Unified Multi-Modal Generators.
Int. J. Comput. Vis., January, 2026

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation.
CoRR, November, 2025

Waver: Wave Your Way to Lifelike Video Generation.
CoRR, August, 2025

UniTok: A Unified Tokenizer for Visual Generation and Understanding.
CoRR, February, 2025

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation.
CoRR, February, 2025

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Goku: Flow Based Video Generative Foundation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Liquid: Language Models are Scalable Multi-modal Generators.
CoRR, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.
CoRR, 2024

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

General Object Foundation Model for Images and Videos at Scale.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generative Region-Language Pretraining for Open-Ended Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Sparse R-CNN: An End-to-End Framework for Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces.
CoRR, 2023

Multi-Level Contrastive Learning for Dense Prediction Task.
CoRR, 2023

CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Simple Baseline for Open-World Tracking via Self-training.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Object-Language Alignments for Open-Vocabulary Object Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Transformers for Open-world Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Segment Every Reference Object in Spatial and Temporal Spaces.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

InstMove: Instance Motion for Object-centric Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Universal Instance Perception as Object Discovery and Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
The Runner-up Solution for YouTube-VIS Long Video Challenge 2022.
CoRR, 2022

Rethinking Resolution in the Context of Efficient Video Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Objects in Semantic Topology.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ByteTrack: Multi-object Tracking by Associating Every Detection Box.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Grand Unification of Object Tracking.
Proceedings of the Computer Vision - ECCV 2022, 2022

In Defense of Online Models for Video Instance Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

SeqFormer: Sequential Transformer for Video Instance Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Language as Queries for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation.
CoRR, 2021

ByteTrack: Multi-Object Tracking by Associating Every Detection Box.
CoRR, 2021

What Makes for End-to-End Object Detection?
Proceedings of the 38th International Conference on Machine Learning, 2021

Sparse R-CNN: End-to-End Object Detection With Learnable Proposals.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
TransTrack: Multiple-Object Tracking with Transformer.
CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.
CoRR, 2020

