Kun Yan

Orcid: 0000-0001-8290-5169

Affiliations:

StepFun, Shanghai, China

According to our database¹, Kun Yan authored at least 20 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation.

CoRR, May, 2026

Qwen-Image-VAE-2.0 Technical Report.

CoRR, May, 2026

Dynamic Sparsity in Large-Scale Video DiT Training.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition.

Heung-Yeung Shum

CoRR, December, 2025

Resolving Ambiguity in Gaze-Facilitated Visual Assistant Interaction Paradigm.

CoRR, September, 2025

Qwen-Image Technical Report.

CoRR, August, 2025

DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training.

CoRR, February, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.

Heung-Yeung Shum

CoRR, January, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios.

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., May, 2024

Voila-A: Aligning Vision-Language Models with User's Gaze Attention.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

HORIZON: High-Resolution Semantically Controlled Panorama Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

GroundNLQ @ Ego4D Natural Language Queries Challenge 2023.

Wing-Kwong Chan, Mike Zheng Shou

CoRR, 2023

KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization.

Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, 2023

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding.

Wing Kwong Chan, Mike Zheng Shou

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022.

Wing Kwong Chan

CoRR, 2022

HORIZON: A High-Resolution Panorama Synthesis Framework.

CoRR, 2022

Learning Temporal Video Procedure Segmentation from an Automatically Collected Large Dataset.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Trace Controlled Text to Image Generation.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Control Image Captioning Spatially and Temporally.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
