Kaicheng Yang

ORCID: 0009-0008-6073-9014

Affiliations:

DeepGlint, Beijing, China

According to our database, Kaicheng Yang authored at least 30 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Efficient, Validation-Free Intrinsic Quality Estimation for Large-Scale Face Recognition Datasets.

CoRR, May, 2026

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence.

CoRR, May, 2026

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards.

CoRR, April, 2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence.

CoRR, February, 2026

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset.

CoRR, January, 2026

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder.

CoRR, October, 2025

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training.

CoRR, September, 2025

PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training.

CoRR, August, 2025

RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm.

CoRR, February, 2025

The Solution to the WWW25 Text-based Person Anomaly Search Challenge.

Companion Proceedings of the ACM on Web Conference 2025, 2025

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Decoupled Global-Local Alignment for Improving Compositional Understanding.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Dual-Level Open-Vocabulary 3D Scene Representation for Instance-Aware Robot Navigation.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

Region-based Cluster Discrimination for Visual Representation Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization.

Co-authors include: Rolandos Alexandros Potamias, Linchao Zhu, Jiankang Deng.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ForCenNet: Foreground-Centric Network for Document Image Rectification.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension.

CoRR, 2024

High-Fidelity Facial Albedo Estimation via Texture Quantization.

CoRR, 2024

1st Place Solution to the 1st SkatingVerse Challenge.

CoRR, 2024

RWKV-CLIP: A Robust Vision-Language Representation Learner.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Multi-label Cluster Discrimination for Visual Representation Learning.

Proceedings of Computer Vision - ECCV 2024, 2024

LaPA: Latent Prompt Assist Model for Medical Visual Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Unicom: Universal and Compact Representation Learning for Image Retrieval.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

ALIP: Adaptive Language-Image Pre-training with Synthetic Caption.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
