Yifan Yang

Orcid: 0000-0002-5481-2851

Affiliations:

Microsoft Research Asia, Shanghai, China
Peking University, Beijing, China (until 2021)

According to our database¹, Yifan Yang authored at least 62 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

SkillOpt: Executive Strategy for Self-Evolving Agent Skills.

CoRR, May, 2026

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills.

CoRR, May, 2026

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark.

CoRR, May, 2026

EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents.

CoRR, May, 2026

MemCompiler: Compile, Don't Inject - State-Conditioned Memory for Embodied Agents.

CoRR, May, 2026

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation.

CoRR, April, 2026

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation.

CoRR, April, 2026

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation.

CoRR, March, 2026

Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding.

CoRR, March, 2026

EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation.

CoRR, March, 2026

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

CoRR, March, 2026

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents.

CoRR, February, 2026

Joint Latency-Energy Optimization for Two-Tier Multiuser Multitask Offloading in AI-Agent Communication Networks.

IEEE Internet Things J., 2026

AMID: Model-Agnostic Dataset Distillation by Adversarial Mutual Information Minimization.

Proceedings of the ACM Web Conference 2026, 2026

MageBench: Bridging Large Multimodal Models to Agents.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

AVA: Towards Agentic Video Analytics with Vision Language Models.

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Learning Systems Expansion with Efficient Heterogeneity-aware Knowledge Transfer.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning.

CoRR, October, 2025

VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL.

CoRR, October, 2025

Diffusion^2: Turning 3D Environments into Radio Frequency Heatmaps.

CoRR, October, 2025

Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices.

IEEE Trans. Mob. Comput., September, 2025

AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation.

CoRR, September, 2025

A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer.

CoRR, August, 2025

Phi-Ground Tech Report: Advancing Perception in GUI Grounding.

CoRR, July, 2025

ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL.

CoRR, May, 2025

ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning.

CoRR, May, 2025

Empowering Agentic Video Analytics Systems with Video Language Models.

CoRR, May, 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition.

CoRR, March, 2025

Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences.

Athanasios Karapantelakis, Christo Kurisummoottil Thomas, Emilio Calvanese Strinati, Ilias Chatzistefanidis, Maria Amparo Canaveras Galdon, Mehdi Ahmed Boudjelli, Rasoul Nikbakht Silab, Salah Eddine El Ayoubi

CoRR, March, 2025

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs.

Abdelrahman Abouelenin

CoRR, March, 2025

Region-Adaptive Sampling for Diffusion Transformers.

CoRR, February, 2025

Real-Time Neural-Enhancement for Online Cloud Gaming.

CoRR, January, 2025

Zoomer: Adaptive Image Focus Optimization for Black-box MLLM.

Trans. Mach. Learn. Res., 2025

Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment.

Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025

VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

REDUCIO! Generating 1K Video Within 16 Seconds Using Extremely Compressed Motion Latents.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

NERVE: Real-Time Neural Video Recovery and Enhancement on Mobile Devices.

Proc. ACM Netw., 2024

VIGOR: Reviving Cloud Gaming Sessions.

Proc. ACM Netw., 2024

REDUCIO! Generating 1024⨉1024 Video within 16 Seconds using Extremely Compressed Motion Latents.

CoRR, 2024

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation.

CoRR, 2024

Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling.

CoRR, 2024

Advancing Multi-Modal Sensing Through Expandable Modality Alignment.

CoRR, 2024

Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning.

CoRR, 2024

Understanding Training-free Diffusion Guidance: Mechanisms and Limitations.

CoRR, 2024

Understanding and Improving Training-free Loss-based Diffusion Guidance.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LoRASC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Online Video Quality Enhancement with Spatial-Temporal Look-Up Tables.

Proceedings of the Computer Vision - ECCV 2024, 2024

Unified Medical Image Pre-training in Language-Guided Common Semantic Space.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Online Video Super-Resolution With Convolutional Kernel Bypass Grafts.

IEEE Trans. Multim., 2023

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models.

CoRR, 2023

Real-Time Neural Video Recovery and Enhancement on Mobile Devices.

CoRR, 2023

Neural Video Recovery for Cloud Gaming.

CoRR, 2023

ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Attentive Mask CLIP.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Inference Efficient Deep Ensemble Learning.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Similarity Distribution Based Membership Inference Attack on Person Re-identification.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Online Video Super-Resolution with Convolutional Kernel Bypass Graft.

CoRR, 2022
