Yuanhuiyi Lyu

ORCID: 0009-0004-1450-811X

According to our database, Yuanhuiyi Lyu authored at least 44 papers between 2023 and 2026.

Bibliography

2026
Perceptual Flow Network for Visually Grounded Reasoning.
CoRR, May, 2026

TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders.
CoRR, April, 2026

SAP: Segment Any 4K Panorama.
CoRR, March, 2026

EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next.
CoRR, March, 2026

StruVis: Enhancing Reasoning-based Text-to-Image Generation via Thinking with Structured Vision.
CoRR, March, 2026

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval.
CoRR, February, 2026

BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis.
Trans. Mach. Learn. Res., 2026

T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation.
CoRR, November, 2025

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks.
CoRR, October, 2025

AI for Service: Proactive Assistance with AI Glasses.
CoRR, October, 2025

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs.
CoRR, October, 2025

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods.
CoRR, October, 2025

Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention.
CoRR, October, 2025

Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation.
CoRR, September, 2025

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era.
CoRR, September, 2025

MLLMs are Deeply Affected by Modality Bias.
CoRR, May, 2025

Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
CoRR, May, 2025

DiMeR: Disentangled Mesh Reconstruction Model.
CoRR, April, 2025

Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook.
CoRR, March, 2025

MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation.
CoRR, March, 2025

SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Reducing Unimodal Bias in Multi-Modal Semantic Segmentation With Multi-Scale Functional Entropy Regularization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

From Reusing to Forecasting: Accelerating Diffusion Models With TaylorSeers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection.
CoRR, 2024

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges.
CoRR, 2024

Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation.
CoRR, 2024

EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More.
CoRR, 2024

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All.
CoRR, 2024

ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More.
CoRR, 2024

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All.
CoRR, 2024

Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation.
CoRR, 2024

Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

EventBind: Learning a Unified Representation to Bind Them All for Event-Based Open-World Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Centering the Value of Every Modality: Towards Efficient and Resilient Modality-Agnostic Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Modality-Agnostic Representation for Semantic Segmentation from Any Modalities.
Proceedings of the Computer Vision - ECCV 2024, 2024

ExACT: Language-Guided Conceptual Reasoning and Uncertainty Estimation for Event-Based Action Recognition and More.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
E-CLIP: Towards Label-efficient Event-based Open-world Understanding by CLIP.
CoRR, 2023
