Zhengyuan Yang
ORCID: 0000-0002-5808-0889
According to our database, Zhengyuan Yang authored at least 109 papers between 2015 and 2025.
Bibliography
2025
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers.
CoRR, June, 2025
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs.
CoRR, June, 2025
What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding.
CoRR, June, 2025
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation.
CoRR, May, 2025
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning.
CoRR, May, 2025
CoRR, May, 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning.
CoRR, April, 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement.
CoRR, April, 2025
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models.
CoRR, April, 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models.
CoRR, March, 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning.
CoRR, March, 2025
CoRR, February, 2025
CoRR, January, 2025
CoRR, January, 2025
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities.
Dataset, December, 2024
IEEE Trans. Circuits Syst. Video Technol., August, 2024
Found. Trends Comput. Graph. Vis., 2024
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation.
CoRR, 2024
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension.
CoRR, 2024
CoRR, 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.
CoRR, 2024
CoRR, 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024
Idea2Img: Iterative Self-refinement with GPT-4V for Automatic Image Design and Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models.
CoRR, 2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.
CoRR, 2023
CoRR, 2023
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation.
CoRR, 2023
CoRR, 2023
CoRR, 2023
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
Trans. Mach. Learn. Res., 2022
Proceedings of the 26th International Conference on Pattern Recognition, 2022
Proceedings of the 11th International Conference on Networks, Communication and Computing, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling.
CoRR, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
2019
Action Recognition With Spatio-Temporal Visual Attention on Skeleton Image Sequences.
IEEE Trans. Circuits Syst. Video Technol., 2019
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perception.
CoRR, 2018
End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perceptions.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Proceedings of the 24th International Conference on Pattern Recognition, 2018
2017
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017
2015
Curve fitting and optimal interpolation for CNC machining under confined error using quadratic B-splines.
Comput. Aided Des., 2015