Zhuoyang Zhang

Orcid: 0000-0002-3312-6246

According to our database, Zhuoyang Zhang authored at least 16 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer.
CoRR, July, 2025

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation.
CoRR, July, 2025

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

NVILA: Efficient Frontier Visual Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Gaze-assisted visual grounding via knowledge distillation for referred object grasping with under-specified object referring.
Eng. Appl. Artif. Intell., 2024

NVILA: Efficient Frontier Visual Language Models.
CoRR, 2024

Condition-Aware Neural Network for Controlled Image Generation.
CoRR, 2024

Sparse Refinement for Efficient High-Resolution Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
GVGNet: Gaze-Directed Visual Grounding for Learning Under-Specified Object Referring Intention.
IEEE Robotics Autom. Lett., September, 2023

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
CoRR, 2023

NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding.
CoRR, 2023

Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
