We stand with Ukraine

Viet-Khoa Vo-Ho

ORCID: 0000-0003-0277-7094

Affiliations:

Vietnam National University, Ho Chi Minh City, Vietnam
University of Arkansas, Fayetteville, USA

According to our database¹, Viet-Khoa Vo-Ho authored at least 37 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Links

Online presence:

on orcid.org
on ieeexplore.ieee.org

Bibliography

2026

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving.

[DOI]

…, …, …, …, Duc Minh Nguyen, …, …, Sreevenkata Anjani Tishita Godavarthi, Chase Rainwater, …, …, Duy Minh Ho Nguyen, …

CoRR, May, 2026

CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models.

[DOI]

…, …, …, …, …, Bui Duy Quoc Nghi, …, Anthony Gunderman, Chase Rainwater, …, …

CoRR, April, 2026

SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection.

[DOI]

…, …, …, …, Gianfranco Doretto, …, …, …

CoRR, April, 2026

Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective.

[DOI]

…, …, …, …, Frederick Bumgarner, Duy Minh Ho Nguyen, …, …, Chase Rainwater, …, …, …

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Clutter-Resistant Vision-Language-Action Models through Object-Centric and Geometry Grounding.

[DOI]

…, …, …, Trong-Thang Pham, …, …, Duy Nguyen Ho Minh, …, Anthony Gunderman, Chase Rainwater, …

CoRR, December, 2025

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation.

[DOI]

…, …, …, …, …, Anthony Gunderman, Duy Nguyen Ho Minh, …, …, …, Chase Rainwater, …, …

CoRR, November, 2025

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling.

[DOI]

Trong-Thang Pham, …, …, Esteban Duran Marti, Tien-Phat Nguyen, …, …, Ngoc Son Nguyen, …, …, Anh Totti Nguyen, …, …, …, Hien Van Nguyen, …

CoRR, July, 2025

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling.

[DOI]

Trong-Thang Pham, …, …, Esteban Duran Marti, Tien-Phat Nguyen, …, …, …, …, …, Anh Totti Nguyen, …, …, …, …, …

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

Amodal Instance Segmentation with Diffusion Shape Prior Estimation.

[DOI]

…, …, Nguyen Thanh Binh, …

CoRR, 2024

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model.

[DOI]

…, …, …, …, …

CoRR, 2024

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection.

[DOI]

…, …, …, Gianfranco Doretto, Donald A. Adjeroh, …

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model.

[DOI]

…, …, …, …, …

Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation.

[DOI]

…, Winston Bounsavy, …, …, …, …

Proceedings of the International Joint Conference on Neural Networks, 2024

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation.

[DOI]

…, …, …, …, …, Gianfranco Doretto, …, …

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Amodal Instance Segmentation with Diffusion Shape Prior Estimation.

[DOI]

…, …, …, …

Proceedings of the Asian Conference on Computer Vision (ACCV 2024), 2024

2023

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation.

[DOI]

Viet-Khoa Vo-Ho, …, …, …, Minh-Triet Tran, …

International Journal of Computer Vision, 2023

CLIP-TSA: Clip-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection.

[DOI]

Hyekang Kevin Joo, …, …, …

Proceedings of the IEEE International Conference on Image Processing, 2023

DNA: Deformable Neural Articulations Network for Template-free Dynamic 3D Human Reconstruction from Monocular RGB-D Video.

[DOI]

…, Trong-Thang Pham, …, …, …

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning.

[DOI]

…, …, Quang Sang Truong, …, …

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning.

[DOI]

…, …, …, …, …

CoRR, 2022

Meta-Learning of NAS for Few-shot Learning in Medical Image Applications.

[DOI]

Viet-Khoa Vo-Ho, …, …, Minh-Triet Tran, …

CoRR, 2022

CapsNet for Medical Image Segmentation.

[DOI]

…, Viet-Khoa Vo-Ho, …, …, …, …

CoRR, 2022

3DConvCaps: 3DUnet with Convolutional Capsule Encoder for Medical Image Segmentation.

[DOI]

…, Viet-Khoa Vo-Ho, …

Proceedings of the 26th International Conference on Pattern Recognition, 2022

VLCAP: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning.

[DOI]

…, …, Viet-Khoa Vo-Ho, …, Chase Rainwater, …, …

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

AISFormer: Amodal Instance Segmentation with Transformer.

[DOI]

…, …, …, Arthur A. F. Fernandes, …, …

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Contextual Explainable Video Representation: Human Perception-based Understanding.

[DOI]

…, …, Phong X. Nguyen, …, …, …

Proceedings of the 56th Asilomar Conference on Signals, Systems, and Computers, ACSSC 2022, Pacific Grove, CA, USA, October 31, 2022

2021

ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation.

[DOI]

…, …, …, Minh-Triet Tran, Akihiro Sugimoto, …

IEEE Access, 2021

Agent-Environment Network for Temporal Action Proposal Generation.

[DOI]

Viet-Khoa Vo-Ho, …, …, Akihiro Sugimoto, Minh-Triet Tran

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Offboard 3D Object Detection From Point Cloud Sequences.

[DOI]

…, …, …, …, …, …, Dragomir Anguelov

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation.

[DOI]

…, …, …, …, …, Minh-Triet Tran, …

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

FIRST - Flexible Interactive Retrieval SysTem for Visual Lifelog Exploration at LSC 2020.

[DOI]

Minh-Triet Tran, Thanh-An Nguyen, Quoc-Cuong Tran, …, …, …, …, Hoang-Phuc Trang-Trung, …, Hai-Dang Nguyen, …, Viet-Khoa Vo-Ho, …

Proceedings of the Third ACM Workshop on Lifelog Search Challenge, 2020

iTASK - Intelligent Traffic Analysis Software Kit.

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

HCMUS at the NTCIR-14 Lifelog-3 Task.

[DOI]

Nguyen-Khang Le, Dieu-Hien Nguyen, Trung-Hieu Hoang, Thanh-An Nguyen, Thanh-Dat Truong, …, …, Viet-Khoa Vo-Ho, Vinh-Tiep Nguyen, Minh-Triet Tran

Proceedings of the 14th NTCIR Conference on Evaluation of Information Access Technologies, 2019

Smart Lifelog Retrieval System with Habit-based Concepts and Moment Visualization.

[DOI]

Nguyen-Khang Le, Dieu-Hien Nguyen, Trung-Hieu Hoang, Thanh-An Nguyen, Thanh-Dat Truong, …, …, Viet-Khoa Vo-Ho, Vinh-Tiep Nguyen, Minh-Triet Tran

Proceedings of the ACM Workshop on Lifelog Search Challenge, 2019

Vehicle Re-identification with Learned Representation and Spatial Verification and Abnormality Detection with Multi-Adaptive Vehicle Detectors for Traffic Video Analysis.

[DOI]

Khac-Tuan Nguyen, Trung-Hieu Hoang, Minh-Triet Tran, …, …, …, Viet-Khoa Vo-Ho, …, …, Thanh-An Nguyen, Thanh-Dat Truong, Vinh-Tiep Nguyen, …

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018

Personal Diary Generation from Wearable Cameras with Concept Augmented Image Captioning and Wide Trail Strategy.

[DOI]

Viet-Khoa Vo-Ho, …, …, …, Minh-Triet Tran

Proceedings of the Ninth International Symposium on Information and Communication Technology, 2018

Lifelog Moment Retrieval with Visual Concept Fusion and Text-based Query Expansion.

[DOI]

Minh-Triet Tran, Thanh-Dat Truong, …, Viet-Khoa Vo-Ho, …, Vinh-Tiep Nguyen

Proceedings of the Working Notes of CLEF 2018, 2018
