Peihao Chen

ORCID: 0000-0002-6847-1621

According to our database, Peihao Chen authored at least 38 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

NaVLA²: A Vision-Language-Audio-Action Model for Multimodal Instruction Navigation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

When to Align: Dynamic Behavior Consistency for Multiagent Systems via Intrinsic Rewards.

IEEE Trans. Neural Networks Learn. Syst., December, 2025

SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-World Object Detector.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2025

3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model.

CoRR, June, 2025

Source-Free Elastic Model Adaptation for Vision-and-Language Navigation.

IEEE Trans. Multim., 2025

Learning 3D Persistent Embodied World Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Map-Guided Few-Shot Audio-Visual Acoustics Modeling.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition.

IEEE Trans. Affect. Comput., 2024

SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning.

CoRR, 2024

CoNav: A Benchmark for Human-Centered Collaborative Navigation.

CoRR, 2024

MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling.

CoRR, 2024

3D-VLA: A 3D Vision-Language-Action Generative World Model.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

A Simple Knowledge Distillation Framework for Open-world Object Detection.

CoRR, 2023

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning.

CoRR, 2023

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models.

CoRR, 2023

Detecting the open-world objects with the help of the Brain.

CoRR, 2023

FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

3D-LLM: Injecting the 3D World into Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Vision-and-Language Navigation from YouTube Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Masked Motion Encoding for Self-Supervised Video Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

M³Video: Masked Motion Modeling for Self-Supervised Video Representation Learning.

CoRR, 2022

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Active Camera for Multi-Object Navigation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Relation Attention for Temporal Action Localization.

IEEE Trans. Multim., 2020

Generating Visually Aligned Sound From Videos.

IEEE Trans. Image Process., 2020

Foley Music: Learning to Generate Music from Videos.

Proceedings of the Computer Vision - ECCV 2020, 2020

Dense Regression Network for Video Grounding.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Location-Aware Graph Convolutional Networks for Video Question Answering.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization.

IEEE Trans. Image Process., 2019

Self-Supervised Moving Vehicle Tracking With Stereo Sound.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
