Peiwen Sun

Orcid: 0009-0005-3016-8554

According to our database, Peiwen Sun authored at least 22 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling.
CoRR, April, 2026

AURA: Always-On Understanding and Real-Time Assistance via Video Streams.
CoRR, April, 2026

PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios.
CoRR, January, 2026

2025
OneThinker: All-in-one Reasoning Model for Image and Video.
CoRR, December, 2025

PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation.
CoRR, November, 2025

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.
CoRR, October, 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
FusionINN: Invertible Image Fusion for Brain Tumor Monitoring.
CoRR, 2024

FusionINN: Decomposable Image Fusion for Brain Tumor Monitoring.
Proceedings of the Trustworthy Artificial Intelligence for Healthcare, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Unveiling and Mitigating Bias in Audio Visual Segmentation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Enhancing Few-shot Classification through Token Selection for Balanced Learning.
Proceedings of the International Joint Conference on Neural Networks, 2024

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes.
Proceedings of the Computer Vision - ECCV 2024, 2024

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Proceedings of the Computer Vision - ECCV 2024, 2024

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
More than Vanilla Fusion: a Simple, Decoupling-free, Attention Module for Multimodal Fusion Based on Signal Theory.
CoRR, 2023

Predicting Central Cervical Lymph Node Metastasis of Papillary Thyroid Carcinomas Using Multi-view Ultrasound Images.
Proceedings of 2023 International Conference on Medical Imaging and Computer-Aided Diagnosis, 2023

A Method of Audio-Visual Person Verification by Mining Connections between Time Series.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
Learning Audio-Visual embedding for Wild Person Verification.
CoRR, 2022

2019
A New Type of ROS-Based Pedagogical Robot for Kids' Mathematics Education.
Proceedings of the 2019 IEEE International Conference on Electro Information Technology, 2019

