Puyuan Peng

ORCID: 0009-0009-6866-2063

According to our database¹, Puyuan Peng authored at least 32 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset.

CoRR, September, 2025

TalkLess: Blending Extractive and Abstractive Speech Summarization for Editing Speech to Preserve Content and Style.

Karim Benharrak

Puyuan Peng

Amy Pavel

CoRR, July, 2025

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation.

CoRR, May, 2025

Temporally Streaming Audio-Visual Synchronization for Real-World Videos.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

TalkLess: Blending Extractive and Abstractive Summarization for Editing Speech to Preserve Content and Style.

Karim Benharrak

Puyuan Peng

Amy Pavel

Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, 2025

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

Fabian Alejandro Ritter Gutierrez

et al.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SyllableLM: Learning Coarse Semantic Units for Speech Language Models.

Alan Baade

Puyuan Peng

David Harwath

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

Fabian Ritter Gutierrez

CoRR, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.

CoRR, 2024

Neural Codec Language Models for Disentangled and Textless Voice Conversion.

Alan Baade

Puyuan Peng

David Harwath

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

BAT: Learning to Reason about Spatial Sounds with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via CLIP and Speech-Image Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos.

Proceedings of the Computer Vision - ECCV 2024, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.

CoRR, 2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model.

CoRR, 2023

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Audio-Visual Neural Syntax Acquisition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling.

Puyuan Peng

David Harwath

CoRR, 2022

Zero-shot Video Moment Retrieval With Off-the-Shelf Models.

Anuj Diwan

Puyuan Peng

Raymond J. Mooney

Proceedings of the Transfer Learning for Natural Language Processing Workshop, 2022

Word Discovery in Visually Grounded, Self-Supervised Speech Models.

Puyuan Peng

David Harwath

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer.

Alan Baade

Puyuan Peng

David Harwath

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Fast-Slow Transformer for Visually Grounding Speech.

Puyuan Peng

David Harwath

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2020

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings.

Puyuan Peng

Herman Kamper

Karen Livescu

CoRR, 2020