Yifan Peng
ORCID: 0000-0002-8581-8674
Affiliations:
- NVIDIA Corporation, Santa Clara, CA, USA
- Carnegie Mellon University, Department of Electrical and Computer Engineering, Pittsburgh, PA, USA (PhD 2025)
According to our database, Yifan Peng authored at least 52 papers between 2022 and 2025.
Links
Online presence:
- on linkedin.com
- on twitter.com
- on orcid.org
- on github.com
Bibliography
2025
DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition.
CoRR, June, 2025
CoRR, June, 2025
CoRR, May, 2025
CoRR, April, 2025
CoRR, February, 2025
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.
CoRR, 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation.
CoRR, 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis.
CoRR, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search.
Proceedings of the IEEE International Conference on Acoustics, 2024
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2024
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2024
Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network.
CoRR, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
The Pipeline System of ASR and NLU with MLM-based Data Augmentation Toward STOP Low-Resource Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward STOP Quality Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023
2022
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the 19th International Conference on Spoken Language Translation, 2022
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022