Wenyi Yu

Orcid: 0000-0002-8693-8168

According to our database, Wenyi Yu authored at least 21 papers between 2016 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation.
CoRR, May, 2025

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation.
CoRR, 2024

Adaptive Conditional Expert Selection Network for Multi-domain Recommendation.
CoRR, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.
CoRR, 2024

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement.
CoRR, 2024

HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction.
CoRR, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.
CoRR, 2024

An Improved Empirical Fisher Approximation for Natural Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

An Optimizer for Conformer Based on Conjugate Gradient Method.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Can Large Language Models Understand Spatial Audio?
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Connecting Speech Encoder and Large Language Model for ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

Extending Large Language Models for Speech and Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
T2T-YAO: A Telomere-to-Telomere Assembled Diploid Reference Genome for Han Chinese.
Genom. Proteom. Bioinform., 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023

2022
A method of band selection of remote sensing image based on clustering and intra-class index.
Multim. Tools Appl., 2022

2016
Superpixel-Based CFAR Target Detection for High-Resolution SAR Images.
IEEE Geosci. Remote. Sens. Lett., 2016

