Fan Yu

Affiliations:
  • Alibaba Group, Speech Lab of DAMO Academy, China


According to our database, Fan Yu authored at least 32 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training.
CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.
CoRR, April, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.
CoRR, January, 2025

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LCB-Net: Long-Context Biasing for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
CASA-ASR: Context-Aware Speaker-Attributed ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.
CoRR, 2022

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario.
CoRR, 2022

MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit.
CoRR, 2021

The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods.
Proceedings of the IEEE International Conference on Acoustics, 2021

Boundary and Context Aware Training for CIF-Based Non-Autoregressive End-to-End ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition.
CoRR, 2020

