Fan Yu

Affiliations:

Alibaba Group, Speech Lab of DAMO Academy, China

According to our database, Fan Yu authored at least 32 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training.

CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

CoRR, April, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.

CoRR, January, 2025

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LCB-Net: Long-Context Biasing for Audio-Visual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

CASA-ASR: Context-Aware Speaker-Attributed ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.

CoRR, 2022

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario.

CoRR, 2022

MFCCA:Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

M2Met: The Icassp 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit.

CoRR, 2021

The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods.

Proceedings of the IEEE International Conference on Acoustics, 2021

Boundary and Context Aware Training for CIF-Based Non-Autoregressive End-to-End ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition.

CoRR, 2020
