Zhifu Gao

Orcid: 0009-0008-5691-7324

According to our database¹, Zhifu Gao authored at least 33 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation.

CoRR, May, 2026

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition.

CoRR, April, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing.

IEEE J. Sel. Top. Signal Process., January, 2026

2025

Explore the Reinforcement Learning for the LLM based ASR and TTS system.

CoRR, September, 2025

FunAudio-ASR Technical Report.

CoRR, September, 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training.

CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

CoRR, April, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.

CoRR, March, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.

CoRR, January, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.

CoRR, 2024

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition.

CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

CTC-Assisted LLM-Based Contextual ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.

CoRR, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Extremely Low Footprint End-to-End ASR System for Smart Device.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model.

CoRR, 2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

An Effective Deep Embedding Learning Architecture for Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
