Zhihao Du

Sensors, July, 2024

Effects of High-Load Bench Press Training with Different Blood Flow Restriction Pressurization Strategies on the Degree of Muscle Activation in the Upper Limbs of Bodybuilders.

Sensors, January, 2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.

CoRR, 2024

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities.

CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

Personality-memory Gated Adaptation: An Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.

CoRR, 2023

CASA-ASR: Context-Aware Speaker-Attributed ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AttenTPU: Tensor Processor for Attention Mechanism with Fine-Grained Padding.

Proceedings of the IEEE International Conference on Integrated Circuits, 2023

TOLD: a Novel Two-Stage Overlap-Aware Framework for Speaker Diarization.

Jiaming Wang

Shiliang Zhang

Proceedings of the IEEE International Conference on Acoustics, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario.

CoRR, 2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.

CoRR, 2022

MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information.

CoRR, 2021

Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Efficient Joint Training Framework for Robust Small-Footprint Keyword Spotting.

Proceedings of the Neural Information Processing - 27th International Conference, 2020

PAN: Phoneme-Aware Network for Monaural Speech Enhancement.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting.

CoRR, 2019

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Monaural Front-End Processing for Robust Speech Recognition Without Retraining or Joint-Training.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Investigation of Monaural Front-End Processing for Robust ASR without Retraining or Joint-Training.
