Guangzhi Sun

IEEE ACM Trans. Audio Speech Lang. Process., 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.

CoRR, 2024

Large language models surpass human experts in predicting neuroscience results.

CoRR, 2024

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation.

Nineli Lashkarashvili

Wen Wu

CoRR, 2024

2023

Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speech-based Slot Filling using Large Language Models.

CoRR, 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

SALMONN: Towards Generic Hearing Abilities for Large Language Models.

CoRR, 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.

CoRR, 2023

Conditional Diffusion Model for Target Speaker Extraction.

CoRR, 2023

Connecting Speech Encoder and Large Language Model for ASR.

CoRR, 2023

Affect Recognition in Conversations Using Large Language Models.

CoRR, 2023

Enhancing Quantised End-to-End ASR Models via Personalisation.

CoRR, 2023

Cross-Utterance Conditioned VAE for Speech Generation.

CoRR, 2023

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data.

CoRR, 2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Combination of deep speaker embeddings for diarisation.

Neural Networks, 2021

Content-Aware Speaker Embeddings for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Cross-Utterance Language Models with Acoustic Error Sampling.

CoRR, 2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior.

CoRR, 2020

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.
