Guangzhi Sun

ORCID: 0000-0002-5886-056X

According to our database, Guangzhi Sun authored at least 30 papers between 2019 and 2024.

Bibliography

2024
M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.
CoRR, 2024

Large language models surpass human experts in predicting neuroscience results.
CoRR, 2024

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation.
CoRR, 2024

2023
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speech-based Slot Filling using Large Language Models.
CoRR, 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.
CoRR, 2023

SALMONN: Towards Generic Hearing Abilities for Large Language Models.
CoRR, 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023

Conditional Diffusion Model for Target Speaker Extraction.
CoRR, 2023

Connecting Speech Encoder and Large Language Model for ASR.
CoRR, 2023

Affect Recognition in Conversations Using Large Language Models.
CoRR, 2023

Enhancing Quantised End-to-End ASR Models via Personalisation.
CoRR, 2023

Cross-Utterance Conditioned VAE for Speech Generation.
CoRR, 2023

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data.
CoRR, 2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?
CoRR, 2023

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator.
CoRR, 2023

End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.
Proceedings of the IEEE International Conference on Acoustics, 2023

Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.
Proceedings of Interspeech 2022, 2022

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Combination of deep speaker embeddings for diarisation.
Neural Networks, 2021

Content-Aware Speaker Embeddings for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Cross-Utterance Language Models with Acoustic Error Sampling.
CoRR, 2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior.
CoRR, 2020

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

