Chao Zhang

Alexandra Woolgar

NeuroImage, 2025

Knowledge-aware audio-grounded generative slot filling for limited annotated data.

Comput. Speech Lang., 2025

A robust adaptive meta-sample generation method for few-shot time series prediction.

Complex Intell. Syst., 2025

The 1st SpeechWellness Challenge: Detecting Suicide Risk Among Adolescents.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Improving LLM Video Understanding with 16 Frames Per Second.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Bayesian WeakS-to-Strong from Text Classification to Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Audio-centric Video Understanding Benchmark without Text Shortcut.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

DNCASR: End-to-End Training for Speaker-Attributed ASR.

Xianrui Zheng

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Graph Neural Networks for Contextual ASR With the Tree-Constrained Pointer Generator.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Cross-Utterance Conditioned VAE for Speech Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation.

CoRR, 2024

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization.

CoRR, 2024

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events.

CoRR, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.

CoRR, 2024

Speaker Adaptation for Quantised End-to-End ASR Models.

CoRR, 2024

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization.

CoRR, 2024

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback.

CoRR, 2024

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models.

CoRR, 2024

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models.

CoRR, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.

CoRR, 2024

SWIM: Short-Window CNN Integrated With Mamba for EEG-Based Auditory Spatial Attention Decoding.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines For Speech Recognition, Speaker Tagging, and Emotion Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Automatic Time Alignment Generation For End-to-End ASR Using Acoustic Probability Modelling.

Dongcheng Jiang

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Hierarchical Multi-Path and Multi-Model Selection For Fake Speech Detection.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Affect Recognition in Conversations Using Large Language Models.

Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2024

An Improved Empirical Fisher Approximation for Natural Gradient Descent.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Speaker Diarization for Unlimited Number of Speakers Using Dynamic Linear.

Siyin Wang

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

SOT Triggered Neural Clustering for Speaker Attributed ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Can Large Language Models Understand Spatial Audio?

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Enhancing Quantised End-to-End ASR Models Via Personalisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Connecting Speech Encoder and Large Language Model for ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Can Whisper Perform Speech-Based In-Context Learning?

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extending Large Language Models for Speech and Audio Captioning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Bridging the Gap: Integrating Pre-Trained Speech Enhancement and Recognition Models for Robust Speech Recognition.

Proceedings of the 32nd European Signal Processing Conference, 2024

Bayesian Example Selection Improves In-Context Learning for Speech, Text and Visual Modalities.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Modelling Variability in Human Annotator Simulation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Speech-based Slot Filling using Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.

Qiujia Li

Speech Commun., February, 2023

Prosody Modelling With Pre-Trained Cross-Utterance Representations for Improved Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Estimating the Uncertainty in Emotion Class Labels With Utterance-Specific Dirichlet Priors.

IEEE Trans. Affect. Comput., 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.

CoRR, 2023

Conditional Diffusion Model for Target Speaker Extraction.

CoRR, 2023

It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation.

CoRR, 2023

Affect Recognition in Conversations Using Large Language Models.

CoRR, 2023

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition.

CoRR, 2023

Obstructive Sleep Apnea Detection using Pre-trained Speech Representations.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Neural Time Alignment Module for End-to-End Automatic Speech Recognition.

Dongcheng Jiang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

UML: A Universal Monolingual Output Layer For Multilingual ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Supervised Representations in Speech-Based Depression Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Context-Aware End-to-End ASR Using Self-Attentive Embedding and Tensor Fusion.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning.

Shaoxiong Lin

Yanmin Qian

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression.

William D. Marslen-Wilson

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

On the similarities of representations in artificial and brain neural networks for speech recognition.

Li Su

Frontiers Comput. Neurosci., 2022

Distribution-Based Emotion Recognition in Conversation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription.

Xianrui Zheng

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Turn-Taking Prediction for Natural Conversational Speech.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving the Fusion of Acoustic and Text Representations in RNN-T.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Combination of deep speaker embeddings for diarisation.

Neural Networks, 2021

A distributed optimisation framework combining natural gradient with Hessian-free for discriminative sequence training.

Neural Networks, 2021

Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition.

CoRR, 2021

Discriminative Neural Clustering for Speaker Diarisation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Variable Frame Rate Acoustic Models Using Minimum Error Reinforcement Learning.

Dongcheng Jiang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

Neural Kalman Filtering for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-End Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Emotion Recognition by Fusing Time Synchronous and Time Asynchronous Representations.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Content-Aware Speaker Embeddings for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Dian: Duration Informed Auto-Regressive Network for Voice Cloning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition.

Xianrui Zheng

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications.

IEEE J. Sel. Top. Signal Process., 2020

Introduction to the Special Issue on Deep Learning for Multi-Modal Intelligence Across Speech, Language, Vision, and Heterogeneous Signals.

IEEE J. Sel. Top. Signal Process., 2020

Cross-Utterance Language Models with Acoustic Error Sampling.

CoRR, 2020

Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The JD AI Speaker Verification System for the FFSVC 2020 Challenge.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improved Large-Margin Softmax Loss for Speaker Diarisation.

Yassir Fathullah

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Span Acoustic Modelling Using Raw Waveform Signals.

Patrick von Platen

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

PyHTK: Python Library and ASR Pipelines for HTK.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition.

Qiujia Li

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Semi-tied Units for Efficient Gating in LSTM and Highway Networks.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

High Order Recurrent Neural Networks for Acoustic Modelling.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Improved TDNNs Using Deep Kernels and Frequency Dependent Grid-RNNs.

Florian L. Kreyssig

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Joint training methods for tandem and hybrid speech recognition systems using deep neural networks.

William D. Marslen-Wilson

PhD thesis, 2017

Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.

PLoS Comput. Biol., 2017

Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016

Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

System combination with log-linear models.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Improved DNN-based segmentation for multi-genre broadcast audio.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

A general artificial neural network extension for HTK.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Cambridge University transcription systems for the multi-genre broadcast challenge.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The development of the Cambridge University alignment systems for the multi-genre broadcast challenge.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speaker diarisation and longitudinal linking in multi-genre broadcast data.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Structured discriminative models using deep neural-network features.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Standalone training of context-dependent deep neural network acoustic models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition.

IEEE Trans. Speech Audio Process., 2013

Investigation of multilingual deep neural networks for spoken term detection.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Discriminative dynamic Gaussian mixture selection with enhanced robustness and performance for multi-accent speech recognition.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

Reliable accent specific unit generation with dynamic Gaussian mixture selection for multi-accent speech recognition.

Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

An In-car Chinese Noise Corpus for Speech Recognition.

Proceedings of the International Conference on Asian Language Processing, 2011

Detection-based accented speech recognition using articulatory features.
