Chao Zhang

ORCID: 0000-0002-7730-5131

Affiliations:
  • Tsinghua University, Department of Electronic Engineering, Beijing, China
  • University of Cambridge, Department of Engineering, UK (PhD 2017)


According to our database, Chao Zhang authored at least 72 papers between 2013 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.
Speech Commun., February, 2023

Prosody Modelling With Pre-Trained Cross-Utterance Representations for Improved Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Estimating the Uncertainty in Emotion Class Labels With Utterance-Specific Dirichlet Priors.
IEEE Trans. Affect. Comput., 2023

Speech-based Slot Filling using Large Language Models.
CoRR, 2023

SALMONN: Towards Generic Hearing Abilities for Large Language Models.
CoRR, 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023

Conditional Diffusion Model for Target Speaker Extraction.
CoRR, 2023

Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection.
CoRR, 2023

It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation.
CoRR, 2023

Connecting Speech Encoder and Large Language Model for ASR.
CoRR, 2023

Affect Recognition in Conversations Using Large Language Models.
CoRR, 2023

Enhancing Quantised End-to-End ASR Models via Personalisation.
CoRR, 2023

Cross-Utterance Conditioned VAE for Speech Generation.
CoRR, 2023

Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations.
CoRR, 2023

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data.
CoRR, 2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?
CoRR, 2023

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator.
CoRR, 2023

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition.
CoRR, 2023

Self-Supervised Representations in Speech-Based Depression Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator.
Proceedings of the IEEE International Conference on Acoustics, 2023

Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
On the similarities of representations in artificial and brain neural networks for speech recognition.
Frontiers Comput. Neurosci., 2022

Distribution-Based Emotion Recognition in Conversation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription.
Proceedings of the Interspeech 2022, 2022

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.
Proceedings of the Interspeech 2022, 2022

2021
Combination of deep speaker embeddings for diarisation.
Neural Networks, 2021

A distributed optimisation framework combining natural gradient with Hessian-free for discriminative sequence training.
Neural Networks, 2021

Discriminative Neural Clustering for Speaker Diarisation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Variable Frame Rate Acoustic Models Using Minimum Error Reinforcement Learning.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Neural Kalman Filtering for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2021

Emotion Recognition by Fusing Time Synchronous and Time Asynchronous Representations.
Proceedings of the IEEE International Conference on Acoustics, 2021

Content-Aware Speaker Embeddings for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications.
IEEE J. Sel. Top. Signal Process., 2020

Introduction to the Special Issue on Deep Learning for Multi-Modal Intelligence Across Speech, Language, Vision, and Heterogeneous Signals.
IEEE J. Sel. Top. Signal Process., 2020

Cross-Utterance Language Models with Acoustic Error Sampling.
CoRR, 2020

Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.
Proceedings of the Interspeech 2020, 2020

The JD AI Speaker Verification System for the FFSVC 2020 Challenge.
Proceedings of the Interspeech 2020, 2020

Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed.
Proceedings of the Interspeech 2020, 2020

Improved Large-Margin Softmax Loss for Speaker Diarisation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.
Proceedings of the Interspeech 2019, 2019

Multi-Span Acoustic Modelling Using Raw Waveform Signals.
Proceedings of the Interspeech 2019, 2019

PyHTK: Python Library and ASR Pipelines for HTK.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Semi-tied Units for Efficient Gating in LSTM and Highway Networks.
Proceedings of the Interspeech 2018, 2018

Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.
Proceedings of the Interspeech 2018, 2018

High Order Recurrent Neural Networks for Acoustic Modelling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Improved TDNNs Using Deep Kernels and Frequency Dependent Grid-RNNs.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Joint training methods for tandem and hybrid speech recognition systems using deep neural networks.
PhD thesis, 2017

Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.
PLoS Comput. Biol., 2017

Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016
Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems.
Proceedings of the Interspeech 2016, 2016

DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

System combination with log-linear models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Improved DNN-based segmentation for multi-genre broadcast audio.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
A general artificial neural network extension for HTK.
Proceedings of the INTERSPEECH 2015, 2015

Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling.
Proceedings of the INTERSPEECH 2015, 2015

Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages.
Proceedings of the INTERSPEECH 2015, 2015

The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation.
Proceedings of the INTERSPEECH 2015, 2015

Cambridge University transcription systems for the multi-genre broadcast challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The development of the Cambridge University alignment systems for the multi-genre broadcast challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speaker diarisation and longitudinal linking in multi-genre broadcast data.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Structured discriminative models using deep neural-network features.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Standalone training of context-dependent deep neural network acoustic models.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

Investigation of multilingual deep neural networks for spoken term detection.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

