Tan Lee

Brain Informatics, December, 2025

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation.

CoRR, October, 2025

CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025.

CoRR, July, 2025

Probing Speaker-specific Features in Speaker Representations.

CoRR, January, 2025

PodAgent: A Comprehensive Framework for Podcast Generation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Automatic Detection of Speech Sound Disorder in Cantonese-Speaking Pre-School Children.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

User-Driven Voice Generation and Editing through Latent Space Navigation.

Junbin Liu

CoRR, 2024

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis.

CoRR, 2024

Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems.

Aemon Yat Fei Chiu

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LUPET: Incorporating Hierarchical Information Path into Multilingual ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Parameter-efficient Language Extension Framework for Multilingual ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss.

Proceedings of the IEEE International Conference on Acoustics, 2024

Modeling Intrapersonal and Interpersonal Influences for Automatic Estimation of Therapist Empathy in Counseling Conversation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Sparsely Shared LoRA on Whisper for Child Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Efficient Black-Box Speaker Verification Model Adaptation With Reprogramming And Backend Learning.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models.

Guangyan Zhang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children.

Cymie Wing-Yee Ng

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CoMFLP: Correlation Measure Based Fast Search on ASR Layer Pruning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Model Compression for DNN-based Speaker Verification Using Weight Quantization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Covariance Regularization for Probabilistic Linear Discriminant Analysis.

Proceedings of the IEEE International Conference on Acoustics, 2023

An ASR-Free Fluency Scoring Approach with Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring.

Proceedings of the IEEE International Conference on Acoustics, 2023

Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2023

Functional Connectivity Analysis in Multi-channel EEG for Emotion Detection with Phase Locking Value and 3D CNN.

Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition.

CoRR, 2022

An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech.

CoRR, 2022

CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Aphasia Detection for Cantonese-Speaking and Mandarin-Speaking Patients Using Pre-Trained Language Models.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Transport-Oriented Feature Aggregation for Speaker Embedding Learning.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Characterizing Therapist's Speaking Style in Relation to Empathy in Psychotherapy.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Environment Aware Text-to-Speech Synthesis.

Daxin Tan

Guangyan Zhang

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unifying Cosine and PLDA Back-ends for Speaker Verification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Durational Patterning at Discourse Boundaries in Relation to Therapist Empathy in Psychotherapy.

Nicolette Wing Tung Lee

Koonkan Fung

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.

Proceedings of the IEEE International Conference on Acoustics, 2022

Multivariate Empirical Mode Decomposition of EEG for Mental State Detection at Localized Brain Lobes.

Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

MEMD-HHT based Emotion Detection from EEG using 3D CNN.

Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

2021

Bayesian Learning for Deep Neural Network Adaptation.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy.

CoRR, 2021

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition.

CoRR, 2021

Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification.

CoRR, 2021

Enhancing Segment-Based Speech Emotion Recognition by Deep Self-Learning.

CoRR, 2021

CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge.

CoRR, 2021

Low-Resource NMT: A Case Study on the Written and Spoken Languages in Hong Kong.

Hei Yi Mak

Proceedings of the NLPIR 2021: 5th International Conference on Natural Language Processing and Information Retrieval, Sanya, China, December 17, 2021

Estimating Mutual Information in Prosody Representation for Emotional Prosody Transfer in Speech Synthesis.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Automatic Detection of Word-Level Reading Errors in Non-native English Speech Based on ASR Output.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Applying the Information Bottleneck Principle to Prosodic Representation Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement.

Daxin Tan

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Pairing Weak with Strong: Twin Models for Defending Against Adversarial Attack on Speaker Verification.

Xu Li

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Detection of Consonant Errors in Disordered Speech Based on Consonant-Vowel Segment Embedding.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Utterance-Level Neural Confidence Measure for End-to-End Children Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Improving Text-Independent Speaker Verification with Auxiliary Speakers Using Graph.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia.

J. Signal Process. Syst., 2020

Automatic Assessment of Speech Impairment in Cantonese-Speaking People with Aphasia.

Juan Ignacio Godino-Llorente

IEEE J. Sel. Top. Signal Process., 2020

Introduction to the Issue on Automatic Assessment of Health Disorders Based on Voice, Speech, and Language Processing.

Douglas D. O'Shaughnessy

Najim Dehak

Claudia Manfredi

IEEE J. Sel. Top. Signal Process., 2020

Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks.

Man-Ling Sung

CoRR, 2020

The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge.

CoRR, 2020

Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement.

Daxin Tan

CoRR, 2020

Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation.

Guangyan Zhang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment.

Kathy Yuet-Sheung Lee

Michael Chi-Fai Tong

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Emotion Profile Refinery for Speech Emotion Classification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Text-Independent Speaker Verification with Dual Attention Network.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Time-Frequency Feature Decomposition Based on Sound Duration for Acoustic Scene Classification.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Mixture Factorized Auto-Encoder for Unsupervised Hierarchical Deep Factorization of Speech Signal.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Resting-State EEG-Based Biometrics with Signals Features Extracted by Multivariate Empirical Mode Decomposition.

Matthew King-Hang Ma

Manson Cheuk-Man Fong

William Shi-Yuan Wang

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Searching for Efficient Network Architectures for Acoustic Scene Classification.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

2019

Acoustical Assessment of Voice Disorder With Continuous Speech Using ASR Posterior Features.

Yuanyuan Liu

Thomas K. T. Law

Kathy Yuet-Sheung Lee

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

22nd Oriental COCOSDA conference region report 2019.

Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Automatic Assessment of Language Impairment Based on Raw ASR Output.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Deep Learning of Segment-Level Feature Representation with Multiple Instance Learning for Utterance-Level Speech Emotion Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

BLHUC: Bayesian Learning of Hidden Unit Contributions for Deep Neural Network Speaker Adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2019

Enhancing Sound Texture in CNN-based Acoustic Scene Classification.

Proceedings of the IEEE International Conference on Acoustics, 2019

Combining Phone Posteriorgrams from Strong and Weak Recognizers for Automatic Speech Assessment of People with Aphasia.

Proceedings of the IEEE International Conference on Acoustics, 2019

Adversarial Multi-task Deep Features and Unsupervised Back-end Adaptation for Language Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

Revisiting Hidden Markov Models for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Guest Editorial: Advances in Deep Learning for Speech Processing.

Lei Xie

Man-Wai Mak

J. Signal Process. Syst., 2018

Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Study on Acoustic Modeling for Child Speech Based on Multi-Task Learning.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

An End-to-End Approach to Automatic Speech Assessment for People with Aphasia.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

An Automated Assessment Tool for Child Speech Disorders.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Disordered Speech Assessment Using Kullback-Leibler Divergence Features with Multi-Task Acoustic Modeling.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Prediction of Voice Disorder Severity: Contributions from Sustained Vowels and Continuous Speech.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Cross-cultural (A)symmetries in Audio-visual Attitude Perception.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Reducing Model Complexity for DNN Based Large-Scale Audio Classification.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Automatic Speech Assessment for Aphasic Patients Based on Syllable-Level Embedding and Supra-Segmental Duration Features.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features.

Man-Ling Sung

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Audio-visual expressions of attitude: How many different attitudes can perceivers decode?

Speech Commun., 2017

RNN-LDA Clustering for Feature Based DNN Adaptation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Shefce: A Cantonese-English bilingual speech corpus for pronunciation assessment.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Polyphonic piano note transcription with non-negative matrix factorization of differential spectrogram.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Cross-Language Perception of Audio-visual Attitudinal Expressions.

Proceedings of the 14th International Conference on Auditory-Visual Speech Processing, 2017

2016

Surface Electromyographic Activity of Extrinsic Laryngeal Muscles in Cantonese Tone Production.

Shing Yu

Manwa L. Ng

J. Signal Process. Syst., 2016

The Sheffield language recognition system in NIST LRE 2015.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Towards automatic assessment of aphasia speech using automatic speech recognition techniques.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Exploiting language-mismatched phoneme recognizers for unsupervised acoustic modeling.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Hybrid Accelerated Optimization for Speech Recognition.

Jen-Tzung Chien

Pei-Wen Huang

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Automatic speech recognition for acoustical analysis and assessment of Cantonese pathological voice and speech.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Supervised Single-Microphone Multi-Talker Speech Separation with Conditional Random Fields.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Acoustic Segment Modeling with Spectral Clustering Methods.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

A method of speech periodicity enhancement using transform-domain signal decomposition.

Speech Commun., 2015

Objective measures for quality assessment of noise-suppressed speech.

Speech Commun., 2015

Analysis of intonation patterns in Cantonese aphasia speech.

Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Multi-pitch estimation based on sparse representation with pre-screened dictionary.

Lufei Gao

Proceedings of the 17th IEEE International Workshop on Multimedia Signal Processing, 2015

Modeling temporal dependency for robust estimation of LP model parameters in speech enhancement.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

CUHK System for QUESST Task of MediaEval 2014.

Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Automatic Key Partition Based on Tonal Organization Information of Classical Music.

Wang-Kong Lam

Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Correcting Chord Classification Errors Based on Tonal Organization Information of Classical Music.

Wang-Kong Lam

Proceedings of the 2014 IEEE International Symposium on Multimedia, 2014

Surface electromyographic activity of non-laryngeal neck muscles in Cantonese tone production.

Shing Yu

Manwa L. Ng

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Multipitch tracking based on linear programming relaxation and sparsity-based pitch candidate estimation.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Large-margin conditional random fields for single-microphone speech separation.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A graph-based Gaussian component clustering approach to unsupervised acoustic modeling.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Spoken Language Recognition With Prosodic Features.

IEEE Trans. Speech Audio Process., 2013

Pitch Estimation in Noisy Speech Using Accumulated Peak Spectrum and Sparse Estimation Technique.

IEEE Trans. Speech Audio Process., 2013

Shifted-Delta MLP Features for Spoken Language Recognition.

IEEE Signal Process. Lett., 2013

The CUHK Spoken Web Search System for MediaEval 2013.

Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Unsupervised mining of acoustic subword units with segment-level Gaussian posteriorgrams.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Using dynamic conditional random field on single-microphone speech separation.

Proceedings of the IEEE International Conference on Acoustics, 2013

Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection.

Proceedings of the IEEE International Conference on Acoustics, 2013

Evaluation of pitch estimation algorithms on separated speech.

Proceedings of the IEEE International Conference on Acoustics, 2013

A speech enhancement method for cochlear implant listeners.

Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2013

Structured mean field method for single-microphone speech separation with factorial Hidden Markov Model.

Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Chord classification of multi-instrumental music using exemplar-based sparse representation.

Wang-Kong Lam

Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Improving the sound quality of an electronic voice box.

Manwa L. Ng

Nan Yan

Proceedings of the 6th International Conference on Biomedical Engineering and Informatics, 2013

2012

CUHK System for the Spoken Web Search task at MediaEval 2012.

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Two objective measures for speech distortion and noise reduction evaluation of enhanced speech signals.

Huijun Ding

Ing Yann Soon

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Robust Pitch Estimation Using l1-regularized Maximum Likelihood Estimation.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Integrating multiple observations for model-based single-microphone speech separation with conditional random fields.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An acoustic segment modeling approach to query-by-example spoken term detection.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Transform-domain Wiener filter for speech periodicity enhancement.

W. Bastiaan Kleijn

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Sparsity-based confidence measure for pitch estimation in noisy speech.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Exploration of Phase and Vocal Excitation Modulation Features for Speaker Recognition.

Proceedings of the Biometric Recognition - 7th Chinese Conference, 2012

Classifying NMF components based on vector similarity for speech and music separation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Robust Speaker Recognition Using Denoised Vocal Source and Vocal Tract Features.

IEEE Trans. Speech Audio Process., 2011

Transform-domain speech periodicity enhancement with adaptive coefficient weighting.

W. Bastiaan Kleijn

Proceedings of the International Symposium on Intelligent Signal Processing and Communications Systems, 2011

Score fusion and calibration in multiple language detectors with large performance variation.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Detection target dependent score calibration for language recognition.

Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Similarity Measures for Chinese Pop Music Based on Low-level Audio Signal Attributes.

Proceedings of the 11th International Society for Music Information Retrieval Conference, 2010

SURE-MSE speech enhancement for robust speech recognition.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Robust speaker verification using phase information of speech.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Spectral trajectory estimation using nonnegative matrix factorization for model-based monaural speech separation.

Chun-Man Mak

Siu Wa Lee

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Perception and analysis of linearly approximated F0 contours in Cantonese speech.

[BibT_eX]

[DOI]

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Semantics-based language modeling for Cantonese-English code-mixing speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Exploitation of phase information for speaker recognition.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Towards long-range prosodic attribute modeling for language recognition.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Perception-based automatic approximation of F0 contours in Cantonese speech.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Cross-lingual speaker adaptation via Gaussian component mapping.

[BibT_eX]

[DOI]

Houwei Cao

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Prosodic attribute model for spoken language identification.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2010

Improved Cantonese Tone Recognition with Approximated F0 Contour: Implications for Cochlear Implants.

[BibT_eX]

[DOI]

Meng Yuan

Haihong Feng

Proceedings of the International Conference on Asian Language Processing, 2010

A method of speech periodicity enhancement based on transform-domain signal decomposition.

[BibT_eX]

[DOI]

Juan Ignacio Godino-Llorente

W. Bastiaan Kleijn

Proceedings of the 18th European Signal Processing Conference, 2010

2009

Analysis and Selection of Prosodic Features for Asian Language Recognition.

[BibT_eX]

[DOI]

Int. J. Asian Lang. Process., 2009

Automatic Recognition of Cantonese-English Code-Mixing Speech.

[BibT_eX]

[DOI]

Int. J. Comput. Linguistics Chin. Lang. Process., 2009

Analysis and Signal Processing of Oesophageal and Pathological Voices.

[BibT_eX]

[DOI]

Pedro Gómez-Vilda

EURASIP J. Adv. Signal Process., 2009

Exploration of vocal excitation modulation features for speaker recognition.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Model-based speech separation: identifying transcription using orthogonality.

[BibT_eX]

[DOI]

Siu Wa Lee

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Effects of language mixing for automatic recognition of Cantonese-English code-mixing utterances.

[BibT_eX]

[DOI]

Houwei Cao

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Analysis and Selection of Prosodic Features for Language Identification.

[BibT_eX]

[DOI]

Proceedings of the 2009 International Conference on Asian Language Processing, 2009

2008

Deriving MFCC Parameters from the Dynamic Spectrum for Robust Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Mandarin Tone Perception with Temporal Envelope and Periodicity Cues from Different Frequency Regions.

[BibT_eX]

[DOI]

Meng Yuan

Sigfrid D. Soli

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Entropy-Based Analysis of the Prosodic Features of Chinese Dialects.

[BibT_eX]

[DOI]

Raymond W. M. Ng

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A Perceptual Study of Approximated Cantonese Tone Contours.

[BibT_eX]

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Pitch Tracking for Model-Based Speech Separation.

[BibT_eX]

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Prosodic Variation in Cantonese-English Code-Mixed Speech.

[BibT_eX]

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Language modeling for speech recognition of spoken Cantonese.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation.

[BibT_eX]

[DOI]

Wai Nang Chan

Nengheng Zheng

IEEE Trans. Speech Audio Process., 2007

Integration of Complementary Acoustic Features for Speaker Recognition.

[BibT_eX]

[DOI]

Nengheng Zheng

IEEE Signal Process. Lett., 2007

Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification.

[BibT_eX]

[DOI]

Int. J. Comput. Linguistics Chin. Lang. Process., 2007

A power-based adaptive method for eigenanalysis without square-root operations.

[BibT_eX]

[DOI]

Shan Ouyang

Digit. Signal Process., 2007

Quantitative analysis of F0 contours of emotional speech of Mandarin.

[BibT_eX]

[DOI]

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Perceptual equivalence of approximated Cantonese tone contours.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Modeling tones in Hakka on the basis of the command-response model.

[BibT_eX]

[DOI]

Rerrario Shui-Ching Ho

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

2006

Speech recognition on DSP: issues on computational efficiency and performance analysis.

[BibT_eX]

[DOI]

Microprocess. Microsystems, 2006

Using Duration Information in Cantonese Connected-Digit Recognition.

[BibT_eX]

[DOI]

Yu Zhu

Int. J. Comput. Linguistics Chin. Lang. Process., 2006

Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition.

[BibT_eX]

[DOI]

Patgi Kam

Int. J. Comput. Linguistics Chin. Lang. Process., 2006

Speaker Verification Using Complementary Information from Vocal Source and Vocal Tract.

[BibT_eX]

[DOI]

Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Integrating Complementary Features with a Confidence Measure for Speaker Identification.

[BibT_eX]

[DOI]

Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Towards automatic parameter extraction of command-response model for Cantonese.

[BibT_eX]

[DOI]

Raymond W. M. Ng

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Improved tone modeling for Mandarin broadcast news speech recognition.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Automatic speech recognition of Cantonese-English code-mixing utterances.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Tone-Enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Feature Extraction From Talking Mouths for Video-Based Bi-Modal Speaker Verification.

[BibT_eX]

[DOI]

Hua Ouyang

Wai Nang Chan

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Use of Vocal Source Features in Speaker Segmentation.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Development of a Cantonese-English code-mixing speech corpus.

[BibT_eX]

[DOI]

Joyce Y. C. Chan

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR.

[BibT_eX]

[DOI]

Chen Yang

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Analysis and modeling of F0 contours for Cantonese text-to-speech.

[BibT_eX]

[DOI]

ACM Trans. Asian Lang. Inf. Process., 2004

On noise robustness of dynamic and static features for continuous Cantonese digit recognition.

[BibT_eX]

[DOI]

Chen Yang

Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Cantonese verbal information verification system using GMM-based anti-model.

[BibT_eX]

[DOI]

Chao Qin

Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Detection of language boundary in code-switching utterances by bi-phone probabilities.

[BibT_eX]

[DOI]

Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Noise-robust automatic speech recognition using mainlobe-resilient time-frequency quantile-based noise estimation.

[BibT_eX]

[DOI]

Siu Wa Lee

Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Explicit duration modeling for Cantonese connected-digit recognition.

[BibT_eX]

[DOI]

Yu Zhu

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Time-frequency analysis of vocal source signal for speaker recognition.

[BibT_eX]

[DOI]

Nengheng Zheng

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Tone information as a confidence measure for improving Cantonese LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2003

An HMM-based speech recognition IC.

[BibT_eX]

[DOI]

Proceedings of the 2003 International Symposium on Circuits and Systems, 2003

Overlapped di-tone modeling for tone recognition in continuous Cantonese speech.

[BibT_eX]

[DOI]

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Modeling Cantonese pronunciation variation by acoustic model refinement.

[BibT_eX]

[DOI]

Patgi Kam

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Using tone information in Cantonese continuous speech recognition.

[BibT_eX]

[DOI]

ACM Trans. Asian Lang. Inf. Process., 2002

Spoken language resources for Cantonese speech processing.

[BibT_eX]

[DOI]

Speech Commun., 2002

A new approach to generating Pitch Cycle Waveform (PCW) for Waveform Interpolation codec.

[BibT_eX]

[DOI]

Ge Gao

Microprocess. Microsystems, 2002

Acoustical F0 analysis of continuous Cantonese speech.

[BibT_eX]

[DOI]

Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Modeling tones in continuous Cantonese speech.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Unsupervised n-best based model adaptation using model-level confidence measures.

[BibT_eX]

[DOI]

Ka-Yan Kwan

Chen Yang

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks.

[BibT_eX]

[DOI]

Proceedings of the 14th Conference on Computational Linguistics and Speech Processing, 2001

A Low Missing Rate Audio Search Technique for Cantonese Radio Broadcast Recording.

[BibT_eX]

[DOI]

H. S. Lam

Proceedings of the Advances in Multimedia Information Processing, 2001

ISIS: a learning system with combined interaction and delegation dialogs.

[BibT_eX]

[DOI]

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Cantonese text-to-speech synthesis using sub-syllable units.

[BibT_eX]

[DOI]

Ka Man Law

Wai H. Lau

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

Corpus-based Cantonese Speech Synthesis With Non-uniform Units.

[BibT_eX]

[DOI]

Ka Man Law

Ka-Yan Kwan

Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

A Study on the Contribution of Lexical Tones in Chinese LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

ISIS: A multilingual spoken dialog system developed with CORBA and KQML agents.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Using cross-syllable units for Cantonese speech synthesis.

[BibT_eX]

[DOI]

Ka Man Law

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Incorporating tone information into Cantonese large-vocabulary continuous speech recognition.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Lexical tree decoding with a class-based language model for Chinese speech recognition.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Acoustic modeling for Chinese speech recognition: a comparative study of Mandarin and Cantonese.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Cantonese syllable recognition using neural networks.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 1999

Acoustic modeling and language modeling for Cantonese LVCSR.

[BibT_eX]

[DOI]

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Micro-prosodic control in Cantonese text-to-speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Two-dimensional multi-resolution analysis of speech signals and its application to speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998

Isolated word recognition using modular recurrent neural networks.

[BibT_eX]

[DOI]

Lai-Wan Chan

Pattern Recognit., 1998

Development of Cantonese Spoken Language Corpora for Speech Application.

[BibT_eX]

[DOI]

Wai Kit Lo

Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

Sub-Syllable Acoustic Modelling for Cantonese Speech Recognition.

[BibT_eX]

[DOI]

Ka-Fai Chow

Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

Context-dependent duration modelling for continuous speech recognition.

[BibT_eX]

[DOI]

Rolf Carlson

Björn Granström

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997

A neural network based speech recognition system for isolated Cantonese syllables.

[BibT_eX]

[DOI]

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Development of a large vocabulary speech database for Cantonese.

[BibT_eX]

[DOI]