Helen M. Meng

According to our database, Helen M. Meng authored at least 312 papers between 1990 and 2020.

Awards

IEEE Fellow 2013, "For contributions to spoken language and multimodal systems".

Bibliography

2020
An Integrated Approach of Machine Learning and Systems Thinking for Waiting Time Prediction in an Emergency Department.
Int. J. Medical Informatics, 2020

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
CoRR, 2020

Replay and Synthetic Speech Detection with Res2net Architecture.
CoRR, 2020

Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition.
CoRR, 2020

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling.
CoRR, 2020

Neural Architecture Search for Speech Recognition.
CoRR, 2020

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
CoRR, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
CoRR, 2020

Audio-visual Multi-channel Recognition of Overlapped Speech.
CoRR, 2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
CoRR, 2020

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020

Speaker-Aware Linear Discriminant Analysis in Speaker Verification.
Proceedings of the Interspeech 2020, 2020

Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the Interspeech 2020, 2020

Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the Interspeech 2020, 2020

Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks.
Proceedings of the Interspeech 2020, 2020

SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Transferring Source Style in Non-Parallel Voice Conversion.
Proceedings of the Interspeech 2020, 2020

Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.
Proceedings of the Interspeech 2020, 2020

Enhancing Monotonicity for Robust Autoregressive Transformer TTS.
Proceedings of the Interspeech 2020, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the Interspeech 2020, 2020

Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks.
CoRR, 2019

Semi-Supervised Graph Classification: A Hierarchical Graph Perspective.
Proceedings of the World Wide Web Conference, 2019

Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the Interspeech 2019, 2019

Unsupervised Methods for Audio Classification from Lecture Discussion Recordings.
Proceedings of the Interspeech 2019, 2019

One-Shot Voice Conversion with Global Speaker Embeddings.
Proceedings of the Interspeech 2019, 2019

Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition.
Proceedings of the Interspeech 2019, 2019

On the Use of Pitch Features for Disordered Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.
Proceedings of the Interspeech 2019, 2019

Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis.
Proceedings of the Interspeech 2019, 2019

Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the Interspeech 2019, 2019

LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Proceedings of the Interspeech 2019, 2019

The CUHK Dysarthric Speech Recognition Systems for English and Cantonese.
Proceedings of the Interspeech 2019, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the Interspeech 2019, 2019

Towards Discriminative Representation Learning for Speech Emotion Recognition.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Recurrent Neural Network Language Model Training Using Natural Gradient.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Emotion Recognition Using Capsule Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2019

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

CNN-RNN-CTC Based End-to-end Mispronunciation Detection and Diagnosis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Query-by-Example Spoken Term Detection using Attentive Pooling Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks.
Speech Commun., 2018

Data Visualization with IBM Watson Analytics for Global Cancer Trends Comparison from World Health Organization.
Int. J. Heal. Inf. Syst. Informatics, 2018

The HCCL-CUHK System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.
Proceedings of the 2018 ACM Multimedia Conference, 2018

TATC: Predicting Alzheimer's Disease with Actigraphy Data.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

DNN i-vector based Fishervoice and PLDA SVM scoring for NIST SRE 2016.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Speech Super-Resolution Using Parallel WaveNet.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.
Proceedings of the Interspeech 2018, 2018

Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.
Proceedings of the Interspeech 2018, 2018

Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method.
Proceedings of the Interspeech 2018, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the Interspeech 2018, 2018

Speech and Language Processing for Learning and Wellbeing.
Proceedings of the Interspeech 2018, 2018

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the Interspeech 2018, 2018

Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.
Proceedings of the Interspeech 2018, 2018

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the Interspeech 2018, 2018

Gaussian Process Neural Networks for Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Feature Based Adaptation for Speaking Style Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Social Media as a Tool to Look for People with Dementia Who Become Lost: Factors That Matter.
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018

Drawing-Based Automatic Dementia Screening Using Gaussian Process Markov Chains.
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018

Machine Learning on Drawing Behavior for Dementia Screening.
Proceedings of the 2018 International Conference on Digital Health, 2018

Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

2017
Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Intonation classification for L2 English speech using multi-distribution deep neural networks.
Comput. Speech Lang., 2017

DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances.
Proceedings of the Interspeech 2017, 2017

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the Interspeech 2017, 2017

Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.
Proceedings of the Interspeech 2017, 2017

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.
Proceedings of the Interspeech 2017, 2017

A model of extended paragraph vector for document categorization and trend analysis.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Data Visualization on Global Trends on Cancer Incidence: An Application of IBM Watson Analytics.
Proceedings of the 50th Hawaii International Conference on System Sciences, 2017

Personal Wearable Devices to Measure Heart Rate Variability: A Framework of Cloud Platform for Public Health Research.
Proceedings of the 2017 International Conference on Digital Health, 2017

Classification of Visit-to-Visit Blood Pressure Variability: A Machine Learning Approach for Data Clustering on Systolic Blood Pressure Intervention Trial (SPRINT).
Proceedings of the 2017 International Conference on Digital Health, 2017

Parallel probabilistic swarm guidance by exploiting Kronecker product structures in discrete-time Markov chains.
Proceedings of the 2017 American Control Conference, 2017

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Capitalizing on musical rhythm for prosodic training in computer-aided language learning.
Comput. Speech Lang., 2016

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition.
CoRR, 2016

Kronecker product approximation with multiple factor matrices via the tensor product algorithm.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

An embedding approach for context-aware collaborative recommendation and visualization.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

Exploratory data analysis on nuclei in Cantonese dysarthric speech.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

DBLSTM-based multi-task learning for pitch transformation in voice conversion.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Analysis on Gated Recurrent Unit Based Question Detection Approach.
Proceedings of the Interspeech 2016, 2016

Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams.
Proceedings of the Interspeech 2016, 2016

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.
Proceedings of the Interspeech 2016, 2016

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the Interspeech 2016, 2016

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.
Proceedings of the Interspeech 2016, 2016

Phonetic posteriorgrams for many-to-one voice conversion without parallel data training.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Recognizing stances in Mandarin social ideological debates with text and acoustic features.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exploring articulatory characteristics of Cantonese dysarthric speech using distinctive features.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Question detection from acoustic features using recurrent neural network with gated recurrent unit.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Learning Track Representation and Trends for Conference Analytics.
Proceedings of the 49th Hawaii International Conference on System Sciences, 2016

Blood Pressure Monitoring on the Cloud System in Elderly Community Centres: A Data Capturing Platform for Application Research in Public Health.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016

Utilizing Real-Time Travel Information, Mobile Applications and Wearable Devices for Smart Public Transportation.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016

2015
Introduction to the Special Section on Continuous Space and Related Methods in Natural Language Processing.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.
IEEE Signal Process. Mag., 2015

A Survey of Wireless Sensor Network Based Air Pollution Monitoring Systems.
Sensors, 2015

Expressive talking avatar synthesis and animation.
Multim. Tools Appl., 2015

Acoustic to articulatory mapping with deep neural network.
Multim. Tools Appl., 2015

Generating emphatic speech with hidden Markov model for expressive speech synthesis.
Multim. Tools Appl., 2015

Preface.
J. Multimodal User Interfaces, 2015

Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Analysis of Dysarthric Speech using Distinctive Feature Recognition.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

Improving automatic forced alignment for dysarthric speech transcription.
Proceedings of the INTERSPEECH 2015, 2015

Development of a Cantonese dysarthric speech corpus.
Proceedings of the INTERSPEECH 2015, 2015

E-commu-book: an assistive technology for users with speech impairments.
Proceedings of the INTERSPEECH 2015, 2015

Using tilt for automatic emphasis detection with Bayesian networks.
Proceedings of the INTERSPEECH 2015, 2015

Topic modeling for conference analytics.
Proceedings of the INTERSPEECH 2015, 2015

Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

A spectral space warping approach to cross-lingual voice transformation in HMM-based TTS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A deep recurrent approach for acoustic-to-articulatory inversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Blood Pressure Management with Data Capturing in the Cloud among Hypertensive Patients: A Monitoring Platform for Hypertensive Patients.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

A Data Capturing Platform in the Cloud for Behavioral Analysis among Smokers: An Application Platform for Public Health Research.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

Embracing Big Data for Simulation Modelling of Emergency Department Processes and Activities.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

A Real-Time Decision Support Tool for Disaster Response: A Mathematical Programming Approach.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

Indoor Air Monitoring Platform and Personal Health Reporting System: Big Data Analytics for Public Health Research.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Latent Semantic Analysis for Multimodal User Input With Speech and Gestures.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014

Head and facial gestures synthesis using PAD model for an expressive talking avatar.
Multim. Tools Appl., 2014

Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.
J. Comput. Sci. Technol., 2014

SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

An Integration of Random Subspace Sampling and Fishervoice for Speaker Verification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Automatic speech data clustering with human perception based weighted distance.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

PLDA modeling in the fishervoice subspace for speaker verification.
Proceedings of the INTERSPEECH 2014, 2014

Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the INTERSPEECH 2014, 2014

Statistical parametric speech synthesis using weighted multi-distribution deep belief network.
Proceedings of the INTERSPEECH 2014, 2014

Contrastive auto-encoder for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Learning dynamic features with neural networks for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Phonological modeling of mispronunciation gradations in L2 English speech of L1 Chinese learners.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.
CoRR, 2013

Predicting gradation of L2 English mispronunciations using crowdsourced ratings and phonological rules.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Lexical stress detection for L2 English speech using deep belief networks.
Proceedings of the INTERSPEECH 2013, 2013

Investigation of tandem deep belief network approach for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Audiovisual synthesis of exaggerated speech for corrective feedback in computer-assisted pronunciation training.
Proceedings of the IEEE International Conference on Acoustics, 2013

Clustering similar acoustic classes in the Fishervoice framework.
Proceedings of the IEEE International Conference on Acoustics, 2013

Multi-distribution deep belief network for speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2013

Predicting gradation of L2 English mispronunciations using ASR with extended recognition network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Farewell Editorial.
IEEE Trans. Speech Audio Process., 2012

Phoneme-level articulatory animation in pronunciation training.
Speech Commun., 2012

Predicting User Satisfaction in Spoken Dialog System Evaluation With Collaborative Filtering.
IEEE J. Sel. Top. Signal Process., 2012

Welcome message from the conference chair.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

mENUNCIATE: Development of a computer-aided pronunciation training system on a cross-platform framework for mobile, speech-enabled application development.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Perceptually-motivated assessment of automatically detected lexical stress in L2 learners' speech.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Analysis on mispronunciations in CAPT based on computational speech perception.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training.
Proceedings of the INTERSPEECH 2012, 2012

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the INTERSPEECH 2012, 2012

Modeling the correlation between modality semantics and facial expressions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the INTERSPEECH 2011, 2011

Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection.
Proceedings of the INTERSPEECH 2011, 2011

An Analysis Framework Based on Random Subspace Sampling for Speaker Verification.
Proceedings of the INTERSPEECH 2011, 2011

Design and Collection of an L2 English Corpus with a Suprasegmental Focus for Chinese Learners of English.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

Allophonic variations in visual speech synthesis for corrective feedback in CAPT.
Proceedings of the IEEE International Conference on Acoustics, 2011

The HKCUPU system for the NIST 2010 speaker recognition evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Pseudo-Conventional N-Gram Representation of the Discriminative N-Gram Model for LVCSR.
IEEE J. Sel. Top. Signal Process., 2010

Introduction to the Issue on Statistical Learning Methods for Speech and Language Processing.
IEEE J. Sel. Top. Signal Process., 2010

Using finite state machines for evaluating spoken dialog systems.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Collaborative filtering model for user satisfaction prediction in Spoken Dialog System evaluation.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Collection of user judgments on spoken dialog system with crowdsourcing.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Predicting user evaluations of spoken dialog systems using semi-supervised learning.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions.
Proceedings of the 15th International Conference on Intelligent User Interfaces, 2010

Modeling prosody patterns for Chinese expressive text-to-speech synthesis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Development of an articulatory visual-speech synthesizer to support language learning.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Detection of intonation in L2 English speech of native Mandarin learners.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

An enhanced Fishervoice subspace framework for text-independent speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
Proceedings of the INTERSPEECH 2010, 2010

Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system.
Proceedings of the INTERSPEECH 2010, 2010

Statistical phone duration modeling to filter for intact utterances in a computer-assisted pronunciation training system.
Proceedings of the IEEE International Conference on Acoustics, 2010

Fishervoice: A discriminant subspace framework for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.
Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010

2009
Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.
IEEE Trans. Speech Audio Process., 2009

Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions.
IEEE Trans. Speech Audio Process., 2009

Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Developing Speech Recognition and Synthesis Technologies to Support Computer-Aided Pronunciation Training for Chinese Learners of English.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009

Studying L2 suprasegmental features in Asian Englishes: a position paper.
Proceedings of the INTERSPEECH 2009, 2009

Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training.
Proceedings of the Development of Multimodal Interfaces: Active Listening and Synchrony, 2009

Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features.
Proceedings of the ACL 2009, 2009

2008
The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Decision Fusion for Improving Mispronunciation Detection Using Language Transfer Knowledge and Phoneme-Dependent Pronunciation Scoring.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A New Prosodic Strength Calculation Method for Prosody Reduction Modeling.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training.
Proceedings of the INTERSPEECH 2008, 2008

Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer.
Proceedings of the INTERSPEECH 2008, 2008

Recasting the discriminative n-gram model as a pseudo-conventional n-gram model for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling.
IEEE Trans. Computers, 2007

Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling.
Proceedings of the INTERSPEECH 2007, 2007

Complementarity and redundancy in multimodal user inputs with speech and pen gestures.
Proceedings of the INTERSPEECH 2007, 2007

Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation.
Proceedings of the INTERSPEECH 2007, 2007

Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar.
Proceedings of the IEEE International Conference on Acoustics, 2007

Effects of Device Mismatch, Language Mismatch and Environmental Mismatch on Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007

Adaptive Weight Estimation in Multi-Biometric Verification using Fuzzy Logic Decision Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminant Mutual Subspace Learning for Indoor and Outdoor Face Recognition.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Deriving salient learners' mispronunciations from cross-language phonological comparisons.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar.
Proceedings of the Affective Computing and Intelligent Interaction, 2007

2006
Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

A Corpus-Based Approach for Cooperative Response Generation in a Dialog System.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

A multi-pass error detection and correction framework for Mandarin LVCSR.
Proceedings of the INTERSPEECH 2006, 2006

Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis.
Proceedings of the INTERSPEECH 2006, 2006

Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar.
Proceedings of the INTERSPEECH 2006, 2006

Joint interpretation of input speech and pen gestures for multimodal human-computer interaction.
Proceedings of the INTERSPEECH 2006, 2006

Multi-level Fusion of Audio and Visual Features for Speaker Identification.
Proceedings of the Advances in Biometrics, International Conference, 2006

A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance.
Proceedings of the HLT/EMNLP 2005, 2005

Embedded Cantonese TTS for multi-device access to web content.
Proceedings of the INTERSPEECH 2005, 2005

2004
ISIS: an adaptive, trilingual conversational system with interleaving interaction and delegation dialogs.
ACM Trans. Comput. Hum. Interact., 2004

Multi-Scale Spoken Document Retrieval for Cantonese Broadcast News.
Int. J. Speech Technol., 2004

Mandarin-English Information (MEI): investigating translingual speech retrieval.
Comput. Speech Lang., 2004

Error identification for large vocabulary speech recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Bilingual response generation using semi-automatically-induced templates for a mixed-initiative dialog system.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Prosody and style controls in CU VOCAL using SSML and SAPI XML tags.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Detection of language boundary in code-switching utterances by bi-phone probabilities.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A two-level schema for detecting recognition errors.
Proceedings of the INTERSPEECH 2004, 2004

Fuzzy logic decision fusion in a multimodal biometric system.
Proceedings of the INTERSPEECH 2004, 2004

A Pruning Approach for GMM-Based Speaker Verification in Mobile Embedded Systems.
Proceedings of the Biometric Authentication, First International Conference, 2004

A real-time Cantonese text-to-audiovisual speech synthesizer.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Bilingual Chinese/English voice browsing based on a VoiceXML platform.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

English-Chinese bilingual text-independent speaker verification.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Using Verb Dependency Matching in a Reading Comprehension System.
Proceedings of the Information Retrieval Technology, Asia Information Retrieval Symposium, 2004

2003
The use of belief networks for mixed-initiative dialog modeling.
IEEE Trans. Speech Audio Process., 2003

Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion.
ACM Trans. Asian Lang. Inf. Process., 2003

CU VOCAL Web Service: A Text-to-speech Synthesis Web Service for Voice-enabled Web-mediated Applications.
Proceedings of the Twelfth International World Wide Web Conference - Posters, 2003

Example-based bi-directional Chinese-English machine translation with semi-automatically induced grammars.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Natural language response generation in mixed-initiative dialogs using task goals and dialog acts.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Recent enhancements in CU VOCAL for Chinese TTS-enabled applications.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries.
IEEE Trans. Knowl. Data Eng., 2002

A system for spoken query information retrieval on mobile devices.
IEEE Trans. Speech Audio Process., 2002

GLR parsing with multiple grammars for natural language queries.
ACM Trans. Asian Lang. Inf. Process., 2002

Spoken language resources for Cantonese speech processing.
Speech Commun., 2002

Intelligent speech for information systems: towards biliteracy and trilingualism.
Interact. Comput., 2002

CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

ISIS: a multi-modal, trilingual, distributed spoken dialog system developed with CORBA, Java, XML and KQML.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Multi-scale and multi-model integration for improved performance in Chinese spoken document retrieval.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001
A hierarchical lexical representation for bi-directional spelling-to-pronunciation/pronunciation-to-spelling generation.
Speech Commun., 2001

Using contextual analysis for news event detection.
Int. J. Intell. Syst., 2001

Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks.
Proceedings of the 14th Conference on Computational Linguistics and Speech Processing, 2001

Learning Strategies In A Grammar Induction Framework.
Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 2001

Scalability and Portability of a Belief Network-based Dialog Model for Different Application Domains.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Mandarin-English Information: Investigating Translingual Speech Retrieval.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Automatic event generation from multi-lingual news stories.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2001

Automatic Grammar Partitioning for Syntactic Parsing.
Proceedings of the Seventh International Workshop on Parsing Technologies (IWPT-2001), 2001

Multi-parser architecture for query processing.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Semi-automatic grammar induction for bi-directional English-Chinese machine translation.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

ISIS: a learning system with combined interaction and delegation dialogs.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Multi-scale-audio indexing for translingual spoken document retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2001

A dynamic semantic model for re-scoring recognition hypotheses.
Proceedings of the IEEE International Conference on Acoustics, 2001

Speech retrieval with video parsing for television news programs.
Proceedings of the IEEE International Conference on Acoustics, 2001

Automatic Story Segmentation for Spoken Document Retrieval.
Proceedings of the 10th IEEE International Conference on Fuzzy Systems, 2001

2000
Initial Development Towards a Trilingual Speech Interface for Financial Information Inquiries.
Int. J. Speech Technol., 2000

Parsing a Lattice with Multiple Grammars.
Proceedings of the Sixth International Workshop on Parsing Technologies, 2000

Query expansion using phonetic confusions for Chinese spoken document retrieval.
Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages, 2000, Hong Kong, China, September 30, 2000

Multi-scale audio indexing for Chinese spoken document retrieval.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

ISIS: A multilingual spoken dialog system developed with CORBA and KQML agents.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Grammar partitioning and parser composition for natural language understanding.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

CU FOREX: a bilingual spoken dialog system for foreign exchange enquiries.
Proceedings of the IEEE International Conference on Acoustics, 2000

Concatenating syllables for response generation in spoken language applications.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
An Analytical Study of Transformational Tagging for Chinese Text.
Proceedings of the 12th Research on Computational Linguistics Conference, 1999

Semi-automatic acquisition of domain-specific semantic structures.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

To believe is to understand.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Micro-prosodic control in Cantonese text-to-speech synthesis.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1997
From interface to content: translingual access and delivery of on-line information.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

YINHE: a Mandarin Chinese version of the GALAXY system.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology.
Speech Commun., 1996

Multilingual human-computer interactions: from information access to language learning.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

ANGIE: a new framework for speech analysis based on morpho-phonological modelling.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

WHEELS: a conversational system in the automobile classifieds domain.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

A form-based dialogue manager for spoken language applications.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995
Phonological parsing for bi-directional letter-to-sound/sound-to-letter generation.
PhD thesis, 1995

1994
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation.
Proceedings of the Human Language Technology, 1994

Phonological parsing for reversible letter-to-sound/sound-to-letter generation.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1993
Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

1992
Language modelling for recognition and understanding using layered bigrams.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

1991
Signal Representation, Attribute Extraction and the Use of Distinctive Features for Phonetic Classification.
Proceedings of the Speech and Natural Language, 1991

Signal representation comparison for phonetic classification.
Proceedings of the 1991 International Conference on Acoustics, 1991

1990
A comparative study of acoustic representations of speech for vowel classification using multi-layer perceptrons.
Proceedings of the First International Conference on Spoken Language Processing, 1990

