Hsin-Min Wang

ORCID: 0000-0003-3599-5071

Affiliations:
  • Academia Sinica, Taipei, Taiwan
  • National Taiwan University, Taipei, Taiwan (PhD 1995)


According to our database, Hsin-Min Wang authored at least 338 papers between 1993 and 2024.

Bibliography

2024
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data.
CoRR, 2024

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids.
CoRR, 2024

2023
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Multi-Target Extractor and Detector for Unknown-Number Speaker Diarization.
IEEE Signal Process. Lett., 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model.
CoRR, 2023

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection.
CoRR, 2023

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection.
CoRR, 2023

A Study on Incorporating Whisper for Robust Speech Assessment.
CoRR, 2023

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement.
CoRR, 2023

Utilizing Whisper to Enhance Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
CoRR, 2023

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model.
CoRR, 2023

Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features.
CoRR, 2023

Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion.
CoRR, 2023

BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm.
CoRR, 2023

D4AM: A General Denoising Framework for Downstream Acoustic Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Improved Lite Audio-Visual Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

SVSNet: An End-to-End Speaker Voice Similarity Assessment Model.
IEEE Signal Process. Lett., 2022

CasNet: Investigating Channel Robustness for Speech Separation.
CoRR, 2022

A Teacher-student Framework for Unsupervised Speech Enhancement Using Noise Remixing Training and Two-stage Inference.
CoRR, 2022

Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN.
CoRR, 2022

A Study of Using Cepstrogram for Countermeasure Against Replay Attacks.
CoRR, 2022

Filter-based Discriminative Autoencoders for Children Speech Recognition.
CoRR, 2022

Multi-Target Filter and Detector for Speaker Diarization.
CoRR, 2022

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition.
CoRR, 2022

Is Character Trigram Overlapping Ratio Still the Best Similarity Measure for Aligning Sentences in a Paraphrased Corpus?
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing, 2022

Chinese Movie Dialogue Question Answering Dataset.
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing, 2022

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.
Proceedings of the Interspeech 2022, 2022

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
Proceedings of the Interspeech 2022, 2022

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.
Proceedings of the Interspeech 2022, 2022

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.
Proceedings of the Interspeech 2022, 2022

Chain-based Discriminative Autoencoders for Speech Recognition.
Proceedings of the Interspeech 2022, 2022

The VoiceMOS Challenge 2022.
Proceedings of the Interspeech 2022, 2022

Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
Proceedings of the IEEE International Conference on Acoustics, 2022

EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup.
IEEE Trans. Multim., 2021

Speech Enhancement-assisted StarGAN Voice Conversion in Noisy Environments.
CoRR, 2021

The AS-NU System for the M2VoC Challenge.
CoRR, 2021

Mining Commonsense and Domain Knowledge from Math Word Problems.
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing, 2021

A Flexible and Extensible Framework for Multiple Answer Modes Question Answering.
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing, 2021

Investigation of a Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

SurpriseNet: Melody Harmonization Conditioning on User-controlled Surprise Contours.
Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

MoEVC: A Mixture of Experts Voice Conversion System With Sparse Gating Mechanism for Online Computation Acceleration.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

AlloST: Low-Resource Speech Translation Without Source Transcription.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Melody Harmonization Using Orderless NADE, Chord Balancing, and Blocked Gibbs Sampling.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Recognition by Simply Fine-Tuning BERT.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Enhancement with Zero-Shot Model Selection.
Proceedings of the 29th European Signal Processing Conference, 2021

Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Generation of Speaker Representations Using Heterogeneous Training Batch Assembly.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Improvement of Spatial Ambiguity in Multi-Channel Speech Separation Using Channel Attention.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion.
IEEE Trans. Emerg. Top. Comput. Intell., 2020

Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement.
IEEE Signal Process. Lett., 2020

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020

The Academia Sinica Systems of Voice Conversion for VCC2020.
CoRR, 2020

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders.
CoRR, 2020

Using Taigi Dramas with Mandarin Chinese Subtitles to Improve Taigi Speech Recognition.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020

SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental Learning.
Proceedings of the Interspeech 2020, 2020

Lite Audio-Visual Speech Enhancement.
Proceedings of the Interspeech 2020, 2020

Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Joint Training of Guided Learning and Mean Teacher Models for Sound Event Detection.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation.
CoRR, 2019

Distributed Microphone Speech Enhancement based on Deep Learning.
CoRR, 2019

The ASVspoof 2019 database.
CoRR, 2019

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement.
CoRR, 2019

Multichannel Speech Enhancement by Raw Waveform-mapping using Fully Convolutional Networks.
CoRR, 2019

Influences of Prosodic Feature Replacement on the Perceived Singing Voice Identity.
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing, 2019

Oriental COCOSDA - country report 2019 language resources developed in Taiwan.
Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Bone-Conducted Speech Enhancement Using Hierarchical Extreme Learning Machine.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.
Proceedings of the Interspeech 2019, 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion.
Proceedings of the Interspeech 2019, 2019

Noise Adaptive Speech Enhancement Using Domain Adversarial Training.
Proceedings of the Interspeech 2019, 2019

Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.
Proceedings of the Interspeech 2019, 2019

Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR.
Proceedings of the Interspeech 2019, 2019

Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Audio-Visual Speech Enhancement using Hierarchical Extreme Learning Machine.
Proceedings of the 27th European Signal Processing Conference, 2019

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion.
Proceedings of the 27th European Signal Processing Conference, 2019

Spoken Multiple-Choice Question Answering Using Multimodal Convolutional Neural Networks.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Investigation of Neural Network Approaches for Unified Spectral and Prosodic Feature Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Multi-task Learning for Acoustic Modeling Using Articulatory Attributes.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Compressed Multimodal Hierarchical Extreme Learning Machine for Speech Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Improving Automatic Jazz Melody Generation by Transfer Learning Techniques.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Sequential Speaker Embedding and Transfer Learning for Text-Independent Speaker Identification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Coherent Deep-Net Fusion To Classify Shots In Concert Videos.
IEEE Trans. Multim., 2018

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks.
IEEE Trans. Emerg. Top. Comput. Intell., 2018

An Information Distillation Framework for Extractive Summarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Locally Linear Embedding Based Post-Filtering for Speech Enhancement.
J. Inf. Sci. Eng., 2018

Voice Conversion Based on Locally Linear Embedding.
J. Inf. Sci. Eng., 2018

WaveNet 聲碼器及其於語音轉換之應用 (WaveNet Vocoder and its Applications in Voice Conversion) [In Chinese].
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, 2018

Automatic Detection of Speech Under Cold Using Discriminative Autoencoders and Strength Modeling with Multiple Sub-Dictionary Generation.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Exemplar-Based Spectral Detail Compensation for Voice Conversion.
Proceedings of the Interspeech 2018, 2018

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM.
Proceedings of the Interspeech 2018, 2018

Seethevoice: Learning from Music to Visual Storytelling of Shots.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Essence Vector-Based Query Modeling for Spoken Document Retrieval.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Denoising Autoencoder Based Post Filtering for Speech Enhancement.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Affective Music Information Retrieval.
Proceedings of the Emotions and Personality in Personalized Services, 2017

A Position-Aware Language Modeling Framework for Extractive Broadcast News Speech Summarization.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2017

A Replay Spoofing Detection System Based on Discriminative Autoencoders.
Int. J. Comput. Linguistics Chin. Lang. Process., 2017

On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval.
Int. J. Comput. Linguistics Chin. Lang. Process., 2017

An Empirical Comparison of Contemporary Unsupervised Approaches for Extractive Speech Summarization.
Int. J. Comput. Linguistics Chin. Lang. Process., 2017

Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network.
CoRR, 2017

基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統 (A Replay Spoofing Detection System Based on Discriminative Autoencoders) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

使用查詢意向探索與類神經網路於語音文件檢索之研究 (Exploring Query Intent and Neural Network modeling Techniques for Spoken Document Retrieval) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

基於i-vector與PLDA並使用GMM-HMM強制對位之自動語者分段標記系統 (Speaker Diarization based on I-vector PLDA Scoring and using GMM-HMM Forced Alignment) [In Chinese].
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing, 2017

Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Discriminative Autoencoders for Acoustic Modeling.
Proceedings of the Interspeech 2017, 2017

A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement.
Proceedings of the Interspeech 2017, 2017

Wavelet Speech Enhancement Based on Robust Principal Component Analysis.
Proceedings of the Interspeech 2017, 2017

Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks.
Proceedings of the Interspeech 2017, 2017

Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval.
Proceedings of the Interspeech 2017, 2017

A locally linear embedding based postfiltering approach for speech enhancement.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep-net fusion to classify shots in concert videos.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Speech emotion recognition with skew-robust neural networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Leveraging manifold learning for extractive broadcast news summarization.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Discriminative autoencoders for speaker verification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A locality-preserving essence vector modeling framework for spoken document retrieval.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Neural relevance-aware query modeling for spoken document retrieval.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Fast locally linear embedding algorithm for exemplar-based voice conversion.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Alignment of Lyrics With Accompanied Singing Audio Based on Acoustic-Phonetic Vowel Likelihood Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Exploring the use of unsupervised query modeling techniques for speech recognition and summarization.
Speech Commun., 2016

運用序列到序列生成架構於重寫式自動摘要 (Exploiting Sequence-to-Sequence Generation Framework for Automatic Abstractive Summarization) [In Chinese].
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing, 2016

Automatic Music Video Generation Based on Emotion-Oriented Pseudo Song Prediction and Matching.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Dictionary update for NMF-based voice conversion using an encoder-decoder network.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Locally Linear Embedding for Exemplar-Based Spectral Conversion.
Proceedings of the Interspeech 2016, 2016

Exploring Word Mover's Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization.
Proceedings of the Interspeech 2016, 2016

Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation.
Proceedings of the Interspeech 2016, 2016

DEMV-matchmaker: Emotional temporal course representation and deep similarity matching for automatic music video generation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Improved spoken document summarization with coverage modeling techniques.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Learning to Distill: The Essence Vector Modeling Framework.
Proceedings of the COLING 2016, 2016

Exploiting graph regularized nonnegative matrix factorization for extractive speech summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Voice conversion from non-parallel corpora using variational auto-encoder.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Audio-visual speech enhancement using deep neural networks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

A novel paragraph embedding method for spoken document summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Combining Relevance Language Modeling and Clarity Measure for Extractive Speech Summarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

An Acoustic-Phonetic Model of F0 Likelihood for Vocal Melody Extraction.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Extractive Broadcast News Summarization Leveraging Recurrent Neural Network Language Modeling Techniques.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

A Probabilistic Framework for Chinese Spelling Check.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2015

Modeling the Affective Content of Music with a Gaussian Mixture Model.
IEEE Trans. Affect. Comput., 2015

Extractive Spoken Document Summarization with Representation Learning Techniques.
Int. J. Comput. Linguistics Chin. Lang. Process., 2015

Investigating Modulation Spectrum Factorization Techniques for Robust Speech Recognition.
Int. J. Comput. Linguistics Chin. Lang. Process., 2015

Affective Music Information Retrieval.
CoRR, 2015

Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis.
CoRR, 2015

表示法學習技術於節錄式語音文件摘要之研究 (A Study on Representation Learning Techniques for Extractive Spoken Document Summarization) [In Chinese].
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, 2015

調變頻譜分解之改良於強健性語音辨識 (Several Refinements of Modulation Spectrum Factorization for Robust Speech Recognition) [In Chinese].
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, 2015

EMV-matchmaker: Emotional Temporal Course Modeling and Matching for Automatic Music Video Generation.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Positional language modeling for extractive broadcast news speech summarization.
Proceedings of the INTERSPEECH 2015, 2015

Leveraging word embeddings for spoken document summarization.
Proceedings of the INTERSPEECH 2015, 2015

A histogram density modeling approach to music emotion recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

I-vector based language modeling for query representation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Incorporating paragraph embeddings and density peaks clustering for spoken document summarization.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Incorporating proximity information in relevance language modeling for extractive speech summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

A probabilistic interpretation for artificial neural network-based voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
Generalized k-Labelsets Ensemble for Multi-Label and Cost-Sensitive Classification.
IEEE Trans. Knowl. Data Eng., 2014

Enhancing Query Formulation for Spoken Document Retrieval.
J. Inf. Sci. Eng., 2014

探究新穎語句模型化技術於節錄式語音摘要 (Investigating Novel Sentence Modeling Techniques for Extractive Speech Summarization) [In Chinese].
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing, 2014

Automatic Set List Identification and Song Segmentation for Full-Length Concert Videos.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Enhanced language modeling for extractive speech summarization with sentence relatedness information.
Proceedings of the INTERSPEECH 2014, 2014

Clustering-based i-vector formulation for speaker recognition.
Proceedings of the INTERSPEECH 2014, 2014

Ensemble of machine learning algorithms for cognitive and physical speaker load detection.
Proceedings of the INTERSPEECH 2014, 2014

Towards time-varying music auto-tagging based on CAL500 expansion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

A recurrent neural network language modeling framework for extractive speech summarization.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Improving music auto-tagging by intra-song instance bagging.
Proceedings of the IEEE International Conference on Acoustics, 2014

Effective pseudo-relevance feedback for language modeling in extractive speech summarization.
Proceedings of the IEEE International Conference on Acoustics, 2014

Speaker verification using kernel-based binary classifiers with binary operation derived features.
Proceedings of the IEEE International Conference on Acoustics, 2014

I-vector based language modeling for spoken document retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2014

Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

A margin-based discriminative modeling approach for extractive speech summarization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Emotion recognition of conversational affective speech using temporal course modeling-based error weighted cross-correlation model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
改良語句模型技術於節錄式語音摘要之研究 (Improved Sentence Modeling Techniques for Extractive Speech Summarization) [In Chinese].
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing, 2013

Query-Document Relevance Topic Models.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2013

Non-reference audio quality assessment for online live music recordings.
Proceedings of the ACM Multimedia Conference, 2013

Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training.
Proceedings of the INTERSPEECH 2013, 2013

Semantic Naïve Bayes Classifier for Document Classification.
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Subspace-based phonotactic language recognition using multivariate dynamic linear models.
Proceedings of the IEEE International Conference on Acoustics, 2013

Weighted matrix factorization for spoken document retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2013

Effective pseudo-relevance feedback for spoken document retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2013

Incorporating global variance in the training phase of GMM-based voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

A Study of Language Modeling for Chinese Spelling Check.
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing, 2013

2012
Spoken Document Retrieval Leveraging Unsupervised and Supervised Topic Modeling Techniques.
IEICE Trans. Inf. Syst., 2012

A Term Association Translation Model for Naive Bayes Text Classification.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2012

The acoustic emotion gaussians model for emotion-based music annotation and retrieval.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

The acousticvisual emotion guassians model for automatic generation of music video.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Exploring the relationship between categorical and dimensional emotion semantics of music.
Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies, 2012

Exploring mutual information for GMM-based spectral conversion.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Subspace-Based Feature Representation and Learning for Language Recognition.
Proceedings of the INTERSPEECH 2012, 2012

A Study of Mutual Information for GMM-Based Spectral Conversion.
Proceedings of the INTERSPEECH 2012, 2012

Word Relevance Modeling for Speech Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Term relevance dependency model for text classification.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

Playing with tagging: A real-time tagging music player.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Generalized k-labelset ensemble for multi-label classification.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Personalized music emotion recognition via model adaptation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval.
IEEE Trans. Multim., 2011

Audio Tag Annotation and Retrieval Using Tag Count Information.
Proceedings of the Advances in Multimedia Modeling, 2011

Colorizing tags in tag cloud: a novel query-by-tag music search system.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Learning the Similarity of Audio Music in Bag-of-frames Representation from Tagged Music Data.
Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011

An Acoustic-Phonetic Approach to Vocal Melody Extraction.
Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011

Query by multi-tags with multi-level preferences for content-based music retrieval.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Automatic annotation of Web videos.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Cost-sensitive stacking for audio tag annotation and retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Fast min-hashing indexing and robust spatio-temporal matching for detecting video copies.
ACM Trans. Multim. Comput. Commun. Appl., 2010

Time-Series Linear Search for Video Copies Based on Compact Signature Manipulation and Containment Relation Modeling.
IEEE Trans. Circuits Syst. Video Technol., 2010

BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization.
IEEE Trans. Speech Audio Process., 2010

Exploiting semantic associative information in topic modeling.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Phone boundary refinement using ranking methods.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Speaker verification using support vector machine with LLR-based sequence kernels.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Bayesian speaker recognition using Gaussian mixture model and Laplace approximation.
Proceedings of the INTERSPEECH 2010, 2010

Phonetic subspace mixture model for speaker diarization.
Proceedings of the INTERSPEECH 2010, 2010

A Discriminative and Heteroscedastic Linear Feature Transformation for Multiclass Classification.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Homogeneous segmentation and classifier ensemble for audio tag annotation and retrieval.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Detecting pitching frames in baseball game video using Markov random walk.
Proceedings of the International Conference on Image Processing, 2010

Background music identification through content filtering and min-hash matching.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Model-Based Clustering by Probabilistic Self-Organizing Maps.
IEEE Trans. Neural Networks, 2009

A Probabilistic Generative Framework for Extractive Broadcast News Speech Summarization.
IEEE Trans. Speech Audio Process., 2009

A Comparative Study of Probabilistic Ranking Models for Chinese Spoken Document Summarization.
ACM Trans. Asian Lang. Inf. Process., 2009

Improving the characterization of the alternative hypothesis via minimum verification error training with applications to speaker verification.
Pattern Recognit., 2009

Raman-Based 10.66 Gb/s Bidirectional TDM over Long-Reach WDM Hybrid PON.
IEICE Trans. Commun., 2009

Evolutionary minimization of the Rand index for speaker clustering.
Comput. Speech Lang., 2009

Improving GMM-UBM speaker verification using discriminative feedback adaptation.
Comput. Speech Lang., 2009

Virtual Chinese tutor (VCT) - a Chinese language pronunciation learning software.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Speaker diarization using divide-and-conquer.
Proceedings of the INTERSPEECH 2009, 2009

Articulatory feature asynchrony analysis and compensation in detection-based ASR.
Proceedings of the INTERSPEECH 2009, 2009

Learning to rank from Bayesian decision inference.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

2008
A Query-by-Singing System for Retrieving Karaoke Music.
IEEE Trans. Multim., 2008

Using Kernel Discriminant Analysis to Improve the Characterization of the Alternative Hypothesis for Speaker Verification.
IEEE Trans. Speech Audio Process., 2008

Using the Similarity of Main Melodies to Identify Cover Versions of Popular Songs for Music Document Retrieval.
J. Inf. Sci. Eng., 2008

An Investigation of Phonological Feature Systems Used in Detection-Based ASR.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Discriminative Feedback Adaptation for GMM-UBM Speaker Verification.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A comparative study of probabilistic ranking models for spoken document summarization.
Proceedings of the IEEE International Conference on Acoustics, 2008

BIC-based audio segmentation by divide-and-conquer.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Automatic Speaker Clustering Using a Voice Characteristic Reference Space and Maximum Purity Estimation.
IEEE Trans. Speech Audio Process., 2007

Integrating coding techniques into LP-based Mandarin text-to-speech synthesis.
Int. J. Speech Technol., 2007

A Novel Characterization of the Alternative Hypothesis Using Kernel Discriminant Analysis for LLR-Based Speaker Verification.
Int. J. Comput. Linguistics Chin. Lang. Process., 2007

Improved HMM/SVM methods for automatic phoneme segmentation.
Proceedings of the INTERSPEECH 2007, 2007

A unified probabilistic generative framework for extractive spoken document summarization.
Proceedings of the INTERSPEECH 2007, 2007

Evolutionary minimum verification error learning of the alternative hypothesis model for LLR-based speaker verification.
Proceedings of the INTERSPEECH 2007, 2007

Cascading Multimodal Verification using Face, Voice and Iris Information.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Speaker Clustering Based on Minimum Rand Index.
Proceedings of the IEEE International Conference on Acoustics, 2007

Phonetic Boundary Refinement using Support Vector Machine.
Proceedings of the IEEE International Conference on Acoustics, 2007

Improved Methods for Characterizing the Alternative Hypothesis using Minimum Verification Error Training for LLR-Based Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007

Spoken document summarization using relevant information.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals.
IEEE Trans. Speech Audio Process., 2006

An Empirical Study of Word Error Minimization Approaches for Mandarin Large Vocabulary Continuous Speech Recognition.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006

A Maximum Entropy Approach for Semantic Language Modeling.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006

A Minimum Boundary Error Framework for Automatic Phonetic Segmentation.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Automatic Construction of Regression Class Tree for MLLR Via Model-Based Hierarchical Clustering.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

On Using Entropy Information to Improve Posterior Probability-Based Confidence Measures.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

A Novel Alternative Hypothesis Characterization Using Kernel Classifiers for LLR-Based Speaker Verification.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Minimum boundary error training for automatic phonetic segmentation.
Proceedings of the INTERSPEECH 2006, 2006

Improving the characterization of the alternative hypothesis via kernel discriminant analysis for likelihood ratio-based speaker verification.
Proceedings of the INTERSPEECH 2006, 2006

A Prototypes-Embedded Genetic K-means Algorithm.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

A Kernel-based Discrimination Framework for Solving Hypothesis Testing Problems with Application to Speaker Verification.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

On Maximizing the Within-Cluster Homogeneity of Speaker Voice Characteristics For Speech Utterance Clustering.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A Music Retrieval System Based on Query-by-Singing for Karaoke Jukebox.
Proceedings of the Information Retrieval Technology, 2006

2005
Fluent speech prosody: Framework and modeling.
Speech Commun., 2005

MATBN: A Mandarin Chinese Broadcast News Corpus.
Int. J. Comput. Linguistics Chin. Lang. Process., 2005

On the extraction of vocal-related information to facilitate the management of popular music collections.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2005

Query-By-Example Technique for Retrieving Cover Versions of Popular Songs with Similar Melodies.
Proceedings of the ISMIR 2005, 2005

Speaker clustering of unknown utterances based on maximum purity estimation.
Proceedings of the INTERSPEECH 2005, 2005

An Efficient Approach to Multimodal Person Identity Verification by Fusing Face and Voice Information.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Prototype Systems for Retrieving Polyphonic Objects of Popular Music Based on Query-by-singing/example.
Proceedings of the 3rd International Conference on Digital Archive Technologies, 2005

SoVideo - A Mandarin Chinese Broadcast Retrieval System.
Proceedings of the 3rd International Conference on Digital Archive Technologies, 2005

Clustering Speech Utterances by Speaker Using Eigenvoice-Motivated Vector Space Models.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

GMM-Based Bhattacharyya Kernel Fisher Discriminant Analysis For Speaker Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

A Query-by-Singing Technique for Retrieving Polyphonic Objects of Popular Music.
Proceedings of the Information Retrieval Technology, 2005

2004
A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents.
ACM Trans. Asian Lang. Inf. Process., 2004

The SoVideo Mandarin Chinese Broadcast News Retrieval System.
Int. J. Speech Technol., 2004

A Model-Selection-Based Self-Splitting Gaussian Mixture Learning with Application to Speaker Identification.
EURASIP J. Adv. Signal Process., 2004

Mandarin-English Information (MEI): investigating translingual speech retrieval.
Comput. Speech Lang., 2004

Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics.
Comput. Music. J., 2004

藍芽無線環境下中文語音辨識效能之評估與分析 (Performance Evaluation and Analysis of Mandarin Speech Recognition over Bluetooth Communication Environments) [In Chinese].
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing, 2004

Towards Automatic Identification Of Singing Language In Popular Music Recordings.
Proceedings of the ISMIR 2004, 2004

A Mandarin TTS system with an integrated prosodic model.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A new eigenvoice approach to speaker adaptation.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A maximum entropy approach for integrating semantic information in statistical language models.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

METRIC-SEQDAC: a hybrid approach for audio segmentation.
Proceedings of the INTERSPEECH 2004, 2004

Speaker clustering of speech utterances using a voice characteristic reference space.
Proceedings of the INTERSPEECH 2004, 2004

Statistical Chinese spoken document retrieval using latent topical information.
Proceedings of the INTERSPEECH 2004, 2004

A query-by-example framework to retrieve music documents by singer.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Automatic detection and tracking of target singer in multi-singer music recordings.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
Blind clustering of popular music recordings based on singer voice characteristics.
Proceedings of the ISMIR 2003, 2003

Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A sequential metric-based audio segmentation method via the Bayesian information criterion.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese.
IEEE Trans. Speech Audio Process., 2002

A hierarchical tag-graph search scheme with layered grammar rules for spontaneous speech understanding.
Pattern Recognit. Lett., 2002

2001
Content-based Language Models for Spoken Document Retrieval.
Int. J. Comput. Process. Orient. Lang., 2001

Comparison of Word and Subword Indexing Techniques for Mandarin Chinese Spoken Document Retrieval.
Proceedings of the Advances in Multimedia Information Processing, 2001

Mandarin-English Information: Investigating Translingual Speech Retrieval.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Comparative analysis for data-driven temporal filters obtained via principal component analysis (PCA) and linear discriminant analysis (LDA) in speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

An HMM/n-gram-based linguistic processing approach for Mandarin spoken document retrieval.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Improved spoken document retrieval by exploring extra acoustic and linguistic cues.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Multi-scale-audio indexing for translingual spoken document retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2001

Eigenspace-based maximum a posteriori linear regression for rapid speaker adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2001

2000
Experiments in syllable-based retrieval of broadcast news speech in Mandarin Chinese.
Speech Commun., 2000

Mandarin spoken document retrieval based on syllable lattice matching.
Pattern Recognit. Lett., 2000

A spoken-access approach for Chinese text and speech information retrieval.
J. Am. Soc. Inf. Sci., 2000

Syllable-Based Chinese Text/Spoken Document Retrieval Using Text/Speech Queries.
Int. J. Pattern Recognit. Artif. Intell., 2000

Browsing the Chinese Web Pages Using Mandarin Speech.
Int. J. Comput. Process. Orient. Lang., 2000

Automatic metric-based speech segmentation for broadcast news via principal component analysis.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Retrieval of Mandarin broadcast news using spoken queries.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Fast speaker adaptation using eigenspace-based maximum likelihood linear regression.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Retrieval of broadcast news speech in Mandarin Chinese collected in Taiwan using syllable-level statistical characteristics.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
Automatic selection of phonetically distributed sentence sets for speaker adaptation with application to large vocabulary Mandarin speech recognition.
Comput. Speech Lang., 1999

A New Syllable-based Approach for Retrieving Mandarin Spoken Documents Using Short Speech Queries.
Proceedings of the 12th Research on Computational Linguistics Conference, 1999

Consistent dialogue across concurrent topics based on an expert system model.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998
Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus.
Int. J. Comput. Linguistics Chin. Lang. Process., 1998

Towards a Mandarin voice memo system.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Hierarchical tag-graph search for spontaneous speech understanding in spoken dialog systems.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

A*-admissible key-phrase spotting with sub-syllable level utterance verification.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997
Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary using limited training data.
IEEE Trans. Speech Audio Process., 1997

Internet Chinese information retrieval using unconstrained Mandarin speech queries based on a client-server architecture and a PAT-tree-based language model.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996
Frameworks for recognition of Mandarin syllables with tones using sub-syllabic units.
Speech Commun., 1996

1995
Fast and accurate continuous speech recognition for Chinese language with very large vocabulary.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data.
Proceedings of the 1995 International Conference on Acoustics, 1995

1994
Incremental speaker adaptation using phonetically balanced training sentences for Mandarin syllable recognition based on segmental probability models.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

An initial study on a segmental probability model approach to large-vocabulary continuous Mandarin speech recognition.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1993
從中文語料庫中自動選取連續國語語音特性平衡句的方法 (Automatic Selection of Phonetically Rich Sentences from A Chinese Text Corpus) [In Chinese].
Proceedings of Rocling Computational Linguistics Conference VI, 1993

Golden Mandarin (II)-an improved single-chip real-time Mandarin dictation machine for Chinese language with very large vocabulary.
Proceedings of the IEEE International Conference on Acoustics, 1993

