Chng Eng Siong

Orcid: 0000-0001-6257-7399

Affiliations:
  • Nanyang Technological University, Singapore


According to our database, Chng Eng Siong authored at least 324 papers between 1994 and 2024.

Bibliography

2024
Dual-Branch Modeling Based on State-Space Model for Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model.
CoRR, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.
CoRR, 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.
CoRR, 2024

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.
CoRR, 2024

Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection.
CoRR, 2024

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge.
CoRR, 2024

2023
Noise robust distillation of self-supervised speech models via correlation metrics.
CoRR, 2023

Generative error correction for code-switching speech recognition using large language models.
CoRR, 2023

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification.
CoRR, 2023

SPGM: Prioritizing Local Features for enhanced speech separation performance.
CoRR, 2023

Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
CoRR, 2023

Codec Data Augmentation for Time-domain Heart Sound Classification.
CoRR, 2023

Noise-aware Speech Enhancement using Diffusion Probabilistic Model.
CoRR, 2023

A Neural State-Space Model Approach to Efficient Speech Separation.
CoRR, 2023

Study of GANs for Noisy Speech Simulation from Clean Speech.
CoRR, 2023

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention.
CoRR, 2023

Noise-aware Speech Separation with Contrastive Learning.
CoRR, 2023

Contrastive Speech Mixup for Low-resource Keyword Spotting.
CoRR, 2023

deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition.
CoRR, 2023

Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2023

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Local and Global Context Modeling with Relation Matching Task for Dialog Act Recognition.
Proceedings of the International Joint Conference on Neural Networks, 2023

Improved Keyword Recognition Based on Aho-Corasick Automaton.
Proceedings of the International Joint Conference on Neural Networks, 2023

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Probabilistic Back-ends for Online Speaker Recognition and Clustering.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Spoken Language Identification with Map-Mix.
Proceedings of the IEEE International Conference on Acoustics, 2023

Contrastive Speech Mixup for Low-Resource Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

deHuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Unsupervised Noise Adaptation Using Data Simulation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Metric-Oriented Speech Enhancement Using Diffusion Probabilistic Model.
Proceedings of the IEEE International Conference on Acoustics, 2023

Singaporean Conversational English-Malay Code-Switching Speech: An Analysis Based on Code-switching Points and Part-of-Speech.
Proceedings of the International Conference on Asian Language Processing, 2023

CASSI: Contextual and Semantic Structure-based Interpolation Augmentation for Low-Resource NER.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

ASR Model Adaptation for Rare Words Using Synthetic Data Generated by Multiple Text-To-Speech Systems.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Study of Generative Adversarial Networks for Noisy Speech Simulation from Clean Speech.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Adopting Neural Translation Model in Data Generation for Inverse Text Normalization.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Adapting Code-Switching Language Models with Statistical-Based Text Augmentation.
Proceedings of the Intelligent Information and Database Systems - 15th Asian Conference, 2023

An Empirical Study on Punctuation Restoration for English, Mandarin, and Code-Switching Speech.
Proceedings of the Intelligent Information and Database Systems - 15th Asian Conference, 2023

Leveraging Modality-Specific Representations for Audio-Visual Speech Recognition via Reinforcement Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.
Speech Commun., 2022

Efficient Self-Supervised Learning Representations for Spoken Language Identification.
IEEE J. Sel. Top. Signal Process., 2022

Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin.
CoRR, 2022

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization.
CoRR, 2022

Continual Learning For On-Device Environmental Sound Classification.
CoRR, 2022

Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder.
CoRR, 2022

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition.
CoRR, 2022

Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss.
CoRR, 2022

Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning.
CoRR, 2022

Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness.
CoRR, 2022

Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning.
CoRR, 2022

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition.
CoRR, 2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting.
Proceedings of the Interspeech 2022, 2022

Estimation of speaker age and height from speech signal using bi-encoder transformer mixture model.
Proceedings of the Interspeech 2022, 2022

DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition.
Proceedings of the Interspeech 2022, 2022

Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning.
Proceedings of the Interspeech 2022, 2022

Speech Emotion Recognition with Co-Attention Based Multi-Level Acoustic Information.
Proceedings of the IEEE International Conference on Acoustics, 2022

An Embarrassingly Simple Model for Dialogue Relation Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Minimum Word Error Training For Non-Autoregressive Transformer-Based Code-Switching ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Convmixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-Field Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2022

Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2022

Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

L-SpEx: Localized Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data.
Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Critical Sequence Training for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continual Learning for On-Device Environmental Sound Classification.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021
Learning Speaker Representation with Semi-supervised Learning approach for Speaker Profiling.
CoRR, 2021

End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN.
CoRR, 2021

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

E2E-Based Multi-Task Learning Approach to Joint Speech and Accent Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Preventing Early Endpointing for Online Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Learning Disentangled Feature Representations for Speech Enhancement Via Adversarial Training.
Proceedings of the IEEE International Conference on Acoustics, 2021

Representation Learning with Spectro-Temporal-Channel Attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Unified Speaker Adaptation Approach for ASR.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Enriching Under-Represented Named Entities for Improved Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Multitask-based joint learning approach to robust ASR for radio communication speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

End-to-End Speaker Age and Height Estimation using Attention Mechanism and Triplet Loss.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Time Domain Speech Enhancement With Attentive Multi-scale Approach.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

GDPNet: Refining Latent Multi-View Graph for Relation Extraction.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
SpEx: Multi-Scale Time Domain Speaker Extraction Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

An Embarrassingly Simple Model for Dialogue Relation Extraction.
CoRR, 2020

Enriching Under-Represented Named-Entities To Improve Speech Recognition Performance.
CoRR, 2020

A multilingual approach to joint Speech and Accent Recognition with DNN-HMM framework.
CoRR, 2020

Cross Attention with Monotonic Alignment for Speech Transformer.
Proceedings of the Interspeech 2020, 2020

Universal Speech Transformer.
Proceedings of the Interspeech 2020, 2020

Speech Transformer with Speaker Aware Persistent Memory.
Proceedings of the Interspeech 2020, 2020

Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension.
Proceedings of the Interspeech 2020, 2020

Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network.
Proceedings of the Interspeech 2020, 2020

SpEx+: A Complete Time Domain Speaker Extraction Network.
Proceedings of the Interspeech 2020, 2020

Independent Language Modeling Architecture for End-To-End ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Time-Domain Neural Network Approach for Speech Bandwidth Extension.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
CoRR, 2019

A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data.
CoRR, 2019

Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification.
CoRR, 2019

Online FAQ Chatbot for Customer Support.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition.
Proceedings of the Interspeech 2019, 2019

A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data.
Proceedings of the Interspeech 2019, 2019

Target Speaker Extraction for Multi-Talker Speaker Verification.
Proceedings of the Interspeech 2019, 2019

Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation.
Proceedings of the Interspeech 2019, 2019

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data.
Proceedings of the Interspeech 2019, 2019

QASA: Advanced Document Retriever for Open-Domain Question Answering by Learning to Rank Question-Aware Self-Attentive Document Representations.
Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, 2019

Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss.
Proceedings of the IEEE International Conference on Acoustics, 2019

Time-Domain Speaker Extraction Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Audio Codec Simulation based Data Augmentation for Telephony Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Transfer Learning for Punctuation Prediction.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Improving code-switching speech recognition with data augmentation and system combination.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Domain Adversarial Training for Speech Enhancement.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Re-ranking spoken term detection with acoustic exemplars of keywords.
Speech Commun., 2018

Learning distributed sentence representations for story segmentation.
Signal Process., 2018

Average Modeling Approach to Voice Conversion with Non-Parallel Data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning.
Proceedings of the Interspeech 2018, 2018

Mandarin-English Code-switching Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR.
Proceedings of the Interspeech 2018, 2018

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Single Channel Speech Separation with Constrained Utterance Level Permutation Invariant Training Using Grid LSTM.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of Word Embeddings with Deep Bidirectional LSTM for Sentence Unit Detection in Automatic Speech Transcription.
Proceedings of the 2018 International Conference on Asian Language Processing, 2018

A Hybrid Deep Learning Architecture for Sentence Unit Detection.
Proceedings of the 2018 International Conference on Asian Language Processing, 2018

Named-Entity Tagging and Domain adaptation for Better Customized Translation.
Proceedings of the Seventh Named Entities Workshop, 2018

2017
An Exemplar-Based Approach to Frequency Warping for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A hybrid neural network hidden Markov model approach for automatic story segmentation.
J. Ambient Intell. Humaniz. Comput., 2017

Pruning Strategies for Partial Search in Spoken Term Detection.
Proceedings of the Eighth International Symposium on Information and Communication Technology, 2017

Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source.
Proceedings of the Interspeech 2017, 2017

Towards Age-friendly E-commerce Through Crowd-Improved Speech Recognition, Multimodal Search, and Personalized Speech Feedback.
Proceedings of the 2nd International Conference on Crowd Science and Engineering, 2017

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Named entity transliteration with sequence-to-sequence neural network.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

A review of the Mandarin-English code-switching corpus: SEAME.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Improving air traffic control speech intelligibility by reducing speaking rate effectively.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017

Novel Functional Technologies for Age-Friendly E-commerce.
Proceedings of the Human Aspects of IT for the Aged Population. Applications, Services and Contexts, 2017

Improving N-gram language modeling for code-switching speech recognition.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Topic embedding of sentences for story segmentation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

An end-to-end neural network approach to story segmentation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

An investigation of spectral feature partitioning for replay attacks detection.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Low-resource spoken keyword search strategies in Georgian inspired by distinctive feature theory.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Unsupervised Language Model Adaptation by Data Selection for Speech Recognition.
Proceedings of the Intelligent Information and Database Systems - 9th Asian Conference, 2017

2016
Single-channel Dereverberation for Distant-Talking Speech Recognition by Combining Denoising Autoencoder and Temporal Structure Normalization.
J. Signal Process. Syst., 2016

Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

High quality voice conversion using prosodic and high-resolution spectral features.
Multim. Tools Appl., 2016

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation.
EURASIP J. Adv. Signal Process., 2016

Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting.
CoRR, 2016

Spoofing detection under noisy conditions: a preliminary investigation and an initial database.
CoRR, 2016

The NNI Vietnamese Speech Recognition System for MediaEval 2016.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Multi-channel feature adaptation for robust speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Neural networks based channel compensation for i-vector speaker verification.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A DNN-HMM Approach to Story Segmentation.
Proceedings of the Interspeech 2016, 2016

Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions.
Proceedings of the Interspeech 2016, 2016

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions.
Proceedings of the Interspeech 2016, 2016

Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples.
Proceedings of the Interspeech 2016, 2016

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis.
Proceedings of the Interspeech 2016, 2016

The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMS.
Proceedings of the Interspeech 2016, 2016

Approximate search of audio queries by using DTW with phone time boundary and data augmentation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Combining non-negative matrix factorization and deep neural networks for speech enhancement and automatic speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Spoofing detection from a feature representation perspective.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Keyword search using query expansion for graph-based rescoring of hypothesized detections.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Content-aware local variability vector for speaker verification with short utterance.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

I-vector based deep neural network acoustic model adaptation using multilingual language resource.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Beamforming networks using spatial covariance features for far-field speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Spoofing speech detection using temporal convolutional neural network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Zero resource anti-spoofing detection for unit selection based synthetic speech using image spectrogram artifacts.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Improving Efficiency of Sentence Boundary Detection by Feature Selection.
Proceedings of the Intelligent Information and Database Systems - 8th Asian Conference, 2016

2015
Decoupling Word-Pair Distance and Co-occurrence Information for Effective Long History Context Language Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Exemplar-based voice conversion using joint nonnegative matrix factorization.
Multim. Tools Appl., 2015

Mandarin-English code-switching speech corpus in South-East Asia: SEAME.
Lang. Resour. Evaluation, 2015

Context-dependent Phone Mapping for Acoustic Modeling of Under-resourced Languages.
Int. J. Asian Lang. Process., 2015

The NNI Query-by-Example System for MediaEval 2015.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation.
Proceedings of the INTERSPEECH 2015, 2015

A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition.
Proceedings of the INTERSPEECH 2015, 2015

Learning to estimate reverberation time in noisy and reverberant rooms.
Proceedings of the INTERSPEECH 2015, 2015

Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge.
Proceedings of the INTERSPEECH 2015, 2015

System fusion for high-performance voice conversion.
Proceedings of the INTERSPEECH 2015, 2015

TDTO language modeling with feedforward neural networks.
Proceedings of the INTERSPEECH 2015, 2015

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A learning-based approach to direction of arrival estimation in noisy and reverberant environments.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Sparse representation for frequency warping based voice conversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Language-resource independent speech segmentation using cues from a spectrogram image.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Low-resource keyword search strategies for Tamil.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Modelling Public Sentiment in Twitter: Using Linguistic Patterns to Enhance Supervised Learning.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2015

On statistical machine translation method for lexicon refinement in speech recognition.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Detecting synthetic speech using long term magnitude and phase information.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

DNN feature compensation for noise robust speaker verification.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Non-negative matrix factorization using stable alternating direction method of multipliers for source separation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

A density peak clustering approach to unsupervised acoustic subword units discovery.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

On the study of very low-resource language keyword search.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Multilingual exemplar-based acoustic model for the NIST Open KWS 2015 evaluation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Distance metric learning for kernel density-based acoustic model under limited training data conditions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages.
IEICE Trans. Inf. Syst., 2014

System and keyword dependent fusion for spoken term detection.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

The NNI Query-by-Example System for MediaEval 2014.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Correlation-based frequency warping for voice conversion.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A deep neural network approach for sentence boundary detection in broadcast news.
Proceedings of the INTERSPEECH 2014, 2014

Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems.
Proceedings of the INTERSPEECH 2014, 2014

Joint nonnegative matrix factorization for exemplar-based voice conversion.
Proceedings of the INTERSPEECH 2014, 2014

Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR.
Proceedings of the INTERSPEECH 2014, 2014

Analysis of spectrogram image methods for sound event classification.
Proceedings of the INTERSPEECH 2014, 2014

Feature compensation using linear combination of speaker and environment dependent correction vectors.
Proceedings of the IEEE International Conference on Acoustics, 2014

Discriminative score normalization for keyword search decision.
Proceedings of the IEEE International Conference on Acoustics, 2014

Generalization of temporal filter and linear transformation for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

A discriminatively trained Hough Transform for frame-level phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Improving language modeling by using distance and co-occurrence information of word-pairs and its application to LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2014

A Bayesian performance bound for time-delay of arrival based acoustic source tracking in a reverberant environment.
Proceedings of the 17th International Conference on Information Fusion, 2014

Towards better keyword search performance on Malay broadcast news data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

A study on replay attack and anti-spoofing for text-dependent speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Image Feature Representation of the Subband Power Distribution for Robust Sound Event Classification.
IEEE Trans. Speech Audio Process., 2013

Hadoop framework: impact of data organization on performance.
Softw. Pract. Exp., 2013

Overlapping sound event recognition using local spectrogram features and the generalised Hough transform.
Pattern Recognit. Lett., 2013

Exemplar-based voice conversion using non-negative spectrogram deconvolution.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

The development and analysis of a Malay broadcast news corpus.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Attribute-based histogram equalization (HEQ) and its adaptation for robust speech recognition.
Proceedings of the INTERSPEECH 2013, 2013

Exemplar-based unit selection for voice conversion utilizing temporal information.
Proceedings of the INTERSPEECH 2013, 2013

Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints.
Proceedings of the INTERSPEECH 2013, 2013

Context-dependent phone mapping for LVCSR of under-resourced languages.
Proceedings of the INTERSPEECH 2013, 2013

Temporal filter design by minimum KL divergence criterion for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Synthetic speech detection using temporal modulation feature.
Proceedings of the IEEE International Conference on Acoustics, 2013

Language diarization for code-switch conversational speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Constrained adaptation of histogram equalization for robust speech recognition.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Conditional restricted Boltzmann machine for voice conversion.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Robust sound event recognition under TV playing conditions.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Language diarization for conversational code-switch speech with pronunciation dictionary adaptation.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Local partial least square regression for spectral mapping in voice conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

A particle filter compensation approach to robust LVCSR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Adaptive semi-supervised tree SVM for sound event recognition in home environments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

A robust sound event recognition framework under TV playing conditions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Modeling of term-distance and term-occurrence information for improving n-gram language model performance.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion.
IEEE Signal Process. Lett., 2012

Discriminative feature extraction for speech recognition using continuous output codes.
Pattern Recognit. Lett., 2012

Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features.
IEICE Trans. Inf. Syst., 2012

Integration of language identification into a recognition system for spoken conversations containing code-Switches.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

An analysis of vector Taylor series model compensation for non-stationary noise in speech recognition.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Context dependant phone mapping for cross-lingual acoustic modeling.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition.
Proceedings of the INTERSPEECH 2012, 2012

Overlapping Sound Event Recognition using Local Spectrogram Features with the Generalised Hough Transform.
Proceedings of the INTERSPEECH 2012, 2012

Lasso environment model combination for robust speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A first speech recognition system for Mandarin-English code-switch conversational speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A Phone Mapping Technique for Acoustic Modeling of Under-Resourced Languages.
Proceedings of the 2012 International Conference on Asian Language Processing, 2012

An Empirical Evaluation of Stop Word Removal in Statistical Machine Translation.
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation HyTra@EACL 2012, 2012

A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Error Corrective Fusion of Classifier Scores for Spoken Language Recognition.
IEICE Trans. Inf. Syst., 2011

Feature Normalization Using Structured Full Transforms for Robust Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Target-Aware Lattice Rescoring for Dialect Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Speech Modulation Features for Robust Nonnative Speech Accent Detection.
Proceedings of the INTERSPEECH 2011, 2011

Linear Dynamic Models for Voice Activity Detection.
Proceedings of the INTERSPEECH 2011, 2011

Maximum likelihood adaptation of histogram equalization with constraint for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2010

A tree-construction search approach for multivariate time series motifs discovery.
Pattern Recognit. Lett., 2010

Text-independent F0 transformation with non-parallel data for voice conversion.
Proceedings of the INTERSPEECH 2010, 2010

Phoneme lattice based texttiling towards multilingual story segmentation.
Proceedings of the INTERSPEECH 2010, 2010

Selecting phonotactic features for language recognition.
Proceedings of the INTERSPEECH 2010, 2010

SEAME: a Mandarin-English code-switching speech corpus in South-East Asia.
Proceedings of the INTERSPEECH 2010, 2010

A discriminative performance metric for GMM-UBM speaker identification.
Proceedings of the INTERSPEECH 2010, 2010

Framewise Phone Classification Using Weighted Fuzzy Classification Rules.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Error corrective classifier fusion for spoken Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Non-Isomorphic Forest Pair Translation.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

2009
A Target-Oriented Phonotactic Front-End for Spoken Language Recognition.
IEEE Trans. Speech Audio Process., 2009

Improved Keypoint Matching Method for Near-Duplicate Keyframe Retrieval.
Proceedings of the 11th IEEE International Symposium on Multimedia, 2009

Target-aware language models for spoken language recognition.
Proceedings of the INTERSPEECH 2009, 2009

Discriminative feature transformation using output coding for speech recognition.
Proceedings of the INTERSPEECH 2009, 2009

Efficient sparse self-similarity matrix construction for repeating sequence detection.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Cluster criterion functions in spectral subspace and their application in speaker clustering.
Proceedings of the IEEE International Conference on Acoustics, 2009

Exploiting prosodic information for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

A study on hidden Markov model's generalization capability for speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Normalization of the Speech Modulation Spectra for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2008

Automatic composition of broadcast sports video.
Multim. Syst., 2008

Efficient mobile phone Chinese optical character recognition systems by use of heuristic fuzzy rules and bigram Markov language models.
Appl. Soft Comput., 2008

Effect of Feature Smoothing for Robust Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Discriminative Output Coding Features for Speech Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Target-oriented phone selection from universal phone set for spoken language recognition.
Proceedings of the INTERSPEECH 2008, 2008

T-test distance and clustering criterion for speaker diarization.
Proceedings of the INTERSPEECH 2008, 2008

Fuzzy rule selection using Iterative Rule Learning for speech data classification.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Target-oriented phone tokenizers for spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

MICRO-EBLOCK: A Modular Platform for Embedded System Education.
Proceedings of the International Conference on Computer Science and Software Engineering, 2008

2007
Generation of Personalized Music Sports Video Using Multimodal Cues.
IEEE Trans. Multim., 2007

Temporal Structure Normalization of Speech Feature for Robust Speech Recognition.
IEEE Signal Process. Lett., 2007

Evaluating the temporal structure normalisation technique on the Aurora-4 task.
Proceedings of the INTERSPEECH 2007, 2007

Using direction of arrival estimate and acoustic feature information in speaker diarization.
Proceedings of the INTERSPEECH 2007, 2007

An MCU description methodology for initialization code generation software.
Proceedings of the 13th International Conference on Parallel and Distributed Systems, 2007

A Vector-Based Approach to Broadcast Audio Database Indexing and Retrieval.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Normalizing the Speech Modulation Spectrum for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

Spoken Language Recognition with Relevance Feedback.
Proceedings of the IEEE International Conference on Acoustics, 2007

Speaker Diarization Using Direction of Arrival Estimate and Acoustic Feature Information: The I2R-NTU Submission for the NIST RT 2007 Evaluation.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

2006
Vector Autoregressive Model for Missing Feature Reconstruction.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Fusion of Acoustic and Tokenization Features for Speaker Recognition.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

The IIR Submission to CSLP 2006 Speaker Recognition Evaluation.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Automatic Sports Video Genre Classification using Pseudo-2D-HMM.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Identify Sports Video Shots with "Happy" or "Sad" Emotions.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Fully and Semi-Automatic Music Sports Video Composition.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Integrating Acoustic, Prosodic and Phonotactic Features for Spoken Language Identification.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Determining the optimal decision delay parameter for a linear equalizer.
Int. J. Autom. Comput., 2005

Automatic generation of personalized music sports video.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

A Player-Possession Acquisition System for Broadcast Soccer Video.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Soccer replay detection using scene transition structure analysis.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Automatic replay generation for soccer video broadcasting.
Proceedings of the 12th ACM International Conference on Multimedia, 2004

High Accuracy Classification of EEG Signal.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

Sports highlight detection from keyword sequences using HMM.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Event detection based on non-broadcast sports video.
Proceedings of the 2004 International Conference on Image Processing, 2004

Concurrent constant modulus algorithm and soft decision directed scheme for fractionally-spaced blind equalization.
Proceedings of IEEE International Conference on Communications, 2004

1996
Gradient radial basis function networks for nonlinear and nonstationary time series prediction.
IEEE Trans. Neural Networks, 1996

Orthogonal least-squares learning algorithm with local adaptation process for the radial basis function networks.
IEEE Signal Process. Lett., 1996

Using weight decay to optimize the generalization ability of a perceptron.
Proceedings of International Conference on Neural Networks (ICNN'96), 1996

1995
Efficient computational schemes for the orthogonal least squares algorithm.
IEEE Trans. Signal Process., 1995

1994
Reducing the computational requirement of the orthogonal least squares algorithm.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

