Sabato Marco Siniscalchi

CoRR, April, 2026

A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation.

Chun-Wei Ho

Kai Li

CoRR, February, 2026

2025

Hallucination Benchmark for Speech Foundation Models.

Manuel Giollo

CoRR, October, 2025

TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models.

Ming Jin

Shirui Pan

CoRR, September, 2025

Cross-Modal Knowledge Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection.

Qing Wang

Ya Jiang

Hang Chen

Jianqing Gao

CoRR, August, 2025

Lightweight Audio-Visual Wake Word Spotting With Diverse Acoustic Knowledge Distillation.

Shutong Niu

Shifu Xiong

IEEE Trans. Circuits Syst. Video Technol., July, 2025

Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization.

Eng Siong Chng

CoRR, July, 2025

Joint Tensor-Train Parameterization for Efficient and Expressive Low-Rank Adaptation.

Chen-Yu Liu

Min-Hsiu Hsieh

CoRR, June, 2025

HPCNet: Hybrid Pixel and Contour Network for Audio-Visual Speech Enhancement With Low-Quality Video.

Shifu Xiong

Genshun Wan

IEEE J. Sel. Top. Signal Process., May, 2025

Towards Robust Assessment of Pathological Voices via Combined Low-Level Descriptors and Foundation Model Representations.

Whenty Ariyanti

Kuan-Yu Chen

CoRR, May, 2025

Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer.

CoRR, January, 2025

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction.

Mohammad Adiban

Kalin Stefanov

IEEE Trans. Multim., 2025

Controllable Conformer for Speech Enhancement and Recognition.

Zilu Guo

Jia Pan

Qingfeng Liu

IEEE Signal Process. Lett., 2025

Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement.

Genshun Wan

Jia Pan

Huijun Ding

Inf. Fusion, 2025

Foundation Models for Speech Enhancement Leveraging Consistency Constraints and Contrast Stretching.

Muhammad Salman Khan

IEEE Access, 2025

Using Cross-Attention for Conversational ASR over the Telephone.

Simen Dymbe

Proceedings of the Text, Speech, and Dialogue - 28th International Conference, 2025

Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids.

Ryandhimas E. Zezario

Fei Chen

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Exploring Generative Error Correction for Dysarthric Speech Recognition.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology.

Haoyang Li

Yuchen Hu

Chen Chen

Songting Liu

Eng Siong Chng

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

"KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding.

Eliana Pastor

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MVP: Multi-source Voice Pathology detection.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition.

Cristian David Ríos-Urrego

Odette Scharenborg

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Synchronous analysis of abnormal acoustic and linguistic production in Parkinson's speech.

Daniel Escobar-Grisales

Juan Rafael Orozco-Arroyave

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models.

Ryandhimas E. Zezario

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Bilingual Dual-Head Deep Model for Parkinson's Disease Detection from Speech.

Juan Rafael Orozco-Arroyave

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MSEMG: Surface Electromyography Denoising with a Mamba-based Efficient Network.

Yu-Tung Liu

Kuan-Chen Wang

Rong Chao

Ping-Cheng Yeh

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement.

Pin-Jui Ku

Chun-Wei Ho

Hao Yen

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

voc2vec: A Foundation Model for Non-Verbal Vocalization.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization.

Hang Chen

Jia-Chen Gu

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

How word semantics and phonology affect handwriting of Alzheimer's patients: A machine learning based analysis.

Nicole Dalia Cilia

Claudio De Stefano

Francesco Fontanella

Comput. Biol. Medicine, February, 2024

Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori.

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

FlanEC: Exploring Flan-T5 for Post-ASR Error Correction.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

An Investigation of Incorporating Mamba for Speech Enhancement.

Rong Chao

Wen-Huang Cheng

Szu-Wei Fu

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-Based Speech Enhancement.

Proceedings of the 26th IEEE International Workshop on Multimedia Signal Processing, 2024

Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition.

Hao Yen

Pin-Jui Ku

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions.

Maria Francesca Turco

Juan Rafael Orozco-Arroyave

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Summary on the Chat-Scenario Chinese Lipreading (ChatCLR) Challenge.

Chen-Yue Zhang

Hang Chen

Ya Jiang

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.

Chen Chen

Ruizhe Li

Yuchen Hu

Eng Siong Chng

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints.

Hao Yen

Proceedings of the IEEE International Conference on Acoustics, 2024

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.

Proceedings of the IEEE International Conference on Acoustics, 2024

Benchmarking Representations for Speech, Music, and Acoustic Events.

Proceedings of the IEEE International Conference on Acoustics, 2024

Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2024

Speech Analysis of Language Varieties in Italy.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023

A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity.

Mohammad Adiban

Neurocomputing, June, 2023

Generative error correction for code-switching speech recognition using large language models.

Eng Siong Chng

CoRR, 2023

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.

CoRR, 2023

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models.

Chen Chen

Yuchen Hu

Eng Siong Chng

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Inference and Denoise: Causal Inference-Based Neural Speech Enhancement.

Tsun-An Hsieh

Proceedings of the 33rd IEEE International Workshop on Machine Learning for Signal Processing, 2023

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Description and analysis of the KPT system for NIST Language Recognition Evaluation 2022.

Salvatore Sarni

Sandro Cumani

Andrea Bottino

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.

Pin-Jui Ku

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Differentially Private Adapters for Parameter Efficient Acoustic Modeling.

Chun-Wei Ho

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition.

Andreas Stolcke

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer.

Proceedings of the IEEE International Conference on Acoustics, 2022

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results.

Proceedings of the IEEE International Conference on Acoustics, 2022

Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation.

Mohammad Adiban

Kalin Stefanov

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Vector-to-Vector Regression via Distributional Loss for Speech Enhancement.

IEEE Signal Process. Lett., 2021

A multimodal retina-iris biometric system using the Levenshtein distance for spatial feature comparison.

Vincenzo Conti

Leonardo Rundo

Carmelo Militello

IET Biom., 2021

A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming.

CoRR, 2021

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification.

CoRR, 2021

A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Raw Speech-to-Articulatory Inversion by Temporal Filtering and Decimation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.

Xiaoli Ma

Proceedings of the IEEE International Conference on Acoustics, 2021

A Two-Stage Deep Modeling Approach to Articulatory Inversion.

Negar Olfati

Ali Shariq Imran

Magne Hallstein Johnsen

Proceedings of the IEEE International Conference on Acoustics, 2021

A Two-Stage Approach to Device-Robust Acoustic Scene Classification.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network-Based Vector-to-Vector Regression.

Xiaoli Ma

IEEE Trans. Signal Process., 2020

Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition.

Ivan Kukanov

Trung Ngo Trong

Kong Aik Lee

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation.

Tassadaq Hussain

Hsiao-Lan Sharon Wang

IEEE Trans. Cogn. Dev. Syst., 2020

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression.

Xiaoli Ma

IEEE Signal Process. Lett., 2020

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation.

CoRR, 2020

Sequence-to-Sequence Articulatory Inversion Through Time Convolution of Sub-Band Frequency Signals.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transfer Learning of Articulatory Information Through Phone Information.

Negar Olfati

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers.

Sicheng Wang

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Performance Analysis for Tensor-Train Decomposition to Deep Neural Network Based Vector-to-Vector Regression.

Proceedings of the 54th Annual Conference on Information Sciences and Systems, 2020

2019

A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Bone-Conducted Speech Enhancement Using Hierarchical Extreme Learning Machine.

Tassadaq Hussain

Jia-Ching Wang

Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion.

Negar Olfati

Ali Shariq Imran

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training.

Sicheng Wang

Ming Lei

Proceedings of the IEEE International Conference on Acoustics, 2019

Exploring Retraining-free Speech Recognition for Intra-sentential Code-switching.

Proceedings of the IEEE International Conference on Acoustics, 2019

Audio-Visual Speech Enhancement using Hierarchical Extreme Learning Machine.

Proceedings of the 27th European Signal Processing Conference, 2019

Compressed Multimodal Hierarchical Extreme Learning Machine for Speech Enhancement.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.

Jinsong Zhang

J. Signal Process. Syst., 2018

Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Adaptation to New Microphones Using Artificial Neural Networks With Trainable Activation Functions.

IEEE Trans. Neural Networks Learn. Syst., 2017

Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation.

Pattern Recognit. Lett., 2017

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2017

A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation.

Tong Wang

EURASIP J. Adv. Signal Process., 2017

Experimental Study on Extreme Learning Machine Applications for Speech Enhancement.

Tassadaq Hussain

IEEE Access, 2017

Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition.

Fengpei Ge

Bo Wu

Yonghong Yan

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A transfer learning and progressive stacking approach to reducing deep model sizes with an application to speech enhancement.

Sicheng Wang

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A unified deep modeling approach to simultaneous speech dereverberation and recognition for the reverb challenge.

Bo Wu

Minglei Yang

Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

2016

i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition.

Tomi Kinnunen

IEEE ACM Trans. Audio Speech Lang. Process., 2016

A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition.

Neurocomputing, 2016

Deep learning with maximal figure-of-merit cost to advance multi-label speech attribute detection.

Ivan Kukanov

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Towards a direct Bayesian adaptation framework for deep models.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Maximum a Posteriori Adaptation of Network Parameters in Deep Models.

Jiadong Wu

CoRR, 2015

Maximum a posteriori adaptation of network parameters in deep models.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Rapid adaptation for deep neural networks through multi-task learning.

Ji Wu

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Boosting universal speech attributes classification with deep neural network for foreign accent characterization.

Ivan Kukanov

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

An artificial neural network approach to automatic speech processing.

Neurocomputing, 2014

Feature space maximum a posteriori linear regression for adaptation of deep neural networks.

Chao Weng

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Dialect levelling in Finnish: a universal speech attribute approach.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Attribute based lattice rescoring in spontaneous speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

Introducing attribute features to foreign accent recognition.

Tomi Kinnunen

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

A Bottom-Up Modular Search Approach to Large Vocabulary Continuous Speech Recognition.

IEEE Trans. Speech Audio Process., 2013

Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems.

IEEE Trans. Speech Audio Process., 2013

Speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model.

Dong Yu

Li Deng

IEEE Signal Process. Lett., 2013

An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition.

Proc. IEEE, 2013

Exploiting deep neural networks for detection-based speech recognition.

Dong Yu

Li Deng

Neurocomputing, 2013

Model-based margin estimation for hidden Markov model learning and generalisation.

IET Signal Process., 2013

Universal attribute characterization of spoken languages for automatic spoken language recognition.

Comput. Speech Lang., 2013

Knowledge integration for improving performance in LVCSR.

Chen-Yu Chiang

Sin-Horng Chen

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An experimental study on structural-MAP approaches to implementing very large vocabulary speech recognition systems for real-world tasks.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

Experiments on Cross-Language Attribute Detection and Phone Recognition With Minimal Target-Specific Training Data.

Dau-Cheng Lyu

IEEE Trans. Speech Audio Process., 2012

Combining speech attribute detection and penalized logistic regression for phoneme recognition.

Neurocomputing, 2012

A new confidence measure combining Hidden Markov Models and Artificial Neural Networks of phonemes for effective keyword spotting.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A study on cross-language knowledge integration in Mandarin LVCSR.

Chen-Yu Chiang

Yih-Ru Wang

Sin-Horng Chen

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Consumer-level multimedia event detection through unsupervised audio signal modeling.

Byungki Byun

Ilseo Kim

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition.

Dong Yu

Li Deng

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

A Bottom-Up Stepwise Knowledge-Integration Approach to Large Vocabulary Continuous Speech Recognition Using Weighted Finite State Machines.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Bootstrapping a spoken language identification system using unsupervised integrated sensing and processing decision trees.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Penalized Logistic Regression With HMM Log-Likelihood Regressors for Speech Recognition.

Øystein Birkenes

Tomoko Matsui

Kunio Tanabe

Tor André Myrvoll

Magne Hallstein Johnsen

IEEE Trans. Speech Audio Process., 2010

A survey on recent progress in the ASAT/SIRKUS paradigm.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions.

Filippo Sorbello

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition.

Speech Commun., 2009

Minimum Classification Error Training to Improve Isolated Chord Recognition.

Yushi Ueda

Yuuki Uchiyama

Shigeki Sagayama

Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

Exploring universal attribute characterization of spoken languages for spoken language recognition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A phonetic feature based lattice rescoring approach to LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

A penalized logistic regression approach to detection based phone classification.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Continuous phone recognition without target language training data.

Dau-Cheng Lyu

Tae-Yoon Kim

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Toward a detector-based universal phone recognizer.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Detection-based ASR in the automatic speech attribute transcription project.

Antonio Moreno-Daniel

Jeremy Morris

Yu Wang

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

High-Accuracy Phone Recognition By Combining High-Performance Lattice Generation and Knowledge Based Rescoring.

Petr Schwarz

Proceedings of the IEEE International Conference on Acoustics, 2007

Approximate Test Risk Minimization Through Soft Margin Estimation.

Proceedings of the IEEE International Conference on Acoustics, 2007

Towards bottom-up continuous phone recognition.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Riconoscimento del parlato basato su tecniche di soppressione del rumore o di integrazione della conoscenza articolatoria.

PhD thesis, 2006

A study on lattice rescoring with knowledge scores for automatic speech recognition.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Embedded Knowledge-Based Speech Detectors for Real-Time Recognition Tasks.

Proceedings of the 2006 International Conference on Parallel Processing Workshops (ICPP Workshops 2006), 2006

A Study of Perceptron Mapping Capability to Design Speech Event Detectors.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Noise Robust Aurora-2 Speech Recognition Employing a Codebook-Constrained Kalman Filter Preprocessor.

Venkatesh Krishnan

David V. Anderson

Mark A. Clements

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Application of EαNets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition.

Proceedings of the Neural Nets, 16th Italian Workshop on Neural Nets, 2005

Efficient FPGA Implementation of a Knowledge-Based Automatic Speech Classifier.

Proceedings of the Embedded Software and Systems, Second International Conference, 2005

2004

Neural Classification of HEP Experimental Data.

Giovanni Pilato

Giorgio Vassallo

Antonio Gentile

Filippo Sorbello

Proceedings of the Biological and Artificial Intelligence Environments, 2004

Efficient Rapid Prototyping of Image and Video Processing Algorithms.

Antonio Gentile

Filippo Sorbello

Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

2002

MIP: A New Hybrid Multi-Agent Architecture for the Coordination of a Robot Colony Activities.

Antonio Chella

Rosario Sorbello