Nobuaki Minematsu

CoRR, January, 2026

Gestural feature extraction and multi-feature co-activation for dysarthric speech recognition.

Inf. Fusion, 2026

Incorporating Respect into LLM-Based Academic Feedback: A BI-R Framework for Instructing Students after Q&A Sessions.

Mayuko Aiba

Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, 2026

2025

Re:Member: Emotional Question Generation from Personal Memories.

Zackary Rackauckas

Julia Hirschberg

CoRR, October, 2025

Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A Perception-Based L2 Speech Intelligibility Indicator: Leveraging a Rater's Shadowing and Sequence-to-sequence Voice Conversion.

Haopeng Geng

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Bandwidth Extension System for Throat Microphone Speech Reconstruction.

Proceedings of the IEEE International Conference on Multimedia and Expo, ICME 2025 - Workshops, Nantes, France, June 30, 2025

LangInLab: Augmenting Engineering Lab Instruction with Vision- and Voice-Enabled AI Agents for Language Learning.

Proceedings of the 13th International Conference on Human-Agent Interaction, 2025

Benchmarking Prosody Encoding in Discrete Speech Tokens.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

MixedG2P-T5: G2P-Free Speech Synthesis for Mixed-Script Texts Using Speech Self-Supervised Learning and Language Model.

Joonyong Park

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations.

Haopeng Geng

CoRR, 2024

Analysis and Visualization of Directional Diversity in Listening Fluency of World Englishes Speakers in the Framework of Mutual Shadowing.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Acceleration of Posteriorgram-based DTW by Distilling the Class-to-class Distances Encoded in the Classifier Used to Calculate Posteriors.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Exploring Pre-trained Speech Model for Articulatory Feature Extraction in Dysarthric Speech Using ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A ChatGPT-based oral Q&A practice system for first-time student participants in international conferences.

Mayuko Aiba

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model.

Joonyong Park

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

Enhancing Acoustic Scene Classification with Layer-wise Fine-Tuning on the SSAST Model.

Shuting Hao

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings.

Haopeng Geng

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

Sensitivity to Phonemic Contrasts and Insensitivity to Non-phonemic Contrasts of Various Speech Representations Tested for L2 Speech Assessment.

Proceedings of the 9th Workshop on Speech and Language Technology in Education, 2023

Density and Entropy of Spoken Syllables in American English and Japanese English Estimated with Acoustic Word Embeddings.

Proceedings of the 9th Workshop on Speech and Language Technology in Education, 2023

Learners' Prosodic Control in the Task of Expressive Storytelling and Predicted Native Listeners' Impressions of the Learners' Speech.

Proceedings of the 9th Workshop on Speech and Language Technology in Education, 2023

A Unified Framework to Improve Learners' Skills of Perception and Production Based on Speech Shadowing and Overlapping.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Automatic Prediction of Language Learners' Listenability Using Speech and Text Features Extracted from Listening Drills.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Multiple Acoustic Features Speech Emotion Recognition Using Cross-Attention Transformer.

Yurun He

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition.

CoRR, 2022

Automatic Prediction of Intelligibility of Words and Phonemes Produced Orally by Japanese Learners of English.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Detection of Learners' Listening Breakdown with Oral Dictation and Its Use to Model Listening Skill Improvement Exclusively Through Shadowing.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Gradual Improvements Observed in Learners' Perception and Production of L2 Sounds Through Continuing Shadowing Practices on a Daily Basis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Text-to-speech synthesis using spectral modeling based on non-negative autoencoder.

Takeru Gorai

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Quantifying Discriminability between NMF Bases.

Eisuke Konno

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Optimized Prediction of Fluency of L2 English Based on Interpretable Network Using Quantity of Phonation and Quality of Pronunciation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Lexical Density Analysis of Word Productions in Japanese English Using Acoustic Word Embeddings.

Shintaro Ando

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multi-Granularity Annotation of Instantaneous Intelligibility of Learners' Utterances Based on Shadowing Techniques.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Acoustic Simulation of Body-conducted Speech and Its Use to Convert One's Recorded Voices to One's Own Voices.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Tensor Factor Analysis for Arbitrary Speaker Conversion.

IEICE Trans. Inf. Syst., 2020

Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation.

Yuma Shirahata

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Shadowability Annotation with Fine Granularity on L2 Utterances and its Improvement with Native Listeners' Script-Shadowing.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Converting Written Language to Spoken Language with Neural Machine Translation for Language Modeling.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Many-to-Many and Completely Parallel-Data-Free Voice Conversion Based on Eigenspace DNN.

Tetsuya Hashimoto

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Voice Conversion without Explicit Separation of Source and Filter Components Based on Non-negative Matrix Factorization.

Hitoshi Suda

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control.

Yuma Shirahata

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Native Listeners' Shadowing of Non-native Utterances as Spoken Annotation Representing Comprehensibility of the Utterances.

Proceedings of the 8th ISCA International Workshop on Speech and Language Technology in Education, 2019

Does Speaking Training Application with Speech Recognition Motivate Junior High School Students in Actual Classroom? - A Case Study.

Proceedings of the 8th ISCA International Workshop on Speech and Language Technology in Education, 2019

Prototyping a web-based phonetic training game to improve /r/-/l/ identification by Japanese learners of English.

Adriana Guevara-Rukoz

Alexander Martin

Yutaka Yamauchi

Proceedings of the 8th ISCA International Workshop on Speech and Language Technology in Education, 2019

A Large Collection of Sentences Read Aloud by Vietnamese Learners of Japanese and Native Speaker's Reverse Shadowings.

Proceedings of the 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2019

Analysis of Native Listeners' Facial Microexpressions While Shadowing Non-Native Speech - Potential of Shadowers' Facial Expressions for Comprehensibility Prediction.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cooking State Recognition based on Acoustic Event Detection.

Yusaku Korematsu

Proceedings of the 11th Workshop on Multimedia for Cooking and Eating Activities, 2019

The UTokyo speech synthesis system for Blizzard Challenge 2019.

Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

Speech representation based on tensor factor analysis and its application to speaker recognition and language identification.

So Suzuki

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Experimental investigation on the efficacy of Affine-DTW in the quality of voice conversion.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

DNN-based Statistical Parametric Speech Synthesis Incorporating Non-negative Matrix Factorization.

Shunsuke Goto

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder.

IEEE Access, 2018

DNN-Based Scoring of Language Learners' Proficiency Using Learners' Shadowings and Native Listeners' Responsive Shadowings.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions.

Yasuhito Ohsugi

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields.

IEICE Trans. Inf. Syst., 2017

Development and Evaluation of Online Infrastructure to Aid Teaching and Learning of Japanese Prosody.

IEICE Trans. Inf. Syst., 2017

Development and Maintenance of Practical and In-service Systems for Recording Shadowing Utterances and Their Assessment.

Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017

Investigation of teacher-selected sentences and machine-suggested sentences in terms of correlation between human ratings and GOP-based machine scores.

Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017

New Features and Effectiveness of Suzuki-kun, the First and Only Prosodic Reading Tutor of Tokyo Japanese.

Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017

Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis.

Hidetsugu Uchida

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition.

Shohei Toyama

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The UTokyo speech synthesis system for Blizzard Challenge 2017.

Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017

Voice conversion based on deep neural networks for time-variant linear transformations.

Gaku Kotani

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework.

Nat. Lang. Eng., 2016

Prosodic Reading Tutor of Japanese, Suzuki-kun: The first and only educational tool to teach the formal Japanese.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Improved prediction of the accent gap between speakers of English for individual-based clustering of World Englishes.

Fumiya Shiozawa

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Acoustic correlates and gender effects in production and perception of Japanese polite speech.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis.

Yi Zhao

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another Speaker.

Hidetsugu Uchida

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese Learners.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Divergence estimation based on deep neural networks and its use for language identification.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

The UTokyo System for Blizzard Challenge 2016.

Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

Arbitrary speaker conversion based on speaker space bases constructed by deep neural networks.

Tetsuya Hashimoto

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Discriminative re-ranking for automatic speech recognition by leveraging invariant structures.

Speech Commun., 2015

Automatic recognition of Japanese vowel length accounting for speaking rate and motivated by perception analysis.

Speech Commun., 2015

Automatic prediction of intelligibility of English words spoken with Japanese accents - comparative study of features and models used for prediction.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Development of a prosodic reading tutor of Japanese - effective use of TTS and F0 contour modeling techniques for CALL.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint.

Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A measure of phonetic similarity to quantify pronunciation variation by using ASR technology.

Tianze Shi

Shun Kasahara

Proceedings of the 18th International Congress of Phonetic Sciences, 2015

2014

Speaker-basis Accent Clustering Using Invariant Structure Analysis and the Speech Accent Archive.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Keynote 2: Perceptual and structural analysis of pronunciation diversity of World Englishes.

Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Visualization of pronunciation diversity of world Englishes from a speaker's self-centered viewpoint.

Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Application of matrix variate Gaussian mixture model to statistical voice conversion.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Semi-supervised noise dictionary adaptation for exemplar-based noise robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Improved and robust prediction of pronunciation distance for individual-basis clustering of World Englishes pronunciation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Leveraging phonetic context dependent invariant structure for continuous speech recognition.

Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

2013

Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting.

IEEE Trans. Speech Audio Process., 2013

Japanese lexical accent recognition for a CALL system by deriving classification equations with perceptual experiments.

Speech Commun., 2013

Unsupervised optimal phoneme segmentation: theory and experimental evaluation.

Dean Luo

IET Signal Process., 2013

Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese.

Hiroya Hashimoto

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Automatic recognition of vowel length in Japanese for a CALL system motivated by perceptual experiments.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Speaker-based accented English clustering using a world English archive.

Chung-Hsien Wu

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Automatic detection of the words that will become unintelligible through Japanese accented pronunciation of English.

Takehiko Makino

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

OJAD: a free online accent and intonation dictionary for teachers and learners of Japanese.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Failure transitions for joint n-gram models and G2P conversion.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Development of a web framework for teaching and learning Japanese prosody: OJAD (online Japanese accent dictionary).

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model.

Oraphan Krityakien

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A free online accent and intonation dictionary for teachers and learners of Japanese.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Artificial bandwidth extension based on regularized piecewise linear mapping with discriminative region weighting and long-span features.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improved estimation of femininity using GMM supervectors and SVR for voice therapy of Gender Identity Disorder Clients.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Automatic pronunciation clustering using a World English archive and pronunciation structure analysis.

Chung-Hsien Wu

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Discriminative piecewise linear transformation based on deep learning for noise robust automatic speech recognition.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Speaker-invariant and rhythm-sensitive representation of spoken words.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

Statistical Voice Conversion Based on Noisy Channel Model.

IEEE Trans. Speech Audio Process., 2012

A method for generation of Mandarin F0 contours based on tone nucleus model and superpositional model.

Qinghua Sun

Speech Commun., 2012

Automatic Chinese pronunciation error detection using SVM trained with structural features.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Performance improvement of automatic pronunciation assessment in a noisy classroom.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Audio-visual feature integration based on piecewise linear transformation for noise robust automatic speech recognition.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Discriminative Reranking for LVCSR Leveraging Invariant Structure.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Dynamic Grammars with Lookahead Composition for WFST-based Speech Recognition.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improved Prediction of Japanese Word Accent Sandhi Using CRF.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis.

Hiroya Hashimoto

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Unseen noise robust speech recognition using adaptive piecewise linear transformation.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

WFST-Based Grapheme-to-Phoneme Conversion: Open Source tools for Alignment, Model-Building and Decoding.

Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing, 2012

2011

Regularized Maximum Likelihood Linear Regression Adaptation for Computer-Assisted Language Learning Systems.

IEICE Trans. Inf. Syst., 2011

Rule-based method for pitch level classification for a Japanese pitch accent CALL system.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2011

Comparison of native and non-native evaluations of the naturalness of Japanese words with prosody modified through voice morphing.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2011

Representing fundamental frequency contours generated by HMM-based speech synthesis using generation process model.

Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Continuous Digits Recognition Leveraging Invariant Structure.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Study on Bag of Gaussian Model with Application to Voice Conversion.

Tong Tong

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Painless WFST Cascade Construction for LVCSR - Transducersaurus.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Measurement of Objective Intelligibility of Japanese Accented English Using ERJ (English Read by Japanese) Database.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Gesture Design of Hand-to-Speech Converter Derived from Speech-to-Hand Converter Based on Probabilistic Integration Model.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

High accurate model-integration-based voice conversion using dynamic features and model structure optimization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Improved F0 modeling and generation in voice conversion.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Open Source WFST Tools for LVCSR Cascade Development.

Proceedings of the Finite-State Methods and Natural Language Processing, 2011

Decision of response timing for incremental speech recognition with reinforcement learning.

Di Lu

Takuya Nishimoto

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Structure-constrained distribution matching using quadratic programming and its application to pronunciation evaluation.

Proceedings of the First Asian Conference on Pattern Recognition, 2011

2010

A study on invariance of f-divergence and its application to speech recognition.

IEEE Trans. Signal Process., 2010

Speech Structure and Its Application to Robust Speech Processing.

New Gener. Comput., 2010

Improved generation of prosodic features in HMM-based Mandarin speech synthesis.

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

A method for modeling and generating Mandarin tone contour with phrase intonation based on the generation process model.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Human speech model based on information separation and its application to speech processing.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Dialect-based speaker classification using speaker-invariant dialect features.

[BibT_eX]

[DOI]

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Improving Mandarin segmental duration prediction with automatically extracted syntax features.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Integration of multilayer regression analysis with structure-based pronunciation assessment.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Probabilistic integration of joint density model and speaker model for voice conversion.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Regularized-MLLR speaker adaptation for computer-assisted language learning system.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based sequence-to-frame mapping for voice conversion.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2010

Pitch Pattern Recognition of Isolated Words for the Development of a Japanese Language CALL System.

[BibT_eX]

Proceedings of the Electronic Speech Signal Processing, 2010

Human Speech Model based on Information Separation.

[BibT_eX]

Proceedings of the Electronic Speech Signal Processing, 2010

Using F0 Contour Generation Process Model for Improved and Flexible Control of Prosodic Features in HMM-based Speech Synthesis.

[BibT_eX]

Proceedings of the Electronic Speech Signal Processing, 2010

2009

A Theory of Phase Singularities for Image Representation and its Applications to Object Tracking and Image Matching.

[BibT_eX]

[DOI]

IEEE Trans. Image Process., 2009

Improved structure-based automatic estimation of pronunciation proficiency.

[BibT_eX]

[DOI]

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Structure-based pronunciation assessment.

[BibT_eX]

[DOI]

Masayuki Suzuki

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Analysis and comparison of automatic language proficiency assessment between shadowed sentences and read sentences.

[BibT_eX]

[DOI]

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Development of a CALL system to enhance ESL/EFL learners' skills of shadowing and reading aloud.

[BibT_eX]

[DOI]

Dean Luo

Antonio Rui Ferreira Rebordão

Yutaka Yamauchi

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Optimal event search using a structural cost function - improvement of structure to speech conversion.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

How to improve TTS systems for emotional expressivity.

[BibT_eX]

[DOI]

Shaikh Mostafa Al Masum

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

On invariant structural representation for speech recognition: theoretical validation and experimental improvement.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speech generation from hand gestures based on space mapping.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Affine invariant features and their application to speech recognition.

[BibT_eX]

[DOI]

Masayuki Suzuki

Proceedings of the IEEE International Conference on Acoustics, 2009

Mixture of Probabilistic Linear Regressions: A unified view of GMM-based mapping techniques.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

Control of prosodic focus in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model.

[BibT_eX]

[DOI]

Keiko Ochi

Proceedings of the IEEE International Conference on Acoustics, 2009

Sub-structure-based estimation of pronunciation proficiency and classification of learners.

[BibT_eX]

[DOI]

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

A study on Hidden Structural Model and its application to labeling sequences.

[BibT_eX]

[DOI]

Masayuki Suzuki

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners.

[BibT_eX]

[DOI]

Speech Commun., 2008

Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2008

Speaker Verification in Realistic Noisy Environment in Forensic Science.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2008

Corpus-based synthesis of Mandarin speech with F0 contours generated by superposing tone components on rule-generated phrase components.

[BibT_eX]

[DOI]

Qinghua Sun

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Automatic Assessment of Language Proficiency through Shadowing.

[BibT_eX]

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Decomposition of rotational distortion caused by VTL difference using eigenvalues of its transformation matrix.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Structure to speech conversion - speech generation based on infant-like vocal imitation.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

f-divergence is a generalized invariant measure between distributions.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Metric learning for unsupervised phoneme segmentation.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Control of prosodic focus in corpus-based generation of fundamental frequency based on the generation process model.

[BibT_eX]

[DOI]

Keiko Ochi

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Robust voiced/unvoiced speech classification using empirical mode decomposition and periodic correlation model.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Automatic pronunciation evaluation of language learners' utterances generated through shadowing.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Directional dependency of cepstrum on vocal tract length.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

Phase singularities for image representation and matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons.

[BibT_eX]

[DOI]

Naoya Shimomura

Proceedings of the IEEE International Conference on Acoustics, 2008

Multi-stream parameterization for structural speech recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Two-step generation of Mandarin F0 contours based on tone nucleus and superpositional models.

[BibT_eX]

[DOI]

Qinghua Sun

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

CRF-based statistical learning of Japanese accent sandhi for developing Japanese text-to-speech synthesis systems.

[BibT_eX]

[DOI]

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Development of a Femininity Estimator for Voice Therapy of Gender Identity Disorder Clients.

[BibT_eX]

[DOI]

Kyoko Sakuraba

Proceedings of the Speaker Classification II, 2007

Structural representation of pronunciation and its application for classifying Japanese learners of English.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Speech and Language Technology in Education, 2007

Are learners myna birds to the averaged distributions of native speakers? - a note of warning from a serious speech engineer -.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Speech and Language Technology in Education, 2007

Consideration of Infants' Vocal Imitation Through Modeling Speech as Timbre-Based Melody.

[BibT_eX]

[DOI]

Tazuko Nishimura

Proceedings of the New Frontiers in Artificial Intelligence, 2007

Features of pauses and conjunctions at syntactic and discourse boundaries in Japanese monologues.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A framework of reply speech generation for concept-to-speech conversion in spoken dialogue systems.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Pitch estimation of noisy speech signals using empirical mode decomposition.

[BibT_eX]

[DOI]

Md. Kamrul Hasan

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Structural assessment of language learners' pronunciation.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Corpus-based generation of prosodic features from text based on generation process model.

[BibT_eX]

[DOI]

Keiko Ochi

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

F0 models show Chinese speakers of Japanese insert intonational boundaries and drop pitch.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

EMD based soft-thresholding for speech enhancement.

[BibT_eX]

[DOI]

Erhan Deger

Md. Kamrul Hasan

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Development of a Femininity Estimator using Speaker Recognition Techniques for Voice Therapy of Gender Identity Disorder Clients.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2007

Speech enhancement using soft thresholding with DCT-EMD based hybrid algorithm.

[BibT_eX]

[DOI]

Erhan Deger

Md. Kamrul Hasan

Proceedings of the 15th European Signal Processing Conference, 2007

Random discriminant structure analysis for automatic recognition of connected vowels.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Separation of Mixed Audio Signals by Decomposing Hilbert Spectrum with Modified EMD.

[BibT_eX]

[DOI]

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2006

Structural Representation of the pronunciation and its Use for CALL.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

Localization based audio source separation by sub-band beamforming.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Factors affecting speakers' choice of fillers in Japanese presentations.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Development of a program for self assessment of Japanese pronunciation by English learners.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Tone recognition of continuous speech of standard Chinese using neural network and tone nucleus model.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses.

[BibT_eX]

[DOI]

Yasufumi Asano

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Unfilled pauses in Japanese sentences read aloud by non-native learners.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Localization Based Separation of Mixed Audio Signals with Binary Masking of Hilbert Spectrum.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Para-Linguistic Information Represented as Distortion of the Acoustic Universal Structure In Speech.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Separation of Mixed Audio Signals by Source Localization and Binary Masking with Hilbert Spectrum.

[BibT_eX]

[DOI]

Proceedings of the Independent Component Analysis and Blind Signal Separation, 2006

Factors influencing ratios of filled pauses at clause boundaries in Japanese.

[BibT_eX]

[DOI]

Proceedings of the ISCA Tutorial and Research Workshop on Experimental Linguistics, 2006

2005

Synthesis of F0 contours using generation process model parameters predicted from unlabeled corpora: application to emotional speech synthesis.

[BibT_eX]

[DOI]

Speech Commun., 2005

Audio source separation by source localization with Hilbert spectrum.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Filled pauses as cues to the complexity of following phrases.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Japanese vowel recognition based on structural representation of speech.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Multi-band approach of audio source discrimination with empirical mode decomposition.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Corpus-based extraction of F0 contour generation process model parameters.

[BibT_eX]

[DOI]

Yusuke Furuyama

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Structural representation of the non-native pronunciations.

[BibT_eX]

[DOI]

Toshiko Isei-Jaakkola

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Mathematical Evidence of the Acoustic Universal Structure in Speech.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

The effects of filled pauses on native and non-native listeners' speech processing.

[BibT_eX]

[DOI]

Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Disfluency in Spontaneous Speech, 2005

Improved concept-to-speech generation in a dialogue system on road guidance.

[BibT_eX]

[DOI]

Proceedings of the 4th International Conference on Cyberworlds (CW 2005), 2005

2004

Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents.

[BibT_eX]

Proceedings of the Life-like characters - tools, affective functions, and applications, 2004

A spoken dialogue system for document information retrieval utilizing topic knowledge.

[BibT_eX]

[DOI]

Shinya Kiriyama

Syst. Comput. Jpn., 2004

Prosodic Analysis and Modeling of Nagauta Singing to Generate Prosodic Contours from Standard Scores.

[BibT_eX]

[DOI]

Bungo Matsuoka

IEICE Trans. Inf. Syst., 2004

Corpus-based synthesis of fundamental frequency contours with various speaking styles from text using F0 contour generation process model.

[BibT_eX]

[DOI]

Kentaro Sato

Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Clause types and filled pauses in Japanese spontaneous monologues.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Audio source separation from the mixture using empirical mode decomposition with independent subspace analysis.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Pronunciation assessment based upon the phonological distortions observed in language learners' utterances.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Pronunciation assessment based upon the compatibility between a learner's pronunciation structure and the target language's lexical structure.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Use of prosodic features for speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

N-gram language modeling of Japanese using bunsetsu boundaries.

[BibT_eX]

[DOI]

Sungyup Chung

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Yet another acoustic representation of speech sounds.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Data-driven generation of F0 contours using a superpositional model.

[BibT_eX]

[DOI]

Atsuhiro Sakurai

Speech Commun., 2003

Mora F0 representation for accent type identification in continuous speech and considerations on its relation with perceived pitch values.

[BibT_eX]

[DOI]

Carlos Toshinori Ishi

Speech Commun., 2003

Estimation of resonant characteristics based on AR-HMM modeling and spectral envelope conversion of vowel sounds.

[BibT_eX]

[DOI]

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Considerations on vowel durations for Japanese CALL system.

[BibT_eX]

[DOI]

Taro Mouri

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Automatic estimation of perceptual age using speaker modeling techniques.

[BibT_eX]

[DOI]

Keita Yamauchi

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Improvement of non-native speech recognition by effectively modeling frequently observed pronunciation habits.

[BibT_eX]

[DOI]

Koichi Osaki

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Prosodic analysis and modeling of the NAGAUTA singing to synthesize its prosodic patterns from the standard notation.

[BibT_eX]

[DOI]

Bungo Matsuoka

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

CART-based factor analysis of intelligibility reduction in Japanese English.

[BibT_eX]

[DOI]

Changchen Guo

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech generation from concept for realizing conversation with an agent in a virtual room.

[BibT_eX]

[DOI]

Junji Tago

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Corpus-based synthesis of fundamental frequency contours of Japanese using automatically-generated prosodic corpus and generation process model.

[BibT_eX]

[DOI]

Takayuki Ono

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A pronunciation training system for Japanese lexical accents with corrective feedback in learner's voice.

[BibT_eX]

[DOI]

Frédéric Gendrin

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Use of linguistic information for automatic extraction of F0 contour generation process model parameters.

[BibT_eX]

[DOI]

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

English Speech Database Read by Japanese Learners for CALL System Development.

[BibT_eX]

[DOI]

Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Automatic extraction of model parameters from fundamental frequency contours of English utterances.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Acoustic modeling of sentence stress using differential features between syllables for English rhythm learning system development.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Corpus-based analysis of English spoken by Japanese students in view of the entire phonemic system of English.

[BibT_eX]

[DOI]

Gakuto Kurata

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Integration of MLLR adaptation with pronunciation proficiency adaptation for non-native speech recognition.

[BibT_eX]

[DOI]

Gakuto Kurata

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Robust speech recognition using inter-speaker and intra-speaker adaptation.

[BibT_eX]

[DOI]

Baojie Li

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Statistical language modeling with prosodic boundaries and its use for continuous speech recognition.

[BibT_eX]

[DOI]

Makoto Terao

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Improved corpus-based synthesis of fundamental frequency contours using generation process model.

[BibT_eX]

[DOI]

Masaya Eto

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

A method for automatic extraction of model parameters from fundamental frequency contours of speech.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2002

Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers.

[BibT_eX]

[DOI]

Mariko Sekiguchi

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Instantaneous estimation of accentuation habits for Japanese students to learn English pronunciation.

[BibT_eX]

[DOI]

Naoki Nakamura

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Use of topic knowledge in spoken dialogue information retrieval system for academic documents.

[BibT_eX]

[DOI]

Shinya Kiriyama

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Identification of accent and intonation in sentences for CALL systems.

[BibT_eX]

[DOI]

Carlos Toshinori Ishi

Ryuji Nishide

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Corpus-based synthesis of fundamental frequency contours based on a generation process model.

[BibT_eX]

[DOI]

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Generation of F0 contours using a model-constrained data-driven method.

[BibT_eX]

[DOI]

Atsuhiro Sakurai

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

IPA Japanese Dictation Free Software Project.

[BibT_eX]

[DOI]

Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Data-driven intonation modeling using a neural network and a command response model.

[BibT_eX]

[DOI]

Atsuhiro Sakurai

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Development of a formant-based analysis-synthesis system and generation of high quality liquid sounds of Japanese.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Instantaneous estimation of prosodic pronunciation habits for Japanese students to learn English pronunciation.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Quality improvement of PSOLA analysis-synthesis using partial zero-phase conversion.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Performance comparison among HMM, DTW, and human abilities in terms of identifying stress patterns of word utterances.

[BibT_eX]

[DOI]

Yukiko Fujisawa

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Modeling phone correlation for speaker adaptive speech recognition.

[BibT_eX]

[DOI]

Baojie Li

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Efficient search strategy in large vocabulary continuous speech recognition using prosodic boundary information.

[BibT_eX]

[DOI]

Shi-wook Lee

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Free software toolkit for Japanese large vocabulary continuous speech recognition.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Identification of Japanese double-mora phonemes considering speaking rate for the use in CALL systems.

[BibT_eX]

[DOI]

Carlos Toshinori Ishi

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Analytical and perceptual study on the role of acoustic features in realizing emotional speech.

[BibT_eX]

[DOI]

Hiromichi Kawanami

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998

Modeling of variations in cepstral coefficients caused by F0 changes and its application to speech processing.

[BibT_eX]

[DOI]