Hisashi Kawai

Orcid: 0000-0003-3015-6041

According to our database, Hisashi Kawai authored at least 189 papers between 1986 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition.
CoRR, 2023

Neural domain alignment for spoken language recognition based on optimal transport.
CoRR, 2023

Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR.
CoRR, 2023

Homeostatic System Design Based on Understanding the Living Environmental Determinants of Falls.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

Generative Linguistic Representation for Spoken Language Identification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Cross-Modal Alignment With Optimal Transport For CTC-Based ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Neural speech-rate conversion with multispeaker WaveNet vocoder.
Speech Commun., 2022

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation.
CoRR, 2022

Partial Coupling of Optimal Transport for Spoken Language Identification.
CoRR, 2022

Pronunciation-Aware Unique Character Encoding for RNN Transducer-Based Mandarin Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Transducer-based language embedding for spoken language identification.
Proceedings of the Interspeech 2022, 2022

2021
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation.
IEEE Robotics Autom. Lett., 2021

Integrating a joint Bayesian generative model in a discriminative learning framework for speaker verification.
CoRR, 2021

Predicting and attending to damaging collisions for placing everyday objects in photo-realistic simulations.
Adv. Robotics, 2021

Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU.
IEEE Access, 2021

Noise Robust Acoustic Modeling for Single-Channel Speech Recognition Based on a Stream-Wise Transformer Architecture.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.
Proceedings of the IEEE International Conference on Acoustics, 2021

High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.
Proceedings of the IEEE International Conference on Acoustics, 2021

Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multi-Stream HiFi-GAN with Data-Driven Waveform Decomposition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Field Experiment System "VoiceTra".
Proceedings of the Speech-to-Speech Translation, 2020

Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network.
IEEE Robotics Autom. Lett., 2020

A Multimodal Target-Source Classifier With Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects.
IEEE Robotics Autom. Lett., 2020

Compensation on x-vector for Short Utterance Spoken Language Identification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
Proceedings of the Interspeech 2020, 2020

Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.
Proceedings of the Interspeech 2020, 2020

Transformer-Based Text-to-Speech with Weighted Forced Attention.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Understanding Natural Language Instructions for Fetching Daily Objects Using GAN-Based Multimodal Target-Source Classification.
IEEE Robotics Autom. Lett., 2019

Deep progressive multi-scale attention for acoustic event classification.
CoRR, 2019

Latent-Space Data Augmentation for Visually-Grounded Language Understanding.
Proceedings of the Advances in Artificial Intelligence, 2019

Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders.
Proceedings of the Interspeech 2019, 2019

Duration Modeling with Global Phoneme-Duration Vectors.
Proceedings of the Interspeech 2019, 2019

Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.
Proceedings of the Interspeech 2019, 2019

Incorporating Symbolic Sequential Modeling for Speech Enhancement.
Proceedings of the Interspeech 2019, 2019

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
Proceedings of the Interspeech 2019, 2019

Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
Proceedings of the Interspeech 2019, 2019

One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features.
Proceedings of the Interspeech 2019, 2019

End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Investigations of Real-time Gaussian FFTNet and Parallel WaveNet Neural Vocoders with Simple Acoustic Features.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Attention Branch Network for Perspective-Free Sentence Generation.
Proceedings of the 3rd Annual Conference on Robot Learning, 2019

HMM-based TTS System Framework.
Proceedings of the IEEE Conference on Computational Intelligence for Financial Engineering & Economics, 2019

Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks From Ambiguous Language Instructions.
IEEE Robotics Autom. Lett., 2018

Improving FFTNet Vocoder with Noise Shaping and Subband Approaches.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification.
Proceedings of the Interspeech 2018, 2018

Multilingual Grapheme-to-Phoneme Conversion with Global Character Vectors.
Proceedings of the Interspeech 2018, 2018

Temporal Attentive Pooling for Acoustic Event Detection.
Proceedings of the Interspeech 2018, 2018

Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.
Proceedings of the Interspeech 2018, 2018

CTC Loss Function with a Unit-Level Ambiguity Penalty.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of a Knowledge Distillation Method for CTC Acoustic Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of Noise Shaping with Perceptual Weighting for Wavenet-Based Speech Generation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Investigation of Subband Wavenet Vocoder Covering Entire Audible Frequency Range with Limited Acoustic Features.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Comparative Evaluations of Various Factored Deep Convolutional RNN Architectures for Noise Robust Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Development of the "VoiceTra" Multi-Lingual Speech Translation System.
IEICE Trans. Inf. Syst., 2017

Regularization of neural network model with distance metric learning for i-vector based spoken language identification.
Comput. Speech Lang., 2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks.
CoRR, 2017

Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.
Proceedings of the Interspeech 2017, 2017

Global Syllable Vectors for Building TTS Front-End with Deep Learning.
Proceedings of the Interspeech 2017, 2017

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Grounded language understanding for manipulation instructions using GAN-based classification.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Subband wavenet with overlapped single-sideband filterbanks.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Incremental training and constructing the very deep convolutional residual network acoustic models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Raw waveform-based speech enhancement by fully convolutional networks.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription.
Speech Commun., 2016

Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers.
IEICE Trans. Inf. Syst., 2016

Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Comparison of regularization constraints in deep neural network based speaker adaptation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A pseudo-task design in multi-task learning deep neural network for speaker recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition.
Proceedings of the Interspeech 2016, 2016

Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework.
Proceedings of the Interspeech 2016, 2016

Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure.
Proceedings of the Interspeech 2016, 2016

Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification.
Proceedings of the Interspeech 2016, 2016

Maximum a posteriori Based Decoding for CTC Acoustic Models.
Proceedings of the Interspeech 2016, 2016

Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks.
Proceedings of the Interspeech 2016, 2016

Local Fisher discriminant analysis for spoken language identification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Bottleneck linear transformation network adaptation for speaker adaptive training-based hybrid DNN-HMM speech recognizer.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Leveraging social Q&A collections for improving complex question answering.
Comput. Speech Lang., 2015

A cloud robotics approach towards dialogue-oriented robot speech.
Adv. Robotics, 2015

HMM based Myanmar text to speech system.
Proceedings of the INTERSPEECH 2015, 2015

Sparse representation with temporal max-smoothing for acoustic event detection.
Proceedings of the INTERSPEECH 2015, 2015

Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

A Myanmar large vocabulary continuous speech recognition system.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
Non-monologue HMM-based speech synthesis for service robots: A cloud robotics approach.
Proceedings of the 2014 IEEE International Conference on Robotics and Automation, 2014

2013
Investigation of Innervation Zone Shift with Continuous Dynamic Muscle Contraction.
Comput. Math. Methods Medicine, 2013

Multilingual Speech-to-Speech Translation System: VoiceTra.
Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management, Milan, Italy, June 3-6, 2013

2012
Distributed speech translation technologies for multiparty multilingual communication.
ACM Trans. Speech Lang. Process., 2012

Experiments on unsupervised statistical parametric speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Resonance-based spectral deformation in HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis.
Proceedings of the INTERSPEECH 2012, 2012

2011
Modeling spoken decision support dialogue and optimization of its dialogue strategy.
ACM Trans. Speech Lang. Process., 2011

Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2011

Situated Spoken Dialogue with Robots Using Active Learning.
Adv. Robotics, 2011

Toward Construction of Spoken Dialogue System that Evokes Users' Spontaneous Backchannels.
Proceedings of the SIGDIAL 2011 Conference, 2011

Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users' Spontaneous Listener's Reactions.
Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems, 2011

Estimation of Perceptual Spaces for Speaker Identities Based on the Cross-Lingual Discrimination Task.
Proceedings of the INTERSPEECH 2011, 2011

Incorporating Regional Information to Enhance MAP-Based Stochastic Feature Compensation for Robust Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

User Study of Spoken Decision Support System.
Proceedings of the INTERSPEECH 2011, 2011

Adaptive Regularization Framework for Robust Voice Activity Detection.
Proceedings of the INTERSPEECH 2011, 2011

Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation.
Proceedings of the INTERSPEECH 2011, 2011

Answering Complex Questions via Exploiting Social Q&A Collection.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

Improving Related Entity Finding via Incorporating Homepages and Recognizing Fine-grained Entities.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

A sampling-based environment population projection approach for rapid acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Unsupervised determination of efficient Korean LVCSR units using a Bayesian Dirichlet process model.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
NiCT at TREC 2010: Related Entity Finding.
Proceedings of The Nineteenth Text REtrieval Conference, 2010

An investigation of the impact of speech transcript errors on HMM voices.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Dialogue strategy optimization to assist user's decision for spoken consulting dialogue systems.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Modeling Spoken Decision Making Dialogue and Optimization of its Dialogue Strategy.
Proceedings of the SIGDIAL 2010 Conference, 2010

A Study Toward an Evaluation Method for Spoken Dialogue Systems Considering User Criteria.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Sightseeing Guidance Systems Based on WFST-Based Dialogue Manager.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Construction and Experiment of a Spoken Consulting Dialogue System.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Evaluation of Facial Direction Estimation from Cameras for Multi-modal Spoken Dialog System.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Expansion of WFST-Based Dialog Management for Handling Multiple ASR Hypotheses.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach.
Proceedings of the Spoken Dialogue Systems for Ambient Environments, 2010

Korean pronunciation variation modeling with probabilistic Bayesian networks.
Proceedings of the 4th International Universal Communication Symposium, 2010

Web text classification for response generation in spoken decision support dialogue systems.
Proceedings of the 4th International Universal Communication Symposium, 2010

Improving spontaneous English ASR using a joint-sequence pronunciation model.
Proceedings of the 4th International Universal Communication Symposium, 2010

An environment structuring framework to facilitating suitable prior density estimation for MAPLR on robust speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Speech enhancement as a functional approximation and generalization.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Improved training of excitation for HMM-based parametric speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

Utilizing a noisy-channel approach for Korean LVCSR.
Proceedings of the INTERSPEECH 2010, 2010

An unsupervised approach to creating web audio contents-based HMM voices.
Proceedings of the INTERSPEECH 2010, 2010

Voice activity detection in a regularized reproducing kernel Hilbert space.
Proceedings of the INTERSPEECH 2010, 2010

Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition.
Proceedings of the INTERSPEECH 2010, 2010

Cluster-based language model for spoken document retrieval using NMF-based document clustering.
Proceedings of the INTERSPEECH 2010, 2010

Brazilian Portuguese acoustic model training based on data borrowing from other language.
Proceedings of the INTERSPEECH 2010, 2010

Spoken Dialog System on Plasma Display Panel Estimating Users' Interest by Image Processing.
Proceedings of the Workshops Proceedings of the 6th International Conference on Intelligent Environments, 2010

Exploiting Social Q&A Collection in Answering Complex Questions.
Proceedings of the CIPS-SIGHAN Joint Conference on Chinese Language Processing, 2010

Active Learning for Generating Motion and Utterances in Object Manipulation Dialogue Tasks.
Proceedings of the Dialog with Robots, 2010

2009
Hyperbolic structure of fundamental frequency contour.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Robust and Fast Lyric Search based on Phonetic Confusion Matrix.
Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

A close look into the probabilistic concatenation model for corpus-based speech synthesis.
Proceedings of the INTERSPEECH 2009, 2009

2008
Investigation of Optimum Electrode Locations by Using an Automatized Surface Electromyography Analysis Technique.
IEEE Trans. Biomed. Eng., 2008

Phone duration modeling using gradient tree boosting.
Speech Commun., 2008

Unit database pruning based on the cost degradation criterion for concatenative speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Communicative speech synthesis with XIMERA: a first step.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

A preselection method based on cost degradation from the optimal sequence for concatenative speech synthesis.
Proceedings of the INTERSPEECH 2007, 2007

Reduction of correlation computation in the permutation of the frequency domain ICA by selecting DOAs estimated in subarrays.
Proceedings of the 15th European Signal Processing Conference, 2007

2006
The ATR multilingual speech-to-speech translation system.
IEEE Trans. Speech Audio Process., 2006

An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis.
Speech Commun., 2006

A text-prompted distributed speaker verification system implemented on a cellular phone and a mobile terminal.
Proceedings of the INTERSPEECH 2006, 2006

Quick individual fitting methods of simplified hearing compensation for elderly people.
Proceedings of the INTERSPEECH 2006, 2006

A Short-Latency Unit Selection Method with Redundant Search for Concatenative Speech Synthesis.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Constructing a Phonetic-Rich Speech Corpus While Controlling Time-Dependent Voice Quality Variability for English Speech Synthesis.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A DOA estimation method for 3D multiple source signals using independent component analysis.
Proceedings of the 14th European Signal Processing Conference, 2006

Evaluation result of transmission control mechanism for multimedia streams based on the multi-RTCP scheme over multiple IP-based networks.
Proceedings of the 3rd IEEE Consumer Communications and Networking Conference, 2006

2005
Discriminative training and explicit duration modeling for HMM-based automatic segmentation.
Speech Commun., 2005

Improvement of rejection performance of keyword spotting using anti-keywords derived from large vocabulary considering acoustical similarity to keywords.
Proceedings of the INTERSPEECH 2005, 2005

Estimation of intonation variation with constrained tone transformations.
Proceedings of the INTERSPEECH 2005, 2005

Analysis of major factors of naturalness degradation in concatenative synthesis.
Proceedings of the INTERSPEECH 2005, 2005

SNR-dependent background noise compensation of PESQ values for cellular phone speech.
Proceedings of the INTERSPEECH 2005, 2005

2004
XIMERA: a new TTS from ATR based on corpus-based technologies.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

A study on automatic detection of Japanese vowel devoicing for speech synthesis.
Proceedings of the INTERSPEECH 2004, 2004

Using a depth-restricted search to reduce delays in unit selection.
Proceedings of the INTERSPEECH 2004, 2004

Formulating contextual tonal variations in Mandarin.
Proceedings of the INTERSPEECH 2004, 2004

Minimum segmentation error based discriminative training for speech synthesis application.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Scaling of waveform segments along the time axis for concatenative speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

An evaluation of automatic phone segmentation for concatenative speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Tone pattern discrimination combining parametric modeling and maximum likelihood estimation.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Segment selection considering local degradation of naturalness in concatenative speech synthesis.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Design of a Mandarin sentence set for corpus-based speech synthesis by use of a multi-tier algorithm taking account of the varied prosodic and spectral characteristics.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Perceptual evaluation of naturalness due to substitution of Chinese syllable for concatenative speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Acoustic measures vs. phonetic features as predictors of audible discontinuity in concatenative speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit.
Proceedings of the IEEE International Conference on Acoustics, 2002

2000
A design method of speech corpus for text-to-speech synthesis taking account of prosody.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998
Recognition of connected digit speech in Japanese collected over the telephone network.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1994
Development of a text-to-speech system for Japanese based on waveform splicing.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1990
The linguistic processing module for Japanese text-to-speech system.
Proceedings of the First International Conference on Spoken Language Processing, 1990

Improvement of the synthetic speech quality of the formant-type speech synthesizer and its subjective evaluation.
Proceedings of the First International Conference on Spoken Language Processing, 1990

A system for synthesizing Japanese speech from orthographic text.
Proceedings of the 1990 International Conference on Acoustics, 1990

1988
Realization of linguistic information in the voice fundamental frequency contour of the spoken Japanese.
Proceedings of the IEEE International Conference on Acoustics, 1988

1986
Generation of prosodic symbols for rule-synthesis of connected speech of Japanese.
Proceedings of the IEEE International Conference on Acoustics, 1986

