Hideki Kawahara

Kohei Yatabe

Ken-Ichi Sakakibara

CoRR, July, 2025

2024

Interactive tools for making temporally variable, multiple-attributes, and multiple-instances morphing accessible: Flexible manipulation of divergent speech instances for explorational research and education.

CoRR, 2024

Proposal of Protocols for Speech Materials Acquisition and Presentation Assisted By Tools Based on Structured Test Signals.

Proceedings of the 27th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2024

2023

Corrigendum to Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift, Speech Communication 136 (2022) 23-41.

Speech Commun., February, 2023

Acoustic measurement framework for audio systems based on structured periodic test signals.

Proceedings of the 12th IEEE Global Conference on Consumer Electronics, 2023

Simultaneous Measurement of Multiple Acoustic Attributes Using Structured Periodic Test Signals Including Music and Other Sound Materials.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift.

Speech Commun., 2022

Measuring pitch extractors' response to frequency-modulated multi-component signals.

CoRR, 2022

Perceptual Evaluation of Penetrating Voices through a Semantic Differential Method.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

An objective test tool for pitch extractors' response attributes.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Safeguarding test signals for acoustic measurement using arbitrary sounds.

Kohei Yatabe

CoRR, 2021

Interactive and Real-Time Acoustic Measurement Tools for Speech Data Acquisition and Presentation: Application of an Extended Member of Time Stretched Pulses.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mixture of Orthogonal Sequences Made from Extended Time-Stretched Pulses Enables Measurement of Involuntary Voice Fundamental Frequency Response to Pitch Perturbation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cascaded All-Pass Filters with Randomized Center Frequencies and Phase Polarity for Acoustic and Speech Measurement and Data Augmentation.

Kohei Yatabe

Proceedings of the IEEE International Conference on Acoustics, 2021

Implementation of Interactive Tools for Investigating Fundamental Frequency Response of Voiced Sounds to Auditory Stimulation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Simultaneous measurement of time-invariant linear and nonlinear, and random and extra responses using frequency domain variant of velvet noise.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Investigating the Physiological and Acoustic Contrasts Between Choral and Operatic Singing.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Frequency domain variant of Velvet noise and its application to acoustic measurements.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Real-time and interactive tools for vocal training based on an analytic signal with a cosine series envelope.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices.

CoRR, 2018

Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Revisiting spectral envelope recovery from speech sounds generated by periodic excitation.

Kanru Hua

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation.

CoRR, 2017

The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Accurate estimation of f0 and aperiodicity based on periodicity detector residuals and deviations of phase derivatives.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis.

Yannis Agiomyrgiannakis

Heiga Zen

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Aliasing-free L-F model and its application to an interactive MATLAB tool and test signal generation for speech analysis procedures.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

TUSK: A Framework for Overviewing the Performance of F0 Estimators.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

How the slope of the speech spectrum affects the perception of speaker size.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014

Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Proposal for an Interactive 3D Sound Playback Interface Controlled by User behavior.

Proceedings of the HCI International 2014 - Posters' Extended Abstracts, 2014

Development of a Mobile Application for Crowdsourcing the Data Collection of Environmental Sounds.

Proceedings of the Human Interface and the Management of Information. Information and Knowledge Design and Evaluation, 2014

Hearing impairment simulator based on compressive gammachirp filter.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

Controlling "shout" expression in a Japanese POP singing performance: analysis and suppression study.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Periodicity extraction for voiced sounds with multiple periodicity.

Kenji Ozawa

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Higher order waveform symmetry measure and its application to periodicity detectors for speech and singing with fine temporal resolution.

Proceedings of the IEEE International Conference on Acoustics, 2013

Temporally variable multi-aspect N-way morphing based on interference-free speech representations.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Vocal tract length estimation for voiced and whispered speech using gammachirp filterbank.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

Comparison of performance with voiced and whispered speech in word recognition and mean-formant-frequency discrimination.

Speech Commun., 2012

Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis.

Zhengqi Wen

Jianhua Tao

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Inharmonic speech: a tool for the study of speech perception and separation.

Josh H. McDermott

Daniel P. W. Ellis

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Analysis and synthesis of strong vocal expressions: Extension and application of audio texture features to singing voice.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Detecting child speaker based on auditory feature vectors for VTL estimation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Modulation transfer function design for a flexible cross synthesis VOCODER based on F0 adaptive spectral envelope recovery.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

An interference-free representation of group delay for periodic signals.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Auditory Filterbank Improves Voice Morphing.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

An interference-free representation of instantaneous frequency of periodic signals and its application to F0 extraction.

Proceedings of the IEEE International Conference on Acoustics, 2011

Development of Web-Based Voice Interface to Identify Child Users Based on Automatic Speech Recognition System.

Proceedings of the Human-Computer Interaction. Users and Applications, 2011

2010

Exploration of the other aspect of vocoder revisited: A-Z STRAIGHT, TANDEM-STRAIGHT and morphing.

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

High-quality and light-weight voice transformation enabling extrapolation without perceptual and objective breakdown.

Proceedings of the IEEE International Conference on Acoustics, 2010

High quality voice manipulation method based on the vocal tract area function obtained from sub-band LSP of STRAIGHT spectrum.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Speech morphing based on biologically relevant signal representations.

Proceedings of the Sixth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2009

A bottom-up procedure to extract periodicity structure of voiced sounds and its application to represent and restoration of pathological voices.

Proceedings of the Sixth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2009

v.morish'09: A Morphing-Based Singing Design Interface for Vocal Melodies.

Proceedings of the Entertainment Computing, 2009

Observation of empirical cumulative distribution of vowel spectral distances and its application to vowel based voice conversion.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown.

Proceedings of the IEEE International Conference on Acoustics, 2009

Development of Speech Input Method for Interactive VoiceWeb Systems.

Proceedings of the Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

2008

Vowel-based frequency alignment function design and recognition-based time alignment for automatic speech morphing.

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Speech-to-text input method for web system using JavaScript.

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Study on manipulation method of voice quality based on the vocal tract area function.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Spectral envelope recovery beyond the Nyquist limit for high-quality manipulation of speech sounds.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Discrimination and recognition of scaled word sounds.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Group delay for acoustic event representation and its application for speech aperiodicity analysis.

Proceedings of the 15th European Signal Processing Conference, 2007

2006

Speech Segregation Using an Auditory Vocoder With Event-Synchronous Enhancements.

IEEE Trans. Speech Audio Process., 2006

Automatic assignment of anchoring points on vowel templates for defining correspondence between time-frequency representations of speech samples.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Analyzing dialogue data for real-world emotional speech classification.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Logarithmic temporal processing applied to accurate empirical transfer function measurements in vocal sound propagation.

Proceedings of the 14th European Signal Processing Conference, 2006

Speech style conversion based on the statistics of vowel spectrograms and nonlinear frequency mapping.

Proceedings of the 14th European Signal Processing Conference, 2006

2005

Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Speech intelligibility derived from time-frequency and source smearing.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Underlying Principles of a High-quality Speech Manipulation System STRAIGHT and Its Application to Speech Segregation.

Proceedings of the Speech Separation by Humans and Machines, 2005

Speech Segregation Using an Event-synchronous Auditory Image and STRAIGHT.

Proceedings of the Speech Separation by Humans and Machines, 2005

2004

Acappella synthesis demonstrations using RWC music database.

Hideki Banno

Proceedings of the New Interfaces for Musical Expression, 2004

A design of audio-visual talker tracking system based on CSP analysis and frame difference in real noisy environments.

Proceedings of the IEEE 6th Workshop on Multimedia Signal Processing, 2004

Procedure "senza vibrato": a key component for morphing singing.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Intelligibility of degraded speech from smeared STRAIGHT spectrum.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Algorithm amalgam: morphing waveform based methods, sinusoidal models and STRAIGHT.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Loudspeaker equalization based on multi-location observation with reliable time-frequency region selection and its evaluation using sound propagation measurement.

Proceedings of the 2004 12th European Signal Processing Conference, 2004

2003

Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system.

Hisami Matsui

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Influence of recording equipment on the identification of second language phoneme contrasts.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech segregation based on fundamental event information using an auditory vocoder.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech enhancement with microphone array and Fourier / wavelet spectral subtraction in real noisy environments.

Yuki Denda

Takanobu Nishiura

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation.

Hisami Matsui

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Speech segregation using event synchronous auditory vocoder.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

On F0 trajectory optimization for very high-quality speech manipulation.

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Auditory VOCODER: Speech resynthesis from an auditory Mellin representation.

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT.

Jo Estill

Osamu Fujimura

Proceedings of the Second International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2001

Systematic F0 glitches around nasal-vowel transitions.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Comparative evaluation of F0 estimation algorithms.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

A sinusoidal model based on frequency-to-instantaneous frequency mapping.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Investigation of analysis and synthesis parameters of STRAIGHT by subjective evaluation.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Accurate vocal event detection method based on a fixed-point analysis of mapping from time to weighted average group delay.

Yoshinori Atake

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Robust fundamental frequency estimation using instantaneous frequencies of harmonic components.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999

Dynamic sound stream formation based on continuity of spectral change.

Ikuyo Masuda-Katsuse

Speech Commun., 1999

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds.

Ikuyo Masuda-Katsuse

Speech Commun., 1999

Multiple period estimation and pitch perception model.

Speech Commun., 1999

Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity.

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Applying STRAIGHT toward Music Systems - Accurate F0 Estimation and Application for Data-driven Synthesis.

Haruhiro Katayose

Proceedings of the 1999 International Computer Music Conference, 1999

1998

An application of the Bayesian time series model and statistical system analysis for F0 control.

Hiroko Kato

Speech Commun., 1998

An instantaneous-frequency-based pitch extraction method for high-quality speech transformation: revised TEMPO in the STRAIGHT-suite.

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Computer-based second language production training by using spectrographic representation and HMM-based speech recognition scores.

Brain Creators: Japanese Initiative to Create Computational Models of Brain Functions.

Yasuji Sawada