Masato Akagi

Orcid: 0000-0003-2450-6754

According to our database, Masato Akagi authored at least 152 papers between 1988 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Music Theory-Inspired Acoustic Representation for Speech Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Phase-Aware Speech Enhancement With Complex Wiener Filter.
IEEE Access, 2023

Contributions of Jitter and Shimmer in the Voice for Fake Audio Detection.
IEEE Access, 2023

Data-driven Non-uniform Filterbanks Based on F-ratio for Machine Anomalous Sound Detection.
Proceedings of the 31st European Signal Processing Conference, 2023

Increasing Speech Intelligibility by Mimicking Professional Announcers' Voices and Its Physical Correlates.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Acoustic features correlated to perceived urgency in evacuation announcements.
Speech Commun., 2022

Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion.
Speech Commun., 2022

Speech Emotion and Naturalness Recognitions With Multitask and Single-Task Learnings.
IEEE Access, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Proceedings of the Interspeech 2022, 2022

Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement.
Proceedings of the Interspeech 2022, 2022

Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion.
Proceedings of the Interspeech 2022, 2022

Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.
Proceedings of the 30th European Signal Processing Conference, 2022

2021
$F_0$-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function.
Speech Commun., 2021

Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM.
Speech Commun., 2021

Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech.
Neural Networks, 2021

Acoustic and articulatory analysis and synthesis of shouted vowels.
Comput. Speech Lang., 2021

Cross-Lingual Voice Conversion With Controllable Speaker Individuality Using Variational Autoencoder and Star Generative Adversarial Network.
IEEE Access, 2021

Study on Simultaneous Estimation of Glottal Source and Vocal Tract Parameters by ARMAX-LF Model for Speech Analysis/Synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Hierarchical Prosody Analysis Improves Categorical and Dimensional Emotion Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Automatic Naturalness Recognition from Acted Speech Using Neural Networks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Simultaneous Estimation of Glottal Source Waveforms and Vocal Tract Shapes from Speech Signals Based on ARX-LF Model.
J. Signal Process. Syst., 2020

Effect of articulatory and acoustic features on the intelligibility of speech in noise: An articulatory synthesis study.
Speech Commun., 2020

Combining F0 and non-negative constraint robust principal component analysis for singing voice separation.
Signal Process., 2020

A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation.
IEICE Trans. Inf. Syst., 2020

Mimicking Lombard Effect: An Analysis and Reconstruction.
IEICE Trans. Inf. Syst., 2020

The Effect of Silence Feature in Dimensional Speech Emotion Recognition.
CoRR, 2020

Speech Emotion Recognition Using 3D Convolutions and Attention-Based Sliding Recurrent Networks With Auditory Front-Ends.
IEEE Access, 2020

Predicting Valence and Arousal by Aggregating Acoustic Features for Acoustic-Linguistic Information Fusion.
Proceedings of the 2020 IEEE Region 10 Conference, 2020

On The Differences Between Song and Speech Emotion Recognition: Effect of Feature Sets, Feature Types, and Classifiers.
Proceedings of the 2020 IEEE Region 10 Conference, 2020

Improving Valence Prediction in Dimensional Speech Emotion Recognition Using Linguistic Information.
Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2020

Comparison of Glottal Source Parameter Values in Emotional Vowels.
Proceedings of the Interspeech 2020, 2020

Segment-Level Effects of Gender, Nationality and Emotion Information on Text-Independent Speaker Verification.
Proceedings of the Interspeech 2020, 2020

Multitask Learning and Multistage Fusion for Dimensional Audiovisual Emotion Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancement of speech intelligibility under noisy reverberant conditions based on modulation spectrum concept.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Deep Multilayer Perceptrons for Dimensional Speech Emotion Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Improving multilingual speech emotion recognition by combining acoustic features in a three-layer model.
Speech Commun., 2019

Blind monaural singing voice separation using rank-1 constraint robust principal component analysis and vocal activity detection.
Neurocomputing, 2019

The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity.
Proceedings of the Interspeech 2019, 2019

Dimensional Emotion Recognition from Speech Using Modulation Spectral Features and Recurrent Neural Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Evaluation of the Lombard effect model on synthesizing Lombard speech in varying noise level environments with limited data.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Non-parallel Voice Conversion with Controllable Speaker Individuality using Variational Autoencoder.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speech Emotion Recognition Using Speech Feature and Word Embedding.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space.
Speech Commun., 2018

Estimation of glottal source waveforms and vocal tract shapes from speech signals based on ARX-LF model.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech.
Proceedings of the Interspeech 2018, 2018

Auditory-Inspired End-to-End Speech Emotion Recognition Using 3D Convolutional Recurrent Neural Networks Based on Spectral-Temporal Representation.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Unsupervised Singing Voice Separation Based on Robust Principal Component Analysis Exploiting Rank-1 Constraint.
Proceedings of the 26th European Signal Processing Conference, 2018

Estimation of glottal source waveforms and vocal tract shape for singing voices with wide frequency range.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Unsupervised Singing Voice Separation Using Gammatone Auditory Filterbank and Constraint Robust Principal Component Analysis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Maximal Information Coefficient and Predominant Correlation-Based Feature Selection Toward A Three-Layer Model for Speech Emotion Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Method of Blindly Estimating Speech Transmission Index in Noisy Reverberant Environments.
J. Inf. Hiding Multim. Signal Process., 2017

Method of Estimating Signal-to-Noise Ratio Based on Optimal Design for Sub-band Voice Activity Detection.
J. Inf. Hiding Multim. Signal Process., 2017

Feature selection method for real-time speech emotion recognition.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

Commonalities of Glottal Sources and Vocal Tract Shapes Among Speakers in Emotional Speech.
Proceedings of the Studies on Speech Production - 11th International Seminar, 2017

Weighted Robust Principal Component Analysis with Gammatone Auditory Filterbank for Singing Voice Separation.
Proceedings of the Neural Information Processing - 24th International Conference, 2017

Study on method for protecting speech privacy by actively controlling speech transmission index in simulated room.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Speech emotion recognition using multichannel parallel convolutional recurrent neural networks based on gammatone auditory filterbank.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Robust Voice Activity Detection Based on Concept of Modulation Transfer Function in Noisy Reverberant Environments.
J. Signal Process. Syst., 2016

Multilingual Speech Emotion Recognition System Based on a Three-Layer Model.
Proceedings of the Interspeech 2016, 2016

Voice conversion to emotional speech based on three-layered model in dimensional approach and parameterization of dynamic features in prosody.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Optimizing Fuzzy Inference Systems for Improving Speech Emotion Recognition.
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, 2016

2015
Toward improving estimation accuracy of emotion dimensions in bilingual scenario based on three-layered model.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Emotional speech synthesis system based on a three-layered model using a dimensional approach.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
Binaural Sound Source Localization in Noisy Reverberant Environments Based on Equalization-Cancellation Theory.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2014

Toward relaying an affective Speech-to-Speech translator: Cross-language perception of emotional state represented by emotion dimensions.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Toward a Rule-Based Synthesis of Vietnamese Emotional Speech.
Proceedings of the Knowledge and Systems Engineering, 2014

Emotional Speech Recognition and Synthesis in Multiple Languages toward Affective Speech-to-Speech Translation System.
Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2014

A method for emotional speech synthesis based on the position of emotional state in Valence-Activation space.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Toward affective speech-to-speech translation: Strategy for emotional speech recognition and synthesis in multiple languages.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Improving Naturalness of HMM-Based TTS Trained with Limited Data by Temporal Decomposition.
IEICE Trans. Inf. Syst., 2013

A hybrid TTS between unit selection and HMM-based TTS under limited data conditions.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Comparative investigation of objective speech intelligibility prediction measures for noise-reduced signals in Mandarin and Japanese.
Proceedings of the INTERSPEECH 2013, 2013

Admissible Range for Individualization of Head-Related Transfer Function in Median Plane.
Proceedings of the Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2013

Blind method of estimating speech transmission index from reverberant speech signals.
Proceedings of the 21st European Signal Processing Conference, 2013

Acoustic sound source tracking for a moving object using precise Doppler-Shift measurement.
Proceedings of the 21st European Signal Processing Conference, 2013

Blind method of estimating speech transmission index in room acoustics based on concept of modulation transfer function.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Objective Japanese intelligibility prediction for noisy speech signals before and after noise-reduction processing.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Improve equalization-cancellation-based sound localization in noisy reverberant environments using direct-to-reverberant energy ratio.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Cross-lingual speech emotion recognition system based on a three-layer model for human perception.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Evaluation of objective intelligibility prediction measures for noise-reduced signals in Mandarin.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A concatenative speech synthesis for monosyllabic languages with limited data.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Speech emotion recognition system based on a dimensional approach using a three-layered model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication.
Speech Commun., 2011

Voice Activity Detection in MTF-Based Power Envelope Restoration.
Proceedings of the INTERSPEECH 2011, 2011

2010
A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features.
IEICE Trans. Inf. Syst., 2010

Intelligibility investigation of single-channel noise reduction algorithms for Chinese and Japanese.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

A DOA estimation algorithm based on equalization-cancellation theory.
Proceedings of the INTERSPEECH 2010, 2010

2009
Two-stage binaural speech enhancement with Wiener filter based on equalization-cancellation model.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

Efficient modeling of temporal structure of speech for applications in voice transformation.
Proceedings of the INTERSPEECH 2009, 2009

Psychoacoustically-motivated adaptive beta-order generalized spectral subtraction for cochlear implant patients.
Proceedings of the IEEE International Conference on Acoustics, 2009

MTF-based power envelope restoration in noisy reverberant environments.
Proceedings of the 17th European Signal Processing Conference, 2009

2008
A three-layered model for expressive speech perception.
Speech Commun., 2008

Adaptive beta-order generalized spectral subtraction for speech enhancement.
Signal Process., 2008

A Two-Microphone Noise Reduction Method in Highly Non-stationary Multiple-Noise-Source Environments.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2008

The Improved TS-BASE Approaches with Interference Compensation and Their Evaluations for Speech Enhancement.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics.
Proceedings of the INTERSPEECH 2008, 2008

High-quality analysis/synthesis method based on temporal decomposition for speech modification.
Proceedings of the INTERSPEECH 2008, 2008

Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization.
Proceedings of the INTERSPEECH 2008, 2008

2007
Limited error based event localizing temporal decomposition and its application to variable-rate speech coding.
Speech Commun., 2007

Speaker Individualities in Speech Spectral Envelopes and Fundamental Frequency Contours.
Proceedings of the Speaker Classification II, 2007

Method of LP-based blind restoration for improving intelligibility of bone-conducted speech.
Proceedings of the INTERSPEECH 2007, 2007

Vocal conversion from speaking voice to singing voice using STRAIGHT.
Proceedings of the INTERSPEECH 2007, 2007

A flexible spectral modification method based on temporal decomposition and Gaussian mixture model.
Proceedings of the INTERSPEECH 2007, 2007

Noise reduction based on adaptive β-order generalized spectral subtraction for speech enhancement.
Proceedings of the INTERSPEECH 2007, 2007

A rule-based speech morphing for verifying an expressive speech perception model.
Proceedings of the INTERSPEECH 2007, 2007

2006
A noise reduction system based on hybrid noise estimation technique and post-filtering in arbitrary noise environments.
Speech Commun., 2006

Communication Between Speech Production and Perception Within the Brain-Observation and Simulation.
J. Comput. Sci. Technol., 2006

Multi-channel Noise Reduction in Noisy Environments.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

A robust feature extraction based on the MTF concept for speech recognition in reverberant environment.
Proceedings of the INTERSPEECH 2006, 2006

Improved hybrid microphone array post-filter by integrating a robust speech absence probability estimator for speech enhancement.
Proceedings of the INTERSPEECH 2006, 2006

2005
Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis.
Speech Commun., 2005

A model for selective segregation of a target instrument sound from the mixed sound of various instruments.
Proceedings of the INTERSPEECH 2005, 2005

A hybrid microphone array post-filter in a diffuse noise field.
Proceedings of the INTERSPEECH 2005, 2005

A multi-layer fuzzy logical model for emotional speech perception.
Proceedings of the INTERSPEECH 2005, 2005

A noise reduction system in arbitrary noise environments and its applications to speech enhancement and speech recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Toward a Rule-Based Synthesis of Emotional Speech on Linguistic Descriptions of Perception.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

2004
Fundamental Frequency Estimation for Noisy Speech Using Entropy-Weighted Periodic and Harmonic Features.
IEICE Trans. Inf. Syst., 2004

Analysis of acoustic features affecting "singing-ness" and its application to singing-voice synthesis from speaking-voice.
Proceedings of the INTERSPEECH 2004, 2004

Noise reduction using hybrid noise estimation technique and post-filtering.
Proceedings of the INTERSPEECH 2004, 2004

A speech dereverberation method based on the MTF concept using adaptive time-frequency divisions.
Proceedings of the 2004 12th European Signal Processing Conference, 2004

2003
A speech dereverberation method based on the MTF concept.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Efficient quantization of speech excitation parameters using temporal decomposition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A model for selective segregation of a target instrument sound from the mixed sound of various instruments.
Proceedings of the 2003 International Computer Music Conference, 2003

A method based on the MTF concept for dereverberating the power envelope from the reverberant signal.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Temporal decomposition: a promising approach to VQ-based speaker identification.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Coding speech at very low rates using STRAIGHT and temporal decomposition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Improvement of the restricted temporal decomposition method for line spectral frequency parameters.
Proceedings of the IEEE International Conference on Acoustics, 2002

Noise reduction using a small-scale microphone array in multi noise source environment.
Proceedings of the IEEE International Conference on Acoustics, 2002

Limited Error Based Event Localizing Temporal Decomposition.
Proceedings of the 11th European Signal Processing Conference, 2002

2001
Spectral stability based event localizing temporal decomposition.
Comput. Speech Lang., 2001

A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
Design of robust subtractive beamformer for noisy speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Perception of synthesized singing voices with fine fluctuations in their fundamental frequency contours.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999
A method of signal extraction from noisy signal based on auditory scene analysis.
Speech Commun., 1999

Segregation of vowel in background noise using the model of segregating two acoustic sources based on auditory scene analysis.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

An objective distortion estimator for hearing aids and its application to noise reduction.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998
Signal extraction from noisy signal based on auditory scene analysis.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Spectral sequence compensation based on continuity of spectral sequence.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Fundamental frequency fluctuation in continuous vowel utterance and its perception.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Spectral stability based event localizing temporal decomposition.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Noise reduction by paired-microphones using spectral subtraction.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997
A method of signal extraction from noisy signal.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Noise reduction by paired microphones.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Modeling of contextual effects and its application to word spotting.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995
Speaker individualities in fundamental frequency contours and its control.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

1994
Speaker individualities in speech spectral envelopes.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Perception of central vowel with pre- and post-anchors.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

1990
Contextual effect models and psycho acoustic evidence for the models.
Proceedings of the First International Conference on Spoken Language Processing, 1990

1988
On the application of spectrum target prediction model to speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1988

