Yasunori Ohishi

Orcid: 0000-0002-7856-248X

According to our database¹, Yasunori Ohishi authored at least 52 papers between 2005 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis.

ACM Trans. Multim. Comput. Commun. Appl., February, 2026

2025

Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes.

CoRR, June, 2025

Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis.

CoRR, April, 2025

M2D-CLAP: Exploring General-Purpose Audio-Language Representations Beyond CLAP.

IEEE Access, 2025

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Towards Pre-training an Effective Respiratory Audio Foundation Model.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes.

Proceedings of the 33rd European Signal Processing Conference, 2025

2024

Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval.

Proceedings of the 32nd European Signal Processing Conference, 2024

Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection.

Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

2023

BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.

Marc Delcroix

Jorge Bennasar Vázquez

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement.

CoRR, 2023

Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input.

Proceedings of the IEEE International Conference on Acoustics, 2023

First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline.

Proceedings of the 31st European Signal Processing Conference, 2023

Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Approach.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

ConceptBeam: Concept Driven Target Speech Extraction.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-View And Multi-Modal Event Detection Utilizing Transformer-Based Multi-Sensor Fusion.

Proceedings of the IEEE International Conference on Acoustics, 2022

Echo-Aware Adaptation of Sound Event Localization and Detection in Unknown Environments.

Masahiro Yasuda

Yasunori Ohishi

Shoichiro Saito

Proceedings of the IEEE International Conference on Acoustics, 2022

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model.

Proceedings of the 30th European Signal Processing Conference, 2022

2021

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation.

Proceedings of the International Joint Conference on Neural Networks, 2021

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation.

Proceedings of the HEAR: Holistic Evaluation of Audio Representations, 2021

ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions.

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

2020

Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval.

CoRR, 2020

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation.

CoRR, 2020

Crossmodal Sound Retrieval Based on Specific Target Co-Occurrence Denoted with Weak Labels.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity.

Yasunori Ohishi

Yuki Tanaka

Kunio Kashino

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Effects of Word-Frequency Based Pre- and Post- Processings for Audio Captioning.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

2019

Crossmodal Voice Conversion.

CoRR, 2019

2015

Generative Modeling of Voice Fundamental Frequency Contours.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

2014

Mixture of Gaussian process experts for predicting sung melodic contour with expressive dynamic fluctuations.

Proceedings of the IEEE International Conference on Acoustics, 2014

Mondrian hidden Markov model for music signal processing.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Acoustic scene analysis based on latent acoustic topic and event allocation.

Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2013

Generative modeling of speech F0 contours.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Bayesian semi-supervised audio event transcription based on Markov Indian buffet process.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Bayesian nonparametric music parser.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Automatic audio tag classification via semi-supervised canonical density estimation.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A statistical model of speech F0 contours.

Hirokazu Kameoka

Jonathan Le Roux

Yasunori Ohishi

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2010

2009

Automatic Identification for Singing Style based on Sung Melodic Contour Characterized in Phase Plane.

Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

2008

Building and combining document and music spaces for music query-by-webpage system.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Parameter estimation method of F0 control model for singing voices.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

A Stochastic Representation of the Dynamics of Sung Melody.

Proceedings of the 8th International Conference on Music Information Retrieval, 2007

2006

Statistical Analysis for Thesaurus Construction using an Encyclopedic Corpus.

Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

2005

Discrimination between singing and speaking voices.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
