Shusuke Takahashi

CoRR, October, 2025

VIRTUE: Visual-Interactive Text-Image Universal Embedder.

CoRR, October, 2025

Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription.

CoRR, September, 2025

SAVGBench Dataset.

Dataset, September, 2025

Stereo Sound Event Localization and Detection with Onscreen/offscreen Classification.

Irán R. Román

CoRR, July, 2025

DCASE2025 Task3 Stereo SELD Dataset.

Irán R. Román

Dataset, June, 2025

DCASE2025 Task3 Stereo SELD Dataset.

Irán R. Román

Marco A. Martínez Ramírez

Dataset, April, 2025

Music Foundation Model as Generic Booster for Music Downstream Tasks.

Trans. Mach. Learn. Res., 2025

SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models.

Muhammad Jehanzeb Mirza

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

2024

The whole is greater than the sum of its parts: improving music source separation by bridging networks.

EURASIP J. Audio Speech Music. Process., December, 2024

The Sound Demixing Challenge 2023 - Cinematic Demixing Track.

Alexander L. Stempkovskiy

Tatiana Habruseva

Mikhail Sukhovei

Trans. Int. Soc. Music. Inf. Retr., January, 2024

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation.

CoRR, 2024

OpenMU: Your Swiss Army Knife for Music Understanding.

CoRR, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond.

CoRR, 2024

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.

CoRR, 2024

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation.

CoRR, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrogram for Efficient Audio Synthesis and Beyond.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Zero- and Few-Shot Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, 2024

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

STARSS23: Sony-TAU Realistic Spatial Soundscapes 2023.

Aapo Hakala

Dataset, March, 2023

STARSS23: Sony-TAU Realistic Spatial Soundscapes 2023.

Aapo Hakala

Alexander L. Stempkovskiy

Dataset, March, 2023

The Sound Demixing Challenge 2023 - Cinematic Demixing Track.

Tatiana Habruseva

Mikhail Sukhovei

CoRR, 2023

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation.

CoRR, 2023

Diffusion-based Signal Refiner for Speech Separation.

CoRR, 2023

Extending Audio Masked Autoencoders toward Audio Restoration.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An Attention-Based Approach to Hierarchical Multi-Label Music Instrument Classification.

Proceedings of the IEEE International Conference on Acoustics, 2023

Diffroll: Diffusion-Based Generative Music Transcription with Unsupervised Pretraining Capability.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

CrossNet-Open-Unmix for Music Source Separation (X-UMXL).

Dataset, September, 2022

STARSS22: Sony-TAU Realistic Spatial Soundscapes 2022 dataset.

Sharath Adavanne

Yuichiro Koyama

Naoya Takahashi

Tuomas Virtanen

Dataset, May, 2022

STARSS22: Sony-TAU Realistic Spatial Soundscapes 2022 dataset.

Adavanne Politis

Dataset, March, 2022

An Approach to Collecting Object Graphs for Data-structure Live Programming Based on a Language Implementation Framework.

J. Inf. Process., 2022

Preventing oversmoothing in VAE via generalized variance parameterization.

Neurocomputing, 2022

A Versatile Diffusion-based Generative Refiner for Speech Enhancement.

CoRR, 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization.

Proceedings of the International Conference on Machine Learning, 2022

Multi-ACCDOA: Localizing And Detecting Overlapping Sounds From The Same Class With Auxiliary Duplicating Permutation Invariant Training.

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Character Error Rate is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-Box Acoustic Models.

Ryosuke Sawata

Yosuke Kashiwagi

Proceedings of the IEEE International Conference on Acoustics, 2022

Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, 2022

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, 2022

Music Source Separation With Deep Equilibrium Models.

Proceedings of the IEEE International Conference on Acoustics, 2022

STARSS22: A Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

CrossNet-Open-Unmix for Music Source Separation (X-UMX-HQ).

Dataset, May, 2021

CrossNet-Open-Unmix for Music Source Separation (X-UMX).

Dataset, April, 2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection.

CoRR, 2021

Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE.

CoRR, 2021

Manifold-Aware Deep Clustering: Maximizing Angles Between Embedding Vectors Based on Regular Simplex.

Keitaro Tanaka

Ryosuke Sawata