Yuki Mitsufuji

Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, 2026

Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing.

Charles Patrick Martin

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Do Foundational Audio Encoders Understand Music Structure?

CoRR, December, 2025

Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal.

Weihan Xu

Kan Jen Cheng

Koichi Saito

Muhammad Jehanzeb Mirza

Gopala Anumanchipalli

Paul Pu Liang

CoRR, December, 2025

AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path.

CoRR, December, 2025

PAVAS: Physics-Aware Video-to-Audio Synthesis.

CoRR, December, 2025

Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits.

CoRR, December, 2025

C3G: Learning Compact 3D Representations with 2K Gaussians.

CoRR, December, 2025

Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models.

CoRR, December, 2025

LLM2Fx-Tools: Tool Calling For Music Post-Production.

Seungheon Doh

CoRR, December, 2025

FoleyBench: A Benchmark For Video-to-Audio Models.

CoRR, November, 2025

MeanFlow Transformers with Representation Autoencoders.

CoRR, November, 2025

Automatic Music Mixing using a Generative Model of Effect Embeddings.

Eloi Moliner

CoRR, November, 2025

'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model.

CoRR, October, 2025

The Principles of Diffusion Models.

CoRR, October, 2025

MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval.

CoRR, October, 2025

Theoretical Refinement of CLIP by Utilizing Linear Structure of Optimal Similarity.

CoRR, October, 2025

3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation.

CoRR, October, 2025

Automatic Music Sample Identification with Multi-Track Contrastive Learning.

Alain Riou

Joan Serrà

CoRR, October, 2025

MSRBench: A Benchmarking Dataset for Music Source Restoration.

CoRR, October, 2025

MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation.

Akira Takahashi

CoRR, October, 2025

Leveraging Whisper Embeddings for Audio-based Lyrics Matching.

CoRR, October, 2025

Attribution-by-design: Ensuring Inference-Time Provenance in Generative Music Systems.

CoRR, October, 2025

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation.

CoRR, October, 2025

SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator.

CoRR, October, 2025

SoundReactor: Frame-level Online Video-to-Audio Generation.

CoRR, October, 2025

VIRTUE: Visual-Interactive Text-Image Universal Embedder.

CoRR, October, 2025

CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models.

CoRR, September, 2025

Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning.

Sungho Lee

Anastasios N. Angelopoulos

CoRR, September, 2025

SAVGBench Dataset.

Dataset, September, 2025

Music Arena: Live Evaluation for Text-to-Music.

Yonghyun Kim

Wayne Chi

CoRR, July, 2025

Stereo Sound Event Localization and Detection with Onscreen/offscreen Classification.

Irán R. Román

CoRR, July, 2025

Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models.

CoRR, July, 2025

Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation.

CoRR, July, 2025

Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution.

CoRR, July, 2025

Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance.

CoRR, June, 2025

Large-Scale Training Data Attribution for Music Generative Models via Unlearning.

CoRR, June, 2025

Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry.

CoRR, June, 2025

DCASE2025 Task3 Stereo SELD Dataset.

Irán R. Román

Dataset, June, 2025

Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image.

CoRR, April, 2025

DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions.

Chin-Yun Yu

CoRR, April, 2025

D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes.

CoRR, April, 2025

CARE: Aligning Language Models for Regional Cultural Awareness.

CoRR, April, 2025

DCASE2025 Task3 Stereo SELD Dataset.

Irán R. Román

Joshua Nathaniel Williams

Dataset, April, 2025

Distillation of Discrete Diffusion through Dimensional Correlations.

Dataset, April, 2025

Cross-Modal Learning for Music-to-Music-Video Description Generation.

CoRR, March, 2025

Training Consistency Models with Variational Noise Coupling.

CoRR, February, 2025

HumanGif: Single-View Human Diffusion with Generative Prior.

CoRR, February, 2025

G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving.

Trans. Mach. Learn. Res., 2025

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models.

Muhammad Jehanzeb Mirza

Trans. Mach. Learn. Res., 2025

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation.

Trans. Mach. Learn. Res., 2025

Music Foundation Model as Generic Booster for Music Downstream Tasks.

Trans. Mach. Learn. Res., 2025

Reductive, Exclusionary, Normalising: The Limits of Generative AI Music.

Fabio Morreale

Raul Masu

Trans. Int. Soc. Music. Inf. Retr., 2025

SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior.

Chin-Yun Yu

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Can Large Language Models Predict Audio Effects Parameters from Natural Language?

Seungheon Doh

Juhan Nam

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures.

Yen-Tung Yeh

Yi-Hsuan Yang

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors.

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

Aligning Text-to-Music Evaluation with Human Preferences.

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification.

Recep Oguz Araz

Guillem Cortès-Sebastià

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning.

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion.

Proceedings of the International Joint Conference on Neural Networks, 2025

StereoSync: Spatially-Aware Stereo Audio Generation from Video.

Christian Marinoni

Riccardo F. Gramaccioni

Proceedings of the International Joint Conference on Neural Networks, 2025

Efficiency without Compromise: CLIP-aided Text-to-Image GANs with Increased Diversity.

Proceedings of the International Joint Conference on Neural Networks, 2025

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation.

Proceedings of the International Joint Conference on Neural Networks, 2025

VCT: Training Consistency Models with Variational Noise Coupling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Distillation of Discrete Diffusion through Dimensional Correlations.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models.

Muhammad Jehanzeb Mirza

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Reporting Bias in Visual-Language Datasets: Bi-Modal Data Augmentation by Decoupling Object-Attribute Association.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Variable Bitrate Residual Vector Quantization for Audio Coding.

Kyogu Lee

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

30+ Years of Source Separation Research: Achievements and Future Challenges.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CARE: Multilingual Human Preference Learning for Cultural Awareness.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

Dyadic Mamba: Long-term Dyadic Human Motion Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

VinaBench: Benchmark for Faithful and Consistent Visual Narratives.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Performance Analysis of Active Noise Control Over a Spatial Region.

Jihui Aimee Zhang

Naoki Murata

Prasanga N. Samarasinghe

Alexander L. Stempkovskiy

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

The whole is greater than the sum of its parts: improving music source separation by bridging networks.

EURASIP J. Audio Speech Music. Process., December, 2024

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.

Dataset, April, 2024

The Sound Demixing Challenge 2023 - Cinematic Demixing Track.

Tatiana Habruseva

Mikhail Sukhovei

Trans. Int. Soc. Music. Inf. Retr., January, 2024

The Sound Demixing Challenge 2023 - Music Demixing Track.

Trans. Int. Soc. Music. Inf. Retr., January, 2024

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes.

Trans. Mach. Learn. Res., 2024

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis.

CoRR, 2024

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation.

CoRR, 2024

TraSCE: Trajectory Steering for Concept Erasure.

CoRR, 2024

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion.

CoRR, 2024

OpenMU: Your Swiss Army Knife for Music Understanding.

CoRR, 2024

Mitigating Embedding Collapse in Diffusion Models for Categorical Data.

CoRR, 2024

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving.

CoRR, 2024

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models.

CoRR, 2024

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression.

Kyogu Lee

CoRR, 2024

Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning.

CoRR, 2024

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space.

Yangming Li

Chieh-Hsin Lai

Carola-Bibiane Schönlieb

Stefano Ermon

CoRR, 2024

A Survey on Diffusion Models for Inverse Problems.

Alexandros G. Dimakis

Mauricio Delbracio

CoRR, 2024

LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking.

CoRR, 2024

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer.

CoRR, 2024

DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation.

CoRR, 2024

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch.

Sungho Lee

CoRR, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond.

CoRR, 2024

Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data.

Yu-Hua Chen

Woosung Choi

CoRR, 2024

ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark.

CoRR, 2024

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.

CoRR, 2024

Searching For Music Mixing Graphs: A Pruning Approach.

Sungho Lee

CoRR, 2024

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation.

CoRR, 2024

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation.

CoRR, 2024

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation.

CoRR, 2024

Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information.

CoRR, 2024

MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage.

CoRR, 2024

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrogram for Efficient Audio Synthesis and Beyond.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Towards Assessing Data Replication in Music Generation With Music Similarity Metrics on Raw Audio.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

SilentCipher: Deep Audio Watermarking.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models.

Simon Dixon

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Manifold Preserving Guided Diffusion.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Zero- and Few-Shot Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, 2024

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders.

Proceedings of the IEEE International Conference on Acoustics, 2024

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance.

Carlos Hernandez-Olivan

Koichi Saito

Naoki Murata

Chieh-Hsin Lai

Proceedings of the IEEE International Conference on Acoustics, 2024

Enhancing Semantic Communication with Deep Generative Models: An Overview.

Proceedings of the IEEE International Conference on Acoustics, 2024

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription.

Frank Cwitkowitz

Kin Wai Cheuk

Woosung Choi

Keisuke Toyama

Proceedings of the IEEE International Conference on Acoustics, 2024

BigVSAN: Enhancing GAN-Based Neural Vocoders with Slicing Adversarial Network.

Takashi Shibuya

Yuhta Takida

Proceedings of the IEEE International Conference on Acoustics, 2024

On the Language Encoder of Contrastive Cross-modal Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

DiffuCOMET: Contextual Commonsense Knowledge Diffusion.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network.

Takashi Shibuya

Yuhta Takida

Dataset, September, 2023

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.

Dataset, July, 2023

STARSS23: Sony-TAu Realistic Spatial Soundscapes 2023.

Aapo Hakala

Dataset, March, 2023

STARSS23: Sony-TAu Realistic Spatial Soundscapes 2023.

Aapo Hakala

Alexander L. Stempkovskiy

Dataset, March, 2023

Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association.

CoRR, 2023

Enhancing Semantic Communication with Deep Generative Models - An ICASSP Special Session Overview.

CoRR, 2023

The Sound Demixing Challenge 2023 - Cinematic Demixing Track.

Tatiana Habruseva

Mikhail Sukhovei

CoRR, 2023

On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization.

CoRR, 2023

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation.

CoRR, 2023

Diffusion-based Signal Refiner for Speech Separation.

CoRR, 2023

Cross-modal Face- and Voice-style Transfer.

CoRR, 2023

Adversarially Slicing Generative Networks: Discriminator Slices Feature for One-Dimensional Optimal Transport.

CoRR, 2023

Extending Audio Masked Autoencoders toward Audio Restoration.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Automatic Piano Transcription With Hierarchical Frequency-Time Transformer.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration.

Proceedings of the International Conference on Machine Learning, 2023

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation.

Proceedings of the International Conference on Machine Learning, 2023

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos.

Taylor Berg-Kirkpatrick

Proceedings of the Eleventh International Conference on Learning Representations, 2023

An Attention-Based Approach to Hierarchical Multi-Label Music Instrument Classification.

Proceedings of the IEEE International Conference on Acoustics, 2023

Hierarchical Diffusion Models for Singing Voice Neural Vocoder.

Proceedings of the IEEE International Conference on Acoustics, 2023

Unsupervised Vocal Dereverberation with Diffusion-Based Generative Models.

Proceedings of the IEEE International Conference on Acoustics, 2023

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects.

Proceedings of the IEEE International Conference on Acoustics, 2023

DiffRoll: Diffusion-Based Generative Music Transcription with Unsupervised Pretraining Capability.

Proceedings of the IEEE International Conference on Acoustics, 2023

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

CrossNet-Open-Unmix for Music Source Separation (X-UMXL).

Dataset, September, 2022

STARSS22: Sony-TAu Realistic Spatial Soundscapes 2022 dataset.

Sharath Adavanne

Yuichiro Koyama

Tuomas Virtanen

Dataset, May, 2022

STARSS22: Sony-TAu Realistic Spatial Soundscapes 2022 dataset.

Adavanne Politis

Dataset, March, 2022

Preventing oversmoothing in VAE via generalized variance parameterization.

Neurocomputing, 2022

A Versatile Diffusion-based Generative Refiner for Speech Enhancement.

CoRR, 2022

Robust One-Shot Singing Voice Conversion.

CoRR, 2022

Regularizing Score-based Models with Score Fokker-Planck Equations.

CoRR, 2022

Removing Distortion Effects in Music Using Deep Neural Networks.

Johannes Imort

Yuichiro Koyama

CoRR, 2022

Automatic music mixing with deep learning and out-of-domain data.

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Distortion Audio Effects: Learning How to Recover the Clean Signal.

Johannes Imort

Yuichiro Koyama

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization.

Proceedings of the International Conference on Machine Learning, 2022

Amicable Examples for Informed Source Separation.

Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-ACCDOA: Localizing And Detecting Overlapping Sounds From The Same Class With Auxiliary Duplicating Permutation Invariant Training.

Proceedings of the IEEE International Conference on Acoustics, 2022

Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, 2022

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, 2022

Music Source Separation With Deep Equilibrium Models.

Proceedings of the IEEE International Conference on Acoustics, 2022

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks.

Bo-Yu Chen

Wei-Han Hsu

Yi-Hsuan Yang

Proceedings of the IEEE International Conference on Acoustics, 2022

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

STARSS22: A Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

CrossNet-Open-Unmix for Music Source Separation (X-UMX-HQ).

Dataset, May, 2021

CrossNet-Open-Unmix for Music Source Separation (X-UMX).

Dataset, April, 2021

Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Source Mixing and Separation Robust Audio Steganography.

CoRR, 2021

Music Demixing Challenge at ISMIR 2021.

CoRR, 2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection.

CoRR, 2021

Training Speech Enhancement Systems with Noisy Speech Datasets.

CoRR, 2021

Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE.

CoRR, 2021

Hierarchical disentangled representation learning for singing voice conversion.

Proceedings of the International Joint Conference on Neural Networks, 2021

Adversarial Attacks on Audio Source Separation.

Shota Inoue

Proceedings of the IEEE International Conference on Acoustics, 2021

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization And Detection.

Proceedings of the IEEE International Conference on Acoustics, 2021

All For One And One For All: Improving Music Separation By Bridging Networks.

Proceedings of the IEEE International Conference on Acoustics, 2021

Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Psychophysiological Effect of Immersive Spatial Audio Experience Enhanced Using Sound Field Synthesis.

Proceedings of the 9th International Conference on Affective Computing and Intelligent Interaction, 2021

2020

Open-Unmix for Speech Enhancement (UMX SE).

Dataset, May, 2020

Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays.

Prasanga N. Samarasinghe

Naoki Murata

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Densely connected multidilated convolutional networks for dense prediction tasks.

CoRR, 2020

D3Net: Densely connected multidilated DenseNet for music source separation.

CoRR, 2020

Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3Net.

CoRR, 2020

Improving Voice Separation by Incorporating End-To-End Speech Recognition.

Sudarsanam Parthasaarathy

Sakya Basak

Sriram Ganapathy

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Array-Geometry-Aware Spatial Active Noise Control Based on Direction-of-Arrival Weighting.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Open-Unmix - A Reference Implementation for Music Source Separation.

Dataset, September, 2019

Open-Unmix - A Reference Implementation for Music Source Separation.

J. Open Source Softw., 2019

Closing the Training/Inference Gap for Deep Attractor Networks.

CoRR, 2019

Recursive Speech Separation for Unknown Number of Speakers.

Sudarsanam Parthasaarathy

Nabarun Goswami

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Global and Local Mode-domain Adaptive Algorithms for Spatial Active Noise Control Using Higher-order Sources.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Improving DNN-based Music Source Separation using Phase Features.

CoRR, 2018

MMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation.

Nabarun Goswami

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Mode-Domain Spatial Active Noise Control Using Multiple Circular Arrays.

Prasanga N. Samarasinghe

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Mode Domain Spatial Active Noise Control Using Sparse Signal Representation.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Multi-Scale multi-band densenets for audio source separation.

Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017

Improving music source separation based on deep neural networks through data augmentation and network blending.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Supervised monaural source separation based on autoencoders.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain.

Shoichi Koyama

Hiroshi Saruwatari

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Deep neural network based instrument extraction from music.

Franck Giron

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

NMF-based blind source separation using a linear predictive coding error clustering criterion.

Xin Guo

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

On the use of a spatial cue as prior information for stereo sound source separation based on spatially weighted non-negative tensor factorization.

Axel Roebel

EURASIP J. Adv. Signal Process., 2014

Online Non-negative Tensor Deconvolution for source detection in 3DTV audio.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge.
