Takashi Shibuya

ORCID: 0000-0002-4277-0164

Affiliations:
  • Sony Corporation, Tokyo, Japan
  • University of Tsukuba, Japan
  • University of Tokyo, Japan (former)


According to our database, Takashi Shibuya authored at least 47 papers between 2009 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Stereo Sound Event Localization and Detection with Onscreen/offscreen Classification.
CoRR, July, 2025

Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance.
CoRR, June, 2025

Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry.
CoRR, June, 2025

Efficiency without Compromise: CLIP-aided Text-to-Image GANs with Increased Diversity.
CoRR, June, 2025

Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image.
CoRR, April, 2025

HumanGif: Single-View Human Diffusion with Generative Prior.
CoRR, February, 2025

CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation.
CoRR, January, 2025

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

Dyadic Mamba: Long-term Dyadic Human Motion Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.
Dataset, April, 2024

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes.
Trans. Mach. Learn. Res., 2024

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis.
CoRR, 2024

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation.
CoRR, 2024

TraSCE: Trajectory Steering for Concept Erasure.
CoRR, 2024

Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning.
CoRR, 2024

Embedded Topic Models Enhanced by Wikification.
CoRR, 2024

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation.
CoRR, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond.
CoRR, 2024

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.
CoRR, 2024

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation.
CoRR, 2024

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation.
CoRR, 2024

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation.
CoRR, 2024

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrogram for Efficient Audio Synthesis and Beyond.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Zero- and Few-Shot Sound Event Localization and Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

BigVSAN: Enhancing GAN-Based Neural Vocoders with Slicing Adversarial Network.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

On the Language Encoder of Contrastive Cross-modal Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network.
Dataset, September, 2023

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.
Dataset, July, 2023

Extending Audio Masked Autoencoders toward Audio Restoration.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022
A Versatile Diffusion-based Generative Refiner for Speech Enhancement.
CoRR, 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization.
Proceedings of the International Conference on Machine Learning, 2022

Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2020
Nested Named Entity Recognition via Second-best Sequence Learning and Decoding.
Trans. Assoc. Comput. Linguistics, 2020

2013
Audio fingerprinting robust against reverberation and noise based on quantification of sinusoidality.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

2010
Learning Interaction Rules through Compression of Sensori-Motor Causality Space.
Proceedings of the Tenth International Conference on Epigenetic Robotics (EpiRob 2010), 2010

2009
Causality quantification and its applications: structuring and modeling of multivariate time series.
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28, 2009

