Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation.

Xilin Jiang

Junkai Wu

Vishal Choudhari

Nima Mesgarani

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis.

Xilin Jiang

Yinghao Aaron Li

Adrian Nicolas Florea

Cong Han

Nima Mesgarani

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation.

Xilin Jiang

Cong Han

Nima Mesgarani

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation.

CoRR, 2024

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience.

CoRR, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify And Understand Speaker in Spoken Dialogue.

Mark Hasegawa-Johnson

Mari Ostendorf

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SSAMBA: Self-Supervised Audio Representation Learning With Mamba State Space Model.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform.

CoRR, 2023

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes.

Xilin Jiang

Yinghao Aaron Li

Nima Mesgarani

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneme-Level BERT for Enhanced Prosody of Text-To-Speech with Grapheme Predictions.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Compute and Memory Efficient Universal Sound Source Separation.

J. Signal Process. Syst., 2022

Learning Representations for New Sound Classes With Continual Self-Supervised Learning.

IEEE Signal Process. Lett., 2022

2013

Demonstration of broadband inter-modal four-wave mixing in graded-index few-mode fibers.

Proceedings of the 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), 2013
