Haohe Liu

Orcid: 0000-0003-1036-7888

According to our database, Haohe Liu authored at least 76 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs.

CoRR, February, 2026

SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training.

CoRR, January, 2026

Inference-time Scaling for Diffusion-based Audio Super-resolution.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities.

CoRR, September, 2025

Region-Specific Audio Tagging for Spatial Sound.

CoRR, September, 2025

DreamAudio: Customized Text-to-Audio Generation with Diffusion Models.

CoRR, September, 2025

DualMark: Identifying Model and Training Data Origins in Generated Audio.

CoRR, August, 2025

AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion.

CoRR, May, 2025

Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows.

CoRR, April, 2025

Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture.

CoRR, April, 2025

HandSplat: Embedding-Driven Gaussian Splatting for High-Fidelity Hand Rendering.

CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.

CoRR, March, 2025

Audio-FLAN: A Preliminary Release.

CoRR, February, 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis.

CoRR, February, 2025

Multimodal Fish Feeding Intensity Assessment in Aquaculture.

IEEE Trans. Autom. Sci. Eng., 2025

DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

EnvSDD: Benchmarking Environmental Sound Deepfake Detection.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

FlowSep: Language-Queried Sound Separation with Rectified Flow Matching.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

The APSIPA ASC 2025 Grand Challenge on City and Time-Aware Semi-Supervised Acoustic Scene Classification: Summary and Results.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound.

IEEE J. Sel. Top. Signal Process., December, 2024

NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift Evaluation Dataset.

Dataset, March, 2024

IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift Development Dataset.

Dataset, February, 2024

IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift.

Dataset, January, 2024

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text.

CoRR, 2024

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review.

CoRR, 2024

Zero-Shot Audio Captioning Using Soft and Hard Prompts.

CoRR, 2024

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift.

CoRR, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Efficient Audio Captioning with Encoder-Level Knowledge Distillation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Neural Compression Augmentation for Contrastive Audio Representation Learning.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

First-Shot Unsupervised Anomalous Sound Detection with Unknown Anomalies Estimated by Metadata-Assisted Audio Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Retrieval-Augmented Text-to-Audio Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

AudioSR: Versatile Audio Super-Resolution at Scale.

Proceedings of the IEEE International Conference on Acoustics, 2024

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies.

Taylor Berg-Kirkpatrick

Shlomo Dubnov

Proceedings of the IEEE International Conference on Acoustics, 2024

Text-Queried Target Sound Event Localization.

Proceedings of the 32nd European Signal Processing Conference, 2024

Learning Temporal Resolution in Spectrogram for Audio Classification.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift.

Dataset, December, 2023

Learning to detect an animal sound from five examples.

Ariana Strandburg-Peshkin

Ecol. Informatics, November, 2023

Balanced SNR-Aware Distillation for Guided Text-to-Audio Generation.

CoRR, 2023

Synth-AC: Enhancing Audio Captioning with Synthetic Supervision.

CoRR, 2023

Multimodal Fish Feeding Intensity Assessment in Aquaculture.

CoRR, 2023

Separate Anything You Describe.

CoRR, 2023

WavJourney: Compositional Audio Creation with Large Language Models.

CoRR, 2023

Text-Driven Foley Sound Generation With Latent Diffusion Model.

CoRR, 2023

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks.

Arshdeep Singh

Haohe Liu

Mark D. Plumbley

CoRR, 2023

Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7.

CoRR, 2023

Learning to detect an animal sound from five examples.

Ariana Strandburg-Peshkin

CoRR, 2023

Universal Source Separation with Weakly Labelled Data.

Taylor Berg-Kirkpatrick

Shlomo Dubnov

Mark D. Plumbley

CoRR, 2023

Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study.

CoRR, 2023

Ontology-aware Learning and Evaluation for Audio Tagging.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapting Language-Audio Models as Few-Shot Audio Learners.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

Simple Pooling Front-Ends for Efficient Audio Classification.

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Pre-Trained AudioLDM for Sound Generation: A Benchmark Study.

Proceedings of the 31st European Signal Processing Conference, 2023

2022

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech.

CoRR, 2022

Learning the Spectrogram Temporal Resolution for Audio Classification.

CoRR, 2022

Surrey System for DCASE 2022 Task 5: Few-shot Bioacoustic Event Detection with Segment-level Metric Learning.

CoRR, 2022

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Separate What You Describe: Language-Queried Audio Source Separation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Neural Vocoder is All You Need for Speech Super-resolution.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Pre-trained BERT for Audio Captioning.

Proceedings of the 30th European Signal Processing Conference, 2022

Segment-Level Metric Learning for Few-Shot Bioacoustic Event Detection.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet.

Haohe Liu

Qiuqiang Kong

Jiafeng Liu

CoRR, 2021

VoiceFixer: Toward General Speech Restoration With Neural Vocoder.

CoRR, 2021

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation.

CoRR, 2021

Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation.

Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Speech Enhancement with Weakly Labelled Data from AudioSet.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Design and Visualization of Guided GAN on MNIST dataset.

Haohe Liu

Siqi Yao

Yulin Wang

Proceedings of the 3rd International Conference on Graphics and Signal Processing, 2019
