Ruibo Fu

Orcid: 0000-0001-9598-1881

According to our database, Ruibo Fu authored at least 101 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan.
CoRR, April, 2026

SpeechPalette: A Comprehensive Speech Editing Method for Text-Based Speech Editing, One-Shot TTS and Attributes Editing.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2026

Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning.
CoRR, January, 2026

MSSG: Multi-Scale Speaker Graph Network for Active Speaker Detection.
IEEE Trans. Multim., 2026

PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Trainable EEG Interpolation and Structure-Sharing Dual-Path Encoders for Brain-Assisted Target Speaker Extraction.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis.
CoRR, December, 2025

InstructAudio: Unified speech and music generation with natural language instruction.
CoRR, November, 2025

When Audio Generators Become Good Listeners: Generative Features for Understanding Tasks.
CoRR, September, 2025

SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding.
CoRR, September, 2025

DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer.
CoRR, August, 2025

Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform.
CoRR, August, 2025

Debunk and Infer: Multimodal Fake News Detection via Diffusion-Generated Evidence and LLM Reasoning.
CoRR, June, 2025

RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection.
CoRR, June, 2025

Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention.
CoRR, April, 2025

Exploring Modality Disruption in Multimodal Fake News Detection.
CoRR, April, 2025

P2Mark: Plug-and-play Parameter-intrinsic Watermarking for Neural Speech Generation.
CoRR, April, 2025

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition.
CoRR, January, 2025

Less is More? Textual-Only Language Model for AVI challenge 2025.
Proceedings of the 3rd International Workshop on Multimodal and Responsible Affective Computing, 2025

MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Towards Diverse and Efficient Audio Captioning via Diffusion Models.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Code-switching Mediated Sentence-level Semantic Learning.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

CFAD: A Chinese dataset for fake audio detection.
Speech Commun., 2024

SceneFake: An initial dataset and benchmarks for scene fake audio detection.
Pattern Recognit., 2024

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation.
CoRR, 2024

LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis.
CoRR, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation.
CoRR, 2024

Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge.
CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation.
CoRR, 2024

Fake News Detection and Manipulation Reasoning via Large Vision-Language Models.
CoRR, 2024

A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge.
CoRR, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation.
CoRR, 2024

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio.
CoRR, 2024

Emotion selectable end-to-end text-based speech editing.
Artif. Intell., 2024

Transferring Personality Knowledge to Multimodal Sentiment Analysis.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Temporal Shift for Personality Recognition with Pre-Trained Representations.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Does Current Deepfake Audio Detection Model Effectively Detect ALM-Based Deepfake Audio?
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

A Noval Feature via Color Quantisation for Fake Audio Detection.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Exploring the Role of Audio in Multimodal Misinformation Detection.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Unlocking the Power of Emotions: Enhancing Personality Trait Recognition Through Utilization of Emotional Cues.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Generalized Fake Audio Detection via Deep Stable Learning.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Learning Speech Representation from Contrastive Token-Acoustic Pretraining.
Proceedings of the IEEE International Conference on Acoustics, 2024

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding.
Proceedings of the IEEE International Conference on Acoustics, 2024

MisD-MoE: A Multimodal Misinformation Detection Framework with Adaptive Feature Selection.
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024

2023
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era.
Proc. IEEE, October, 2023

Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection.
CoRR, 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.
CoRR, 2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing.
CoRR, 2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
CoRR, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion.
CoRR, 2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

The VIBVG Speech Synthesis System for Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023

2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection.
CoRR, 2022

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation.
CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.
CoRR, 2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Fully Automated End-to-End Fake Audio Detection.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Half-Truth: A Partially Fake Audio Detection Dataset.
CoRR, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Patnet: A Phoneme-Level Autoregressive Transformer Network for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The NLPR Speech Synthesis entry for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019
Phoneme Dependent Speaker Embedding and Model Factorization for Multi-speaker Speech Synthesis and Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2019

The NLPR Speech Synthesis entry for Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019

2018
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017
The NLPR Speech Synthesis entry for Blizzard Challenge 2017.
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017

