Jixun Yao

Orcid: 0000-0002-5324-7360

According to our database¹, Jixun Yao authored at least 44 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline (chart not shown). Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other.

Bibliography

2025

MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows.

CoRR, October, 2025

MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech.

CoRR, September, 2025

DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization.

CoRR, July, 2025

Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization.

Sabato Marco Siniscalchi

CoRR, July, 2025

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.

CoRR, June, 2025

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech.

CoRR, February, 2025

Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching.

IEEE Signal Process. Lett., 2025

EASY: Emotion-aware Speaker Anonymization via Factorized Distillation.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification.

John H. L. Hansen

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching.

CoRR, 2024

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion.

CoRR, 2024

CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching.

CoRR, 2024

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.

CoRR, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.

CoRR, 2024

NTU-NPU System for Voice Privacy 2024 Challenge.

CoRR, 2024

Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling.

CoRR, 2024

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models.

Guangcheng Zhao

CoRR, 2024

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.

Proceedings of the IEEE International Conference on Acoustics, 2024

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Timbre-Reserved Adversarial Attack in Speaker Identification.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Timbre-reserved Adversarial Attack in Speaker Identification.

CoRR, 2023

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition.

CoRR, 2023

Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling.

Proceedings of the IEEE International Conference on Acoustics, 2023

Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning.

Proceedings of the IEEE International Conference on Acoustics, 2023

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.

Proceedings of the IEEE International Conference on Acoustics, 2023

The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge.

Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

NWPU-ASLP System for the VoicePrivacy 2022 Challenge.

CoRR, 2022

High Quality and Similarity One-Shot Voice Conversion Using End-to-End Model.

Proceedings of the 6th International Conference on Computer Science and Artificial Intelligence, 2022

2020

A Reward Shaping Method based on Meta-LSTM for Continuous Control of Robot.

Proceedings of the CSAI 2020: 2020 4th International Conference on Computer Science and Artificial Intelligence, 2020
