Jixun Yao

Orcid: 0000-0002-5324-7360

According to our database, Jixun Yao authored at least 41 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization.
CoRR, July, 2025

Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization.
CoRR, July, 2025

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.
CoRR, June, 2025

EASY: Emotion-aware Speaker Anonymization via Factorized Distillation.
CoRR, May, 2025

ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech.
CoRR, May, 2025

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech.
CoRR, February, 2025

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching.
CoRR, 2024

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion.
CoRR, 2024

CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching.
CoRR, 2024

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.
CoRR, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
CoRR, 2024

NTU-NPU System for Voice Privacy 2024 Challenge.
CoRR, 2024

Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling.
CoRR, 2024

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models.
CoRR, 2024

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Timbre-Reserved Adversarial Attack in Speaker Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Timbre-reserved Adversarial Attack in Speaker Identification.
CoRR, 2023

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition.
CoRR, 2023

Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling.
Proceedings of the IEEE International Conference on Acoustics, 2023

Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.
Proceedings of the IEEE International Conference on Acoustics, 2023

The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
NWPU-ASLP System for the VoicePrivacy 2022 Challenge.
CoRR, 2022

High Quality and Similarity One-Shot Voice Conversion Using End-to-End Model.
Proceedings of the 6th International Conference on Computer Science and Artificial Intelligence, 2022

2020
A Reward Shaping Method based on Meta-LSTM for Continuous Control of Robot.
Proceedings of the CSAI 2020: 2020 4th International Conference on Computer Science and Artificial Intelligence, 2020

