Jixun Yao
Orcid: 0000-0002-5324-7360
According to our database1,
Jixun Yao
authored at least 41 papers
between 2020 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2025
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization.
CoRR, July, 2025
Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization.
CoRR, July, 2025
StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.
CoRR, June, 2025
CoRR, May, 2025
ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech.
CoRR, May, 2025
CoRR, February, 2025
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching.
CoRR, 2024
CoRR, 2024
CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching.
CoRR, 2024
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.
CoRR, 2024
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
CoRR, 2024
Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling.
CoRR, 2024
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Promptvc: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024
GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Dualvc 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Speech Emotion Recognition.
CoRR, 2023
Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling.
Proceedings of the IEEE International Conference on Acoustics, 2023
Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Proceedings of the 6th International Conference on Computer Science and Artificial Intelligence, 2022
2020
Proceedings of the CSAI 2020: 2020 4th International Conference on Computer Science and Artificial Intelligence, 2020