Yang Ai

Orcid: 0009-0006-0157-4980

According to our database, Yang Ai authored at least 69 papers between 2009 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Universal Preference-Score-based Pairwise Speech Quality Assessment.
CoRR, June, 2025

SAMA: A Self-and-Mutual Attention Network for Accurate Recurrence Prediction of Non-Small Cell Lung Cancer Using Genetic and CT Data.
IEEE J. Biomed. Health Informatics, May, 2025

Vision-Integrated High-Quality Neural Speech Coding.
CoRR, May, 2025

Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising.
CoRR, May, 2025

PhonemeVec: A Phoneme-Level Contextual Prosody Representation For Speech Synthesis.
ACM Trans. Asian Low Resour. Lang. Inf. Process., March, 2025

Token-Prediction-Based Post-Processing for Low-Bitrate Speech Coding.
IEEE Signal Process. Lett., 2025

A Streamable Neural Audio Codec With Residual Scalar-Vector Quantization for Real-Time Communication.
IEEE Signal Process. Lett., 2025

Explicit estimation of magnitude and phase spectra in parallel for high-quality speech enhancement.
Neural Networks, 2025

A Study of Multi-Scale Feature Learning From Pre-Trained Models on Speaker Verification.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Can Automated Speech Recognition Errors Provide Valuable Clues for Alzheimer's Disease Detection?
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Aligning Noisy-Clean Speech Pairs at Feature and Embedding Levels for Learning Noise-Invariant Speaker Representations.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

CASC-XVC: Zero-Shot Cross-Lingual Voice Conversion with Content Accordant and Speaker Contrastive Losses.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Recursive Feature Learning from Pre-Trained Models for Spoofing Speech Detection.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

APCodec: A Neural Audio Codec With Parallel Amplitude and Phase Spectrum Encoding and Decoding.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram.
CoRR, 2024

Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control.
CoRR, 2024

Voice Attribute Editing with Text Prompt.
CoRR, 2024

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction.
CoRR, 2024

Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Stage-Wise and Prior-Aware Neural Speech Phase Prediction.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

MDCTCodec: A Lightweight MDCT-Based Neural Audio Codec Towards High Sampling Rate and Low Bitrate Scenarios.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Speech Reconstruction from Silent Lip and Tongue Articulation by Diffusion Models and Text-Guided Pseudo Target Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

MultiStage Speech Bandwidth Extension with Flexible Sampling Rate Control.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Refining Self-supervised Learnt Speech Representation using Brain Activations.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Low-Bitrate Neural Audio Codec Framework with Bandwidth Reduction and Recovery for High-Sampling-Rate Waveforms.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DeepGAN: A fast and high-quality time-domain-based neural vocoder for low-resource scenarios.
Proceedings of the 8th International Conference on Digital Signal Processing, 2024

Considering Temporal Connection between Turns for Conversational Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Biologically Interpretable Model for Precise Recurrence Prediction of Non-Small Cell Lung Cancer.
Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Long-Frame-Shift Neural Speech Phase Prediction With Spectral Continuity Enhancement and Interpolation Error Compensation.
IEEE Signal Process. Lett., 2023

A Dynamic Network for Efficient Point Cloud Registration.
CoRR, 2023

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis.
CoRR, 2023

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CMIR: A Unified Cross-Modality Framework for Preoperative Accurate Prediction of Microvascular Invasion in Hepatocellular Carcinoma.
Proceedings of the MEDINFO 2023 - The Future Is Accessible, 2023

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speech Reconstruction from Silent Tongue and Lip Articulation by Pseudo Target Generation and Domain Adversarial Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Zero-Shot Personalized Lip-To-Speech Synthesis with Face Image Based Voice Control.
Proceedings of the IEEE International Conference on Acoustics, 2023

Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Self-Attention Based Fusion Model of Radiomics and Deep Features for Early Recurrence Prediction in NSCLC.
Proceedings of the 12th IEEE Global Conference on Consumer Electronics, 2023

MVI-Wise GAN: Synthetic MRI to Improve Microvascular Invasion Prediction in Hepatocellular Carcinoma.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

Vision-Guided Attention-Enhanced Network for Predicting Microvascular Invasion in Hepatocellular Carcinoma.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

The USTC-NERCSLIP System for the Track 1.2 of Audio Deepfake Detection (ADD 2023) Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

2022
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

A robust encryption watermarking algorithm for medical images based on ridgelet-DCT and THM double chaos.
J. Cloud Comput., 2022

Residual Multilayer Perceptrons for Genotype-Guided Recurrence Prediction of Non-Small Cell Lung Cancer.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

2021
BDDR: An Effective Defense Against Textual Backdoor Attacks.
Comput. Secur., 2021

Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Phase Spectrum Recovery for Enhancing Low-Quality Speech Captured by Laser Microphones.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

2020
A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Robust Watermarking Algorithm for Medical Volume Data in Internet of Medical Things.
IEEE Access, 2020

Reverberation Modeling for Source-Filter-Based Neural Vocoder.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Online Speaker Adaptation for WaveNet-based Neural Vocoders.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Zero-Watermarking Algorithm for Medical Images Based on Dual-Tree Complex Wavelet Transform and Discrete Cosine Transform.
J. Medical Imaging Health Informatics, 2019

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

DNN-based Spectral Enhancement for Neural Waveform Generators with Low-bit Quantization.
Proceedings of the IEEE International Conference on Acoustics, 2019

The USTC System for Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

2018
Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

SampleRNN-Based Neural Vocoder for Statistical Parametric Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2010
An Ontology-Based Platform for Scientific Writing and Publishing.
Proceedings of the Future Generation Information Technology, 2010

2009
Computing Minimal Diagnosis with Binary Decision Diagrams Algorithm.
Proceedings of the Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009
