Tao Wang

Orcid: 0000-0003-1490-6973

Affiliations:

Chinese Academy of Sciences, National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China
University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, China

According to our database¹, Tao Wang authored at least 62 papers between 2009 and 2026.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of four.

Bibliography

2026

SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation.

CoRR, April, 2026

SpeechPalette: A Comprehensive Speech Editing Method for Text-Based Speech Editing, One-Shot TTS and Attributes Editing.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2026

Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards.

CoRR, February, 2026

2025

P2Mark: Plug-and-play Parameter-intrinsic Watermarking for Neural Speech Generation.

CoRR, April, 2025

HeRo: A State Machine-Based, Fault-Tolerant Framework for Heterogeneous Multi-Robot Collaboration.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Assessing growth potential of careers with occupational mobility network and ensemble framework.

Eng. Appl. Artif. Intell., January, 2024

CFAD: A Chinese dataset for fake audio detection.

Speech Commun., 2024

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation.

CoRR, 2024

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification.

CoRR, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation.

CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.

CoRR, 2024

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation.

CoRR, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation.

CoRR, 2024

Emotion selectable end-to-end text-based speech editing.

Artif. Intell., 2024

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Residual Speaker Representation for One-Shot Voice Conversion.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Multi-modal Adversarial Training for Zero-Shot Voice Cloning.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Fewer-Token Neural Speech Codec with Time-Invariant Codes.

Proceedings of the IEEE International Conference on Acoustics, 2024

Learning Speech Representation from Contrastive Token-Acoustic Pretraining.

Proceedings of the IEEE International Conference on Acoustics, 2024

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Adversarial Representation Mechanism Learning for Network Embedding.

IEEE Trans. Knowl. Data Eng., 2023

Amer: A New Attribute-Missing Network Embedding Approach.

IEEE Trans. Cybern., 2023

Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Fewer-token Neural Speech Codec with Time-invariant Codes.

CoRR, 2023

Controllable Residual Speaker Representation for Voice Conversion.

CoRR, 2023

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion.

CoRR, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion.

CoRR, 2023

Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.

Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

The VIBVG Speech Synthesis System for Blizzard Challenge 2023.

Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023

2022

Under-Display Camera Image Enhancement via Cascaded Curve Estimation.

IEEE Trans. Image Process., 2022

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

EmoFake: An Initial Dataset for Emotion Fake Audio Detection.

CoRR, 2022

SJ-HD^2R: Selective Joint High Dynamic Range and Denoising Imaging for Dynamic Scenes.

CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.

CoRR, 2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.

Proceedings of the IEEE International Conference on Acoustics, 2022

NTIRE 2022 Challenge on Night Photography Rendering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Powerful Graph Convolutional Networks with Adaptive Propagation Mechanism for Homophily and Heterophily.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Powerful Graph Convolutional Networks with Adaptive Propagation Mechanism for Homophily and Heterophily.

CoRR, 2021

Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report.

CoRR, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

CoRR, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.

Proceedings of the IEEE International Conference on Acoustics, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The NLPR Speech Synthesis entry for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2009

Building the Semantic Relations-Based Web Services Registry through Services Mining.

Proceedings of the 8th IEEE/ACIS International Conference on Computer and Information Science, 2009
