Wenhao Guan

According to our database, Wenhao Guan authored at least 29 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models.

CoRR, October, 2025

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction.

CoRR, September, 2025

XMUspeech Systems for the ASVspoof 5 Challenge.

CoRR, September, 2025

Topo-Field: Topometric Mapping With Brain-Inspired Hierarchical Layout-Object-Position Fields.

IEEE Robotics Autom. Lett., June, 2025

ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization.

CoRR, June, 2025

DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec.

CoRR, May, 2025

SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow.

CoRR, April, 2025

Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

InvoxSVC: Any-to-any Zero-shot Singing Voice Conversion with In-Context Learning in Latent Flow Matching.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Dynamic Language Group-based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations.

CoRR, 2024

Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition.

CoRR, 2024

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation.

CoRR, 2024

LOP-Field: Brain-inspired Layout-Object-Position Fields for Robotic Scene Understanding.

CoRR, 2024

Enhancing Code-Switching Speech Recognition With LID-Based Collaborative Mixture of Experts Model.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

MinSpeech: A Corpus of Southern Min Dialect for Automatic Speech Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Efficient Integrated Features Based on Pre-trained Models for Speaker Verification.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Improving Multi-Speaker ASR With Overlap-Aware Encoding And Monotonic Attention.

Proceedings of the IEEE International Conference on Acoustics, 2024

SR-HuBERT: An Efficient Pre-Trained Model for Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2024

Multivariate Fourier Distribution Perturbation: Domain Shifts with Uncertainty in Frequency Domain.

Proceedings of the IEEE International Conference on Acoustics, 2024

Reflow-TTS: A Rectified Flow Model for High-Fidelity Text-to-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2024

MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2018

Verifiable memory leakage-resilient dynamic searchable encryption.

J. High Speed Networks, 2018
