Yiwei Guo

Orcid: 0000-0002-2681-717X

According to our database, Yiwei Guo authored at least 39 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction.
CoRR, August, 2025

Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy.
CoRR, June, 2025

CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate.
CoRR, June, 2025

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency.
CoRR, May, 2025

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate.
CoRR, May, 2025

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision.
CoRR, March, 2025

Recent Advances in Discrete Speech Tokens: A Review.
CoRR, February, 2025

Multi-view attributed graph clustering based on graph diffusion convolution with adaptive fusion.
Expert Syst. Appl., 2025

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Impacts of Water Diversion Projects on Vegetation Coverage in Central Yunnan Province, China (2017-2022).
Remote. Sens., July, 2024

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective.
CoRR, 2024

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec.
CoRR, 2024

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders.
CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024

Detection Method of Teaching Discourse Richness Based on Prompt Learning and Pre-Trained Language Model.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2024

Attention-Constrained Inference For Robust Decoder-Only Text-to-Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

On the Effectiveness of Acoustic BPE in Decoder-Only TTS.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Acoustic BPE for Speech Generation with Discrete Tokens.
Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations.
Proceedings of the IEEE International Conference on Acoustics, 2024

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.
Proceedings of the IEEE International Conference on Acoustics, 2024

VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.
CoRR, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Joint Node Representation Learning and Clustering for Attributed Graph via Graph Diffusion Convolution.
Proceedings of the International Joint Conference on Neural Networks, 2023

DiffVoice: Text-to-Speech with Latent Diffusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Spatio-Temporal Dynamics of Entropy in EEGS during Music Stimulation of Alzheimer's Disease Patients with Different Degrees of Dementia.
Entropy, 2022

BiasedWalk: Learning Global-aware Node Embeddings via Biased Sampling.
CoRR, 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

2020
A Reinforcement Learning Approach to Train Timetabling for Inter-City High Speed Railway Lines.
Proceedings of the 5th IEEE International Conference on Intelligent Transportation Engineering, 2020

