Yiwei Guo

This is a disambiguation page: it lists papers by multiple people with the same or a similar name.

Bibliography

2026

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding.

CoRR, May, 2026

WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens.

CoRR, May, 2026

Recent Advances in Discrete Speech Tokens: A Review.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2026

RAS: a Reliability Oriented Metric for Automatic Speech Recognition.

CoRR, April, 2026

A Survey on Speech Large Language Models for Understanding.

IEEE J. Sel. Top. Signal Process., January, 2026

Blind Interference Suppression for IRS-Aided Robust Wireless Communications.

IEEE Internet Things J., 2026

A scalable progressive cellular framework for learning-based high-speed rail timetabling and platforming.

Expert Syst. Appl., 2026

AHAMask: Reliable Task Specification for Large Audio Language Models Without Instructions.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models.

CoRR, October, 2025

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction.

CoRR, August, 2025

Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy.

CoRR, June, 2025

CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate.

CoRR, June, 2025

Multi-view attributed graph clustering based on graph diffusion convolution with adaptive fusion.

Lijuan Zhou

Yiwei Guo

Zhihong Zhang

Expert Syst. Appl., 2025

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Impacts of Water Diversion Projects on Vegetation Coverage in Central Yunnan Province, China (2017-2022).

Remote. Sens., July, 2024

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective.

CoRR, 2024

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders.

CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

CoRR, 2024

Detection Method of Teaching Discourse Richness Based on Prompt Learning and Pre-Trained Language Model.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2024

Attention-Constrained Inference For Robust Decoder-Only Text-to-Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

On the Effectiveness of Acoustic BPE in Decoder-Only TTS.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Acoustic BPE for Speech Generation with Discrete Tokens.

Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations.

Proceedings of the IEEE International Conference on Acoustics, 2024

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.

Proceedings of the IEEE International Conference on Acoustics, 2024

VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.

Proceedings of the IEEE International Conference on Acoustics, 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.

CoRR, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Joint Node Representation Learning and Clustering for Attributed Graph via Graph Diffusion Convolution.

Proceedings of the International Joint Conference on Neural Networks, 2023

DiffVoice: Text-to-Speech with Latent Diffusion.

Zhijun Liu

Yiwei Guo

Kai Yu

Proceedings of the IEEE International Conference on Acoustics, 2023

Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.

Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Spatio-Temporal Dynamics of Entropy in EEGS during Music Stimulation of Alzheimer's Disease Patients with Different Degrees of Dementia.

Entropy, 2022

BiasedWalk: Learning Global-aware Node Embeddings via Biased Sampling.

Zhengrong Xue

Ziao Guo

Yiwei Guo

CoRR, 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.

Yiwei Guo

Chenpeng Du

Kai Yu

Proceedings of the IEEE International Conference on Acoustics, 2022

2020

A Reinforcement Learning Approach to Train Timetabling for Inter-City High Speed Railway Lines.

Yiwei Guo

Proceedings of the 5th IEEE International Conference on Intelligent Transportation Engineering, 2020
