Xiang Yin

ORCID: 0000-0003-1324-4277

Affiliations:

ByteDance AI Lab, China

According to our database, Xiang Yin authored at least 57 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Dynamic Diffusion Graph Convolutional Network With Scene-Guided Gating for Trajectory Prediction in IoT-Based Intelligent Transportation Systems.

IEEE Internet Things J., 2026

2025

InfinityHuman: Towards Long-Term Audio-Driven Human.

CoRR, August, 2025

Multi-dynamic residual graph convolutional network with global feature enhancement for traffic flow prediction.

Int. J. Mach. Learn. Cybern., February, 2025

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.

CoRR, February, 2025

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation.

CoRR, February, 2025

Adaptive lightweight temporal convolutional network with context-aware downsampling strategy for traffic flow prediction.

Eng. Appl. Artif. Intell., 2025

Interpretable accident prediction at highway-rail grade crossings: a deep learning approach.

Xiang Yin, Jiangang Jin, Zhipeng Zhang

Comput. Ind. Eng., 2025

UniTalker: Conversational Speech-Visual Synthesis.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

MSGCN-ISTL: A multi-scaled self-attention-enhanced graph convolutional network with improved STL decomposition for probabilistic load forecasting.

Expert Syst. Appl., March, 2024

RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.

CoRR, 2024

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency.

CoRR, 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Generative Expressive Conversational Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Static-dynamic collaborative graph convolutional network with meta-learning for node-level traffic flow prediction.

Xiang Yin, Wenyu Zhang, Xin Jing

Expert Syst. Appl., October, 2023

Spatiotemporal dynamic graph convolutional network for traffic speed forecasting.

Xiang Yin, Wenyu Zhang, Shuai Zhang

Inf. Sci., September, 2023

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement.

CoRR, 2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model.

CoRR, 2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts.

CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.

CoRR, 2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis.

CoRR, 2023

Detector Guidance for Multi-Object Text-to-Image Generation.

CoRR, 2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation.

CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.

CoRR, 2023

Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions and Prospects.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

S2CD: Self-heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioQR: Deep Neural Audio Watermarks For QR Code.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

Virtual Try-On with Pose-Garment Keypoints Guided Inpainting.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LiteG2P: A Fast, Light and High Accuracy Model for Grapheme-to-Phoneme Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features.

CoRR, 2022

Unsupervised Video Domain Adaptation: A Disentanglement Perspective.

CoRR, 2022

A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation.

CoRR, 2022

Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

An Automatic Soundtracking System for Text-to-Speech Audiobooks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Using Clothes Style Transfer for Scenario-Aware Person Video Generation.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech.

CoRR, 2021

Towards Realistic Visual Dubbing with Heterogeneous Sources.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Fine-Grained Prosody Modeling in Neural Speech Synthesis Using ToBI Representation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Chapter-Wise Understanding System for Text-To-Speech in Chinese Novels.

Proceedings of the IEEE International Conference on Acoustics, 2021

PPG-Based Singing Voice Conversion with Adversarial Representation Learning.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech.

CoRR, 2020

A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Unified Sequence-to-Sequence Front-End Model for Mandarin Text-to-Speech Synthesis.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Xiaomingbot: A Multilingual Robot News Reporter.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020
