Wei Xue
ORCID: 0000-0002-4942-7748
Affiliations:
- Hong Kong Baptist University, Division of Emerging Interdisciplinary Areas, Hong Kong
- Imperial College London, Department of Electrical and Electronic Engineering, UK
- Chinese Academy of Sciences (CAS), Institute of Automation, Beijing, China (PhD 2015, Pattern Recognition and Intelligent Systems)
According to our database, Wei Xue authored at least 120 papers between 2016 and 2026.
Online presence:
- orcid.org
Bibliography
2026
Int. J. Comput. Vis., April, 2026
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling.
CoRR, April, 2026
CoRR, April, 2026
CoRR, April, 2026
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing.
CoRR, April, 2026
CoRR, March, 2026
DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning.
CoRR, March, 2026
CoRR, February, 2026
Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection.
CoRR, January, 2026
Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models.
CoRR, January, 2026
CoRR, January, 2026
Inf. Fusion, 2026
Lighthouse: A Self-Reconfiguring Sociotechnical Infrastructure for the Unforeseen Long-Tail of Urban Crisis.
Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, 2026
WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-dimensional Annotation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
CoRR, December, 2025
PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation.
CoRR, November, 2025
Every Angle is Worth a Second Glance: Mining Kinematic Skeletal Structures From Multi-View Joint Cloud.
IEEE Trans. Vis. Comput. Graph., October, 2025
CoRR, October, 2025
CoRR, September, 2025
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks.
CoRR, September, 2025
CoRR, September, 2025
CoRR, September, 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
CoRR, May, 2025
CoRR, May, 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
CoRR, May, 2025
CoRR, March, 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer.
CoRR, February, 2025
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis.
CoRR, February, 2025
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition.
Proceedings of the Forty-second International Conference on Machine Learning, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
BayesKD: Bayesian Knowledge Distillation for Compact LLMs in Constrained Fine-tuning Scenarios.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
IEEE Trans. Multim., 2024
SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model.
CoRR, 2024
Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency.
CoRR, 2024
CoRR, 2024
PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion.
CoRR, 2024
CoRR, 2024
AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, 2024
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the European Conference on Computer Vision (ECCV), 2024
Auto-GAS: Automated Proxy Discovery for Training-Free Generative Architecture Search.
Proceedings of the European Conference on Computer Vision (ECCV), 2024
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection.
CoRR, 2023
CoRR, 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
MoMusic: A Motion-Driven Human-AI Collaborative Music Composition and Performing System.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
IEEE Signal Process. Lett., 2022
2021
Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Causal System Identification based Compensation for Reverberation-Robust DOA Estimation.
Proceedings of the 29th European Signal Processing Conference, 2021
2020
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
Proceedings of the 26th European Signal Processing Conference, 2018
Proceedings of the 52nd Asilomar Conference on Signals, Systems, and Computers, 2018
2017
Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017
2016
Under-modelled blind system identification for time delay estimation in reverberant environments.
Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016
Cross-correlation based under-modelled multichannel blind acoustic system identification with sparsity regularization.
Proceedings of the 24th European Signal Processing Conference, 2016