Wei Xue

ORCID: 0000-0002-4942-7748

Affiliations:
  • Hong Kong Baptist University, Division of Emerging Interdisciplinary Areas, Hong Kong
  • Imperial College London, Department of Electrical and Electronic Engineering, UK
  • Chinese Academy of Sciences (CAS), Institute of Automation, Pattern Recognition and Intelligent Systems, Beijing, China (PhD 2015)


According to our database, Wei Xue authored at least 92 papers between 2016 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
CoCoGesture: Towards Coherent Co-speech 3D Gesture Generation in the Wild.
Inf. Fusion, 2026

2025
Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis.
CoRR, August, 2025

Inference-time Scaling for Diffusion-based Audio Super-resolution.
CoRR, August, 2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025

Graceful Forgetting in Generative Language Models.
CoRR, May, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
CoRR, May, 2025

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge.
CoRR, May, 2025

CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation.
CoRR, May, 2025

Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
CoRR, May, 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.
CoRR, April, 2025

AudioX: Diffusion Transformer for Anything-to-Audio Generation.
CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.
CoRR, March, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.
CoRR, March, 2025

Delta Decompression for MoE-based LLMs Compression.
CoRR, February, 2025

Audio-FLAN: A Preliminary Release.
CoRR, February, 2025

VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer.
CoRR, February, 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis.
CoRR, February, 2025

Every Angle Is Worth A Second Glance: Mining Kinematic Skeletal Structures from Multi-view Joint Cloud.
CoRR, February, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

BayesKD: Bayesian Knowledge Distillation for Compact LLMs in Constrained Fine-tuning Scenarios.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), 2025

Importance Weighting Can Help Large Language Models Self-Improve.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech.
IEEE Trans. Multim., 2024

SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model.
CoRR, 2024

Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency.
CoRR, 2024

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues.
CoRR, 2024

EVA: An Embodied World Model for Future Video Anticipation.
CoRR, 2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation.
CoRR, 2024

You Know What I'm Saying: Jailbreak Attack via Implicit Reference.
CoRR, 2024

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion.
CoRR, 2024

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts.
CoRR, 2024

AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems.
CoRR, 2024

NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models.
CoRR, 2024

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs.
CoRR, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions.
CoRR, 2024

M-LRM: Multi-view Large Reconstruction Model.
CoRR, 2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.
CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.
CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.
CoRR, 2024

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation.
CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

Dirichlet Continual Learning: Tackling Catastrophic Forgetting in NLP.
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 2024

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, 2024

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

ComposerX: Multi-Agent Symbolic Music Composition With LLMs.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

COMOSVC: Consistency Model-Based Singing Voice Conversion.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DetKDS: Knowledge Distillation Search for Object Detectors.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

AttnZero: Efficient Attention Discovery for Vision Transformers.
Proceedings of the 18th European Conference on Computer Vision (ECCV 2024), 2024

Auto-GAS: Automated Proxy Discovery for Training-Free Generative Architecture Search.
Proceedings of the 18th European Conference on Computer Vision (ECCV 2024), 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024


FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection.
CoRR, 2023

Continual Learning with Dirichlet Generative-based Rehearsal.
CoRR, 2023

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
CoRR, 2023

Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings.
CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MoMusic: A Motion-Driven Human-AI Collaborative Music Composition and Performing System.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Deep Audio-Visual Beamforming for Speaker Localization.
IEEE Signal Process. Lett., 2022

Pathway to Future Symbiotic Creativity.
CoRR, 2022

2021
Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Neural Kalman Filtering for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Causal System Identification based Compensation for Reverberation-Robust DOA Estimation.
Proceedings of the 29th European Signal Processing Conference, 2021

2020
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The JD AI Speaker Verification System for the FFSVC 2020 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Noise Covariance Matrix Estimation for Rotating Microphone Arrays.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Modulation-Domain Multichannel Kalman Filtering for Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Binaural Mask-Informed Speech Enhancement for Hearing Aids with Head Tracking.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Multichannel Kalman Filtering for Speech Enhancement.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Modulation-Domain Parametric Multichannel Kalman Filtering for Speech Enhancement.
Proceedings of the 26th European Signal Processing Conference, 2018

Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays.
Proceedings of the 52nd Asilomar Conference on Signals, Systems, and Computers, 2018

2017
Frequency-Domain Under-Modelled Blind System Identification Based on Cross Power Spectrum and Sparsity Regularization.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
Under-Modelled Blind System Identification for Time Delay Estimation in Reverberant Environments.
Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016

Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization.
Proceedings of the 24th European Signal Processing Conference, 2016

