A multiscale analysis-assisted two-stage reduced-order deep learning approach for effective thermal conductivity of arbitrary contrast heterogeneous materials.

Zihao Yang

Xixin Wu

Xindang He

Xiaofei Guan

Eng. Appl. Artif. Intell., 2024

Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech.

CoRR, 2024

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.

CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.

CoRR, 2024

Purple-teaming LLMs with Adversarial Defender Training.

CoRR, 2024

Injecting Linguistic Knowledge Into BERT for Dialogue State Tracking.

Xiaohan Feng

Xixin Wu

Helen Meng

IEEE Access, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer With Dual-Decoding Product-Quantized Variational Auto-Encoder.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec For Efficient Language Model Based Text-to-Speech Synthesis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Machine Ethics - Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

ERVQ: Leverage Residual Vector Quantization for Speech Emotion Recognition.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

CMAST: Efficient Speech-Text Joint Training Method to Enhance Linguistic Features Learning of Speech Representations.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Naturalistic Language-Related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Prompting Large Language Models with Mispronunciation Detection and Diagnosis Abilities.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy.

Proceedings of the International Joint Conference on Neural Networks, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization.

Proceedings of the IEEE International Conference on Acoustics, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.

Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.

Proceedings of the IEEE International Conference on Acoustics, 2024

Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.

Proceedings of the IEEE International Conference on Acoustics, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema.

Xiaohan Feng

Xixin Wu

Helen Meng

Proceedings of the KDD Workshop on Human-Interpretable AI 2024 co-located with 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024), 2024

Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Estimating the Uncertainty in Emotion Class Labels With Utterance-Specific Dirichlet Priors.

IEEE Trans. Affect. Comput., 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.

CoRR, 2023

SAIL: Search-Augmented Instruction Learning.

CoRR, 2023

Interpretable Unified Language Checking.

CoRR, 2023

SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One.

Proceedings of the IEEE International Conference on Acoustics, 2023

VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation.

Proceedings of the IEEE International Conference on Acoustics, 2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.

Proceedings of the IEEE International Conference on Acoustics, 2023

Search Augmented Instruction Learning.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.

CoRR, 2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.

CoRR, 2022

Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope.

David Navarro-Alarcon

Calvin Sze Hang Ng

Philip Wai Yan Chiu

Zheng Li

Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

HILvoice: Human-in-the-Loop Style Selection for Elder-Facing Speech Synthesis.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Spoofing-Aware Speaker Verification by Multi-Level Fusion.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring linguistic feature and model combination for speech recognition based automatic AD detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

Characterizing the Adversarial Vulnerability of Speech self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2022

Neural Architecture Search for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2022

A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout.

Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021

Speech Emotion Recognition Using Sequential Capsule Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exemplar-Based Emotive Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Attention Forcing for Machine Translation.

CoRR, 2021

Should Ensemble Members Be Calibrated?

Xixin Wu

Mark J. F. Gales

CoRR, 2021

Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Deliberation-Based Multi-Pass Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.

CoRR, 2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Speaker-Aware Linear Discriminant Analysis in Speaker Verification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Ensemble Approaches for Uncertainty in Spoken Language Assessment.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Maximizing Mutual Information for Tacotron.

CoRR, 2019

Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Methods for Audio Classification from Lecture Discussion Recordings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Recurrent Neural Network Language Model Training Using Natural Gradient.

Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Emotion Recognition Using Capsule Networks.

Proceedings of the IEEE International Conference on Acoustics, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2019

Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.

Proceedings of the IEEE International Conference on Acoustics, 2019

Coupling Global and Local Context for Unsupervised Aspect Extraction.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018

The HCCL-CUHK System for the Voice Conversion Challenge 2018.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Speech Super-Resolution Using Parallel WaveNet.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.

Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Feature Based Adaptation for Speaking Style Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Intonation classification for L2 English speech using multi-distribution deep neural networks.

Kun Li

Xixin Wu

Helen M. Meng

Comput. Speech Lang., 2017

2015

Acoustic to articulatory mapping with deep neural network.

Multim. Tools Appl., 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.

Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014

Automatic speech data clustering with human perception based weighted distance.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

2012

Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Xixin Wu
