Lei Xie

Neural Networks, January, 2023

Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement.

IEEE Trans. Multim., 2023

AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Persons.

IEEE Trans. Multim., 2023

Timbre-Reserved Adversarial Attack in Speaker Identification.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on Language Models.

IEEE Signal Process. Lett., 2023

Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning.

CoRR, 2023

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation.

CoRR, 2023

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation.

CoRR, 2023

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning.

CoRR, 2023

SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition.

CoRR, 2023

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion.

CoRR, 2023

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts.

CoRR, 2023

Timbre-reserved Adversarial Attack in Speaker Identification.

CoRR, 2023

The FlySpeech Audio-Visual Speaker Diarization System for MISP Challenge 2022.

CoRR, 2023

Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition.

CoRR, 2023

Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification.

CoRR, 2023

StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation.

CoRR, 2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.

CoRR, 2023

Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network.

CoRR, 2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding.

CoRR, 2023

Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion.

CoRR, 2023

Distance-based Weight Transfer from Near-field to Far-field Speaker Verification.

CoRR, 2023

Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer.

CoRR, 2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The NPU-Elevoc Personalized Speech Enhancement System for ICASSP 2023 DNS Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Wekws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

DSPGAN: A GAN-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Two-Stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge.

Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Promptspeaker: Speaker Generation Based on Text Descriptions.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

VITS-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Disentangling Style and Speaker Attributes for TTS Style Transfer.

Xiaochun An

Frank K. Soong

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.

IEEE Signal Process. Lett., 2022

Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing.

Chenggang Mi

Yanning Zhang

Neural Networks, 2022

Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution.

Jingyong Hou

Shilei Zhang

Neural Networks, 2022

Noise-robust voice conversion with domain adversarial training.

Hongqiang Du

Haizhou Li

Neural Networks, 2022

MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages.

CoRR, 2022

TESSP: Text-Enhanced Self-Supervised Speech Pre-training.

CoRR, 2022

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer.

CoRR, 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Switching ASR Challenge.

CoRR, 2022

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario.

CoRR, 2022

NWPU-ASLP System for the VoicePrivacy 2022 Challenge.

CoRR, 2022

IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion.

CoRR, 2022

MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

End-to-End Voice Conversion with Information Perturbation.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Switching ASR Challenge.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit.

Proceedings of the Interspeech 2022, 2022

Personalized Acoustic Echo Cancellation for Full-duplex Communications.

Proceedings of the Interspeech 2022, 2022

Backend Ensemble for Speaker Verification and Spoofing Countermeasure.

Proceedings of the Interspeech 2022, 2022

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.

Proceedings of the Interspeech 2022, 2022

CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer.

Proceedings of the Interspeech 2022, 2022

Minimizing Sequential Confusion Error in Speech Command Recognition.

Proceedings of the Interspeech 2022, 2022

Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.

Proceedings of the Interspeech 2022, 2022

Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher.

Proceedings of the Interspeech 2022, 2022

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR.

Proceedings of the Interspeech 2022, 2022

Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis.

Proceedings of the Interspeech 2022, 2022

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition.

Proceedings of the Interspeech 2022, 2022

Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis.

Proceedings of the Interspeech 2022, 2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.

Proceedings of the Interspeech 2022, 2022

Multi-Task Deep Residual Echo Suppression with Echo-Aware Loss.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

M2Met: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Conversational Speech Recognition by Learning Conversation-Level Characteristics.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

One-Shot Voice Conversion For Style Transfer Based On Speaker Adaptation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

S-DCCRN: Super Wide Band DCCRN with Learnable Complex Feature for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

TEA-PSE: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2022 DNS Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation.

IEEE Signal Process. Lett., 2021

Factorized WaveNet for voice conversion with limited data.

Speech Commun., 2021

Cycle consistent network for end-to-end style transfer TTS training.

Neural Networks, 2021

Effective and direct control of neural TTS prosody by removing interactions between different attributes.

Neural Networks, 2021

Controllable cross-speaker emotion transfer for end-to-end speech synthesis.

CoRR, 2021

AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person.

CoRR, 2021

Improving robustness of one-shot voice conversion with deep discriminative speaker encoder.

Hongqiang Du

CoRR, 2021

Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition.

CoRR, 2021

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing.

CoRR, 2021

The NPU System for the 2020 Personalized Voice Trigger Challenge.

CoRR, 2021

WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit.

CoRR, 2021

The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Multi-Band MelGAN: Faster Waveform Generation For High-Quality Text-To-Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition with a Syllable-To-Character Converter.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Simplified Self-Attention for Transformer-Based end-to-end Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis.

Yi Lei

Shan Yang

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Multi-Channel Automatic Speech Recognition Using Deep Complex Unet.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Conversational End-to-End TTS for Voice Agents.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

IEEE SLT 2021 Alpha-Mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Context-aware RNNLM Rescoring for Conversational Speech Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Adversarial Training for Multi-domain Speaker Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Accent and Speaker Disentanglement in Many-to-many Voice Conversion.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Controllable Emotion Transfer For End-to-End Speech Synthesis.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Enriching Source Style Transfer in Recognition-Synthesis Based Non-Parallel Voice Conversion.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Controllable Context-Aware Conversational Speech Synthesis.

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.

Xiaochun An

Frank K. Soong

Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Noise Robust Singing Voice Synthesis Using Gaussian Mixture Variational Autoencoder.

Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

Efficient Gradient-Based Neural Architecture Search For End-to-End ASR.

Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

ASMMC21: The 6th International Workshop on Affective Social Multimedia Computing.

Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

A Web-Based Longitudinal Mental Health Monitoring System.

Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN.

Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

The Multi-Speaker Multi-Style Voice Cloning Challenge 2021.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Wake Word Detection with Streaming Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Duality Temporal-Channel-Frequency Attention Enhanced Speaker Representation Learning.

Li Zhang

Qing Wang

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Boundary and Context Aware Training for CIF-Based Non-Autoregressive End-to-End ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Improving Adversarial Neural Machine Translation for Morphologically Rich Language.

Chenggang Mi

Yanning Zhang

IEEE Trans. Emerg. Top. Comput. Intell., 2020

Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Loanword Identification in Low-Resource Languages with Minimal Supervision.

Chenggang Mi

Yanning Zhang

ACM Trans. Asian Low Resour. Lang. Inf. Process., 2020

Adversarial Feature Learning and Unsupervised Clustering Based Speech Synthesis for Found Data With Acoustic and Textual Noise.

Shan Yang

Yuxuan Wang

IEEE Signal Process. Lett., 2020

On the localness modeling for the self-attention based end-to-end speech synthesis.

Neural Networks, 2020

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition.

CoRR, 2020

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.

CoRR, 2020

Conversational End-to-End TTS for Voice Agent.

CoRR, 2020

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.

Proceedings of the Interspeech 2020, 2020

Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis.

Proceedings of the Interspeech 2020, 2020

Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition.

Qing Wang

Pengcheng Guo

Proceedings of the Interspeech 2020, 2020

Wake Word Detection with Alignment-Free Lattice-Free MMI.

Proceedings of the Interspeech 2020, 2020

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement.

Proceedings of the Interspeech 2020, 2020

Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training.

Proceedings of the Interspeech 2020, 2020

Mining Effective Negative Training Samples for Keyword Spotting.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Time-Domain Neural Network Approach for Speech Bandwidth Extension.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Effective Wavenet Adaptation for Voice Conversion with Limited Data.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Region Proposal Network Based Small-Footprint Keyword Spotting.

IEEE Signal Process. Lett., 2019

Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability in End-to-End Speech Synthesis.

IEEE Access, 2019

Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings With Temporal Context.

IEEE Access, 2019

Towards Language-Universal Mandarin-English Speech Recognition.

Proceedings of the Interspeech 2019, 2019

Building a Mixed-Lingual Neural TTS System with Only Monolingual Data.

Proceedings of the Interspeech 2019, 2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge.

Proceedings of the Interspeech 2019, 2019

Adversarial Regularization for End-to-End Robust Speaker Verification.

Proceedings of the Interspeech 2019, 2019

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.

Proceedings of the Interspeech 2019, 2019

A New GAN-Based End-to-End TTS Training Algorithm.

Proceedings of the Interspeech 2019, 2019

Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition.

Pengcheng Guo

Sining Sun

Proceedings of the Interspeech 2019, 2019

Deep Audio-visual System for Closed-set Word-level Speech Recognition.

Proceedings of the International Conference on Multimodal Interaction, 2019

Robust Audio-visual Speech Recognition Using Bimodal DFSMN with Multi-condition Training and Dropout Regularization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Pitch-aware Approach to Single-channel Speech Separation.

Ke Wang

Frank K. Soong

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Investigating End-to-end Speech Recognition for Mandarin-English Code-switching.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

An Attention-based Neural Network Approach for Single Channel Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Time Domain Audio Visual Speech Separation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting.

[BibT_eX]

[DOI]

Xiong Wang

Sining Sun

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

WaveNet Factorization with Singular Value Decomposition for Voice Conversion.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Incremental Lattice Determinization for WFST Decoders.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Exploring RNN-Transducer for Chinese speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Multiple fixed beamformers with a spacial Wiener-form postfilter for far-field speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection.

[BibT_eX]

[DOI]

Chenglin Xu

Xiong Xiao

J. Signal Process. Syst., 2018

Guest Editorial: Advances in Deep Learning for Speech Processing.

[BibT_eX]

[DOI]

Tan Lee

Man-Wai Mak

J. Signal Process. Syst., 2018

Learning distributed sentence representations for story segmentation.

[BibT_eX]

[DOI]

Signal Process., 2018

Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation.

[BibT_eX]

[DOI]

Neurocomputing, 2018

ASMMC-MMAC 2018: The Joint Workshop of 4th the Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop.

[BibT_eX]

[DOI]

Proceedings of the 2018 ACM Multimedia Conference, 2018

A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL learners' Speech.

[BibT_eX]

[DOI]

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2018, 2018

Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2018, 2018

Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2018, 2018

Training Augmentation with Adversarial Examples for Robust Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2018, 2018

Attention-based End-to-End Models for Small-Footprint Keyword Spotting.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2018, 2018

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2018, 2018

Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Domain Adversarial Training for Accented Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Attention-Based End-to-End Speech Recognition on Voice Search.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Self-validated Story Segmentation of Chinese Broadcast News.

[BibT_eX]

[DOI]

Proceedings of the Advances in Brain Inspired Cognitive Systems, 2018

2017

Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection.

[BibT_eX]

[DOI]

IEEE J. Sel. Top. Signal Process., 2017

Online object tracking based on BLSTM-RNN with contextual-sequential labeling.

[BibT_eX]

[DOI]

J. Ambient Intell. Humaniz. Comput., 2017

A hybrid neural network hidden Markov model approach for automatic story segmentation.

[BibT_eX]

[DOI]

J. Ambient Intell. Humaniz. Comput., 2017

Media computing and applications for immersive communications: recent advances.

[BibT_eX]

[DOI]

Janne Heikkilä

Bo Li

J. Ambient Intell. Humaniz. Comput., 2017

An unsupervised deep domain adaptation approach for robust speech recognition.

[BibT_eX]

[DOI]

Neurocomputing, 2017

Sound image externalization for headphone based real-time 3D audio.

[BibT_eX]

[DOI]

Frontiers Comput. Sci., 2017

Introduction to special section on advances of orange technologies.

[BibT_eX]

[DOI]

Jhing-Fa Wang

Frontiers Comput. Sci., 2017

Attention-Based End-to-End Speech Recognition in Mandarin.

[BibT_eX]

[DOI]

CoRR, 2017

Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2017, 2017

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2017, 2017

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multilingual bottle-neck feature learning from untranscribed speech.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Topic embedding of sentences for story segmentation.

[BibT_eX]

[DOI]

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

An end-to-end neural network approach to story segmentation.

[BibT_eX]

[DOI]

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A segmental DNN/i-vector approach for digit-prompted speaker verification.

[BibT_eX]

[DOI]

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Frequency-invariant differential microphone array design in the STFT domain.

[BibT_eX]

[DOI]

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Real-time tracking-by-learning with high-order regularization fusion for big video abstraction.

[BibT_eX]

[DOI]

Signal Process., 2016

Guest Editorial: Immersive Audio/Visual Systems.

[BibT_eX]

[DOI]

Multim. Tools Appl., 2016

A deep bidirectional LSTM approach for video-realistic talking head.

[BibT_eX]

[DOI]

Multim. Tools Appl., 2016

Deformable object tracking with spatiotemporal segmentation in big vision surveillance.

[BibT_eX]

[DOI]

Neurocomputing, 2016

On the impact of phoneme alignment in DNN-based speech synthesis.

[BibT_eX]

[DOI]

Mei Li

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity.

[BibT_eX]

[DOI]

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

The NNI Vietnamese Speech Recognition System for MediaEval 2016.

[BibT_eX]

[DOI]

Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese.

[BibT_eX]

[DOI]

Jingyong Hou

Zhonghua Fu

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2016, 2016

A DNN-HMM Approach to Story Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2016, 2016

Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2016, 2016

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2016, 2016

Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection.

[BibT_eX]

[DOI]

Proceedings of the Interspeech 2016, 2016

Deep neural network derived bottleneck features for accurate audio classification.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

Approximate search of audio queries by using DTW with phone time boundary and data augmentation.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exemplar-based sparse representation of timbre and prosody for voice conversion.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

On the training of DNN-based average voice model for speech synthesis.

[BibT_eX]

[DOI]

Shan Yang

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

On the use of I-vectors and average voice model for voice conversion without parallel data.

[BibT_eX]

[DOI]

Jie Wu

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Predicting articulatory movement from text using deep architecture with stacked bottleneck features.

[BibT_eX]

[DOI]

Zhen Wei

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Study on near-field crosstalk cancellation based on least square algorithm.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Tennis Ball Tracking Using a Two-Layered Data Association Approach.

[BibT_eX]

[DOI]

IEEE Trans. Multim., 2015

Multiple pedestrian tracking based on couple-states Markov chain with semantic topic learning for video surveillance.

[BibT_eX]

[DOI]

Soft Comput., 2015

Topic modeling in multimedia: algorithms and applications.

[BibT_eX]

[DOI]

Soft Comput., 2015

NestDE: generic parameters tuning for automatic story segmentation.

[BibT_eX]

[DOI]

Soft Comput., 2015

Topic segmentation on spoken documents using self-validated acoustic cuts.

[BibT_eX]

[DOI]

Soft Comput., 2015

Expressive talking avatar synthesis and animation.

[BibT_eX]

[DOI]

Multim. Tools Appl., 2015

Head motion synthesis from speech using deep neural networks.

[BibT_eX]

[DOI]

Chuang Ding

Pengcheng Zhu

Multim. Tools Appl., 2015

Online Object Tracking Based on CNN with Metropolis-Hasting Re-Sampling.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual ACM Conference on Multimedia, MM '15, Brisbane, Australia, October 26, 2015

The NNI Query-by-Example System for MediaEval 2015.

[BibT_eX]

[DOI]

Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings.

[BibT_eX]

[DOI]

Pengcheng Zhu

Yunlin Chen

Proceedings of the INTERSPEECH 2015, 2015

Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2015, 2015

An alternating optimization approach for phase retrieval.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2015, 2015

BLSTM neural networks for speech driven head motion synthesis.

[BibT_eX]

[DOI]

Chuang Ding

Pengcheng Zhu

Proceedings of the INTERSPEECH 2015, 2015

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2015, 2015

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Photo-real talking head with deep bidirectional LSTM.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Non-negative matrix factorization using stable alternating direction method of multipliers for source separation.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

A density peak clustering approach to unsupervised acoustic subword units discovery.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

A waveform representation framework for high-quality statistical parametric speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Fundamental frequency modeling using wavelets for emotional voice conversion.

[BibT_eX]

[DOI]

Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014

A statistical parametric approach to video-realistic text-driven talking avatar.

[BibT_eX]

[DOI]

Naicai Sun

Bo Fan

Multim. Tools Appl., 2014

Multimodal joint information processing in human machine interaction: recent advances.

[BibT_eX]

[DOI]

Zhigang Deng

Stephen J. Cox

Multim. Tools Appl., 2014

The NNI Query-by-Example System for MediaEval 2014.

[BibT_eX]

[DOI]

Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

A hybrid virtual bass system with improved phase vocoder and high efficiency.

[BibT_eX]

[DOI]

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Experimental study on dereverberation and noise reduction for distant speech recognition.

[BibT_eX]

[DOI]

Hang Lv

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2014, 2014

A deep neural network approach for sentence boundary detection in broadcast news.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2014, 2014

Stereo acoustic echo suppression using widely linear filtering in the frequency domain.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2014, 2014

Speech-driven head motion synthesis using neural networks.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2014, 2014

An ensemble of deep neural networks for object tracking.

[BibT_eX]

[DOI]

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Unsupervised broadcast news story segmentation using distance dependent Chinese restaurant processes.

[BibT_eX]

[DOI]

Chao Yang

Xiangzeng Zhou

Proceedings of the IEEE International Conference on Acoustics, 2014

Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features.

[BibT_eX]

[DOI]

Chenglin Xu

Zhonghua Fu

Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

Learning optimal features for music transcription.

[BibT_eX]

[DOI]

Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

Multimodal continuous affect recognition based on LSTM and multiple kernel learning.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

A two layered data association approach for ball tracking.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

A tighter lower bound estimate for dynamic time warping.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

Measuring semantic similarity by contextual word connections in Chinese news story segmentation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

Broadcast news story segmentation using latent topics on data manifold.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

Numerical calculation of the head-related transfer functions with Chinese dummy head.

[BibT_eX]

[DOI]

Ling Tang

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Context-dependent deep neural networks for commercial Mandarin speech recognition applications.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions.

[BibT_eX]

[DOI]

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012

Laplacian Eigenmaps for Automatic Story Segmentation of Broadcast News.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 2012

Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2012

Mask Estimation and Refinement for MFT-based Robust Speaker Verification.

[BibT_eX]

[DOI]

Yali Zhao

Zhonghua Fu

Proceedings of the INTERSPEECH 2012, 2012

Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2012, 2012

Lexical Story Co-Segmentation of Chinese Broadcast News.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2012, 2012

Acoustic TextTiling for story segmentation of spoken documents.

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Detection of ball hits in a tennis game using audio and visual information.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news.

[BibT_eX]

[DOI]

Multim. Syst., 2011

On the effectiveness of subwords for lexical cohesion based story segmentation of Chinese broadcast news.

[BibT_eX]

[DOI]

Yulian Yang

Inf. Sci., 2011

Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2011, 2011

2010

Cascade Markov random fields for stroke extraction of Chinese characters.

[BibT_eX]

[DOI]

Inf. Sci., 2010

Minimizing the expected complete influence time of a social network.

[BibT_eX]

[DOI]

Yaodong Ni

Inf. Sci., 2010

Speech and Auditory Interfaces for Ubiquitous, Immersive and Personalized Applications.

[BibT_eX]

[DOI]

Proceedings of the Symposia and Workshops on Ubiquitous, 2010

Multi-modal feature integration for story boundary detection in broadcast news.

[BibT_eX]

[DOI]

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Dual-microphone noise reduction based on semi-blind DUET.

[BibT_eX]

[DOI]

Dongmei Jiang

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Phoneme lattice based texttiling towards multilingual story segmentation.

[BibT_eX]

[DOI]

Proceedings of the INTERSPEECH 2010, 2010

Maximum lexical cohesion for fine-grained news story segmentation.

[BibT_eX]

[DOI]

Zihan Liu

Wei Feng

Proceedings of the INTERSPEECH 2010, 2010

2009

Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models.

[BibT_eX]

[DOI]

J. Vis. Lang. Comput., 2009

Noise robust features for speech/music discrimination in real-time telecommunication.

[BibT_eX]

[DOI]

Jhing-Fa Wang

Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

A Subword Normalized Cut Approach to Automatic Story Segmentation of Chinese Broadcast News.

[BibT_eX]

[DOI]

Proceedings of the Information Retrieval Technology, 2009

Multicue Graph Mincut for Image Segmentation.

[BibT_eX]

[DOI]

Wei Feng

Proceedings of the Computer Vision, 2009

2008

Type-2 fuzzy Gaussian mixture models.

[BibT_eX]

[DOI]

Pattern Recognit., 2008

Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News.

[BibT_eX]

[DOI]

Yulian Yang

Proceedings of the Advances in Multimedia Information Processing, 2008

Subword Latent Semantic Analysis for Texttiling-Based Automatic Story Segmentation of Chinese Broadcast News.

[BibT_eX]

[DOI]

Yulian Yang

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A Heuristic Approach to Caption Enhancement for Effective Video OCR.

[BibT_eX]

[DOI]

Xi Tan

Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2008

Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News.

[BibT_eX]

[DOI]

Wei Feng

Proceedings of the Information Retrieval Technology, 2008

2007

Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling.

[BibT_eX]

[DOI]

IEEE Trans. Multim., 2007

A coupled HMM approach to video-realistic speech animation.

[BibT_eX]

[DOI]

Pattern Recognit., 2007

Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News.

[BibT_eX]

[DOI]

Chuan Liu

Helen Meng

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation.

[BibT_eX]

[DOI]

Shing-kai Chan

Helen M. Meng

Proceedings of the INTERSPEECH 2007, 2007

2006

Lip Assistant: Visualize Speech for Hearing Impaired People in Multimedia Services.

[BibT_eX]

[DOI]

Yi Wang

Proceedings of the IEEE International Conference on Systems, 2006

The SOMN-HMM Model and Its Application to Automatic Synthesis of 3D Character Animations.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Systems, 2006

Supervised Learning of Motion Style for Real-time Synthesis of 3D Character Animations.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Systems, 2006

A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion.

[BibT_eX]

[DOI]

Helen Meng

Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Speech Animation Using Coupled Hidden Markov Models.

[BibT_eX]

[DOI]

Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

An Articulatory Approach to Video-Realistic Mouth Animation.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition.

[BibT_eX]

[DOI]