Yan Song

Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

2022

Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition.

Circuits Syst. Signal Process., 2022

Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Frontend Attributes Disentanglement for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Domain Robust Deep Embedding Learning for Speaker Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Variance Normalised Features for Language and Dialect Discrimination.

Xiaoxiao Miao

Circuits Syst. Signal Process., 2021

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition.

CoRR, 2021

An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Novel Fault Diagnosis Method Based on Topological Data Analysis.

Proceedings of the CAA Symposium on Fault Detection, Supervision, and Safety for Technical Processes, 2021

2020

Segment boundary detection directed attention for online end-to-end speech recognition.

EURASIP J. Audio Speech Music. Process., 2020

Time-Frequency Feature Fusion for Noise Robust Audio Event Classification.

Ramaswamy Palaniappan

Circuits Syst. Signal Process., 2020

Effective Exploitation of Posterior Information for Attention-Based Speech Recognition.

IEEE Access, 2020

An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

An Effective Deep Embedding Learning Architecture for Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Topic Detection in Conversational Telephone Speech Using CNN with Multi-stream Inputs.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Knowledge Distillation from Multilingual and Monolingual Teachers for End-to-End Multilingual Speech Recognition.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Adaptive Downsampling Encoding for Online End-to-End Speech Recognition.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Triplet-Center Loss Based Deep Embedding Learning Method for Speaker Verification.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

LID-Senones and Their Statistics for Language Identification.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

A Conditional Generative Model for Speech Enhancement.

Circuits Syst. Signal Process., 2018

Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Early Detection of Continuous and Partial Audio Events Using CNN.

Lam Dang Pham

Ramaswamy Palaniappan

Huy Phan

Yue Lang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Source-Aware Context Network for Single-Channel Multi-Speaker Speech Separation.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

A Capsule based Approach for Polyphonic Sound Event Detection.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Fisher vector based CNN architecture for image classification.

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Tibetan-Mandarin bilingual speech recognition based on end-to-end framework.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Topic classification based on distributed document representation and latent topic information.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features.

Digit. Signal Process., 2016

Improved i-Vector Representation for Speaker Diarization.

Circuits Syst. Signal Process., 2016

Image classification with CNN-based Fisher vector coding.

Proceedings of the 2016 Visual Communications and Image Processing, 2016

Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification.

Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Robust Sound Event Detection in Continuous Audio Environments.

Haomin Zhang

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Compact convolutional neural network transfer learning for small-scale image classification.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Robust Sound Event Classification Using Deep Neural Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation.

Hamid Reza Sharifzadeh

Su-Lim Tan

Jingjie Li

ACM Trans. Access. Comput., 2015

Mouth State Detection From Low-Frequency Ultrasonic Reflection.

Circuits Syst. Signal Process., 2015

Deep Bottleneck Feature for Image Classification.

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Deep bottleneck network based i-vector representation for language identification.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Low frequency ultrasonic voice activity detection using convolutional neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust sound event recognition using convolutional neural networks.

Haomin Zhang

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Improved language identification using deep bottleneck network.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

Tone confusion in spoken and whispered Mandarin Chinese.

Yan Xu

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Reconstruction of pitch for whisper-to-speech conversion of Chinese.

Jingjie Li

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Performance evaluation of deep bottleneck features for spoken language identification.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Task-aware deep bottleneck features for spoken language identification.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A spectral based visual matching method for image classification.

Proceedings of the International Conference on Audio, Language and Image Processing, 2014

2013

Reconstruction of continuous voiced speech from whispers.

Jingjie Li

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exemplar based language recognition method for short-duration speech segments.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Phoneme variation based synthesized speech discrimination for speaker verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Intra-conversation intra-speaker variability compensation for speaker clustering.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Exemplar-Based Sparse Representation for Language Recognition on I-Vectors.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Spatial pooling for transformation invariant image representation.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Effective image representation based on bi-layer visual codebook.

Proceedings of the First Asian Conference on Pattern Recognition, 2011

2010

The description of iFlyTek Speech Lab system for NIST2009 Language Recognition Evaluation.

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Multiple instance learning using visual phrases for object classification.

Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

2009

Unified Video Annotation via Multigraph Learning.

IEEE Trans. Circuits Syst. Video Technol., 2009

Semi-supervised kernel density estimation for video annotation.

Comput. Vis. Image Underst., 2009

Concept-Dependent Image Annotation via Existence-Based Multiple-Instance Learning.

Xun Yuan

Meng Wang

Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2009

Image Fusion Quality Metrics by Directional Projection.

Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2009

Concept representation based video indexing.

Meng Wang

Xian-Sheng Hua

Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

An automatic language identification method based on subspace analysis.

Ren-Hua Wang

Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

2008

Video Annotation Based on Kernel Linear Neighborhood Propagation.

IEEE Trans. Multim., 2008

Optimizing Training Set Construction for Video Semantic Classification.

EURASIP J. Adv. Signal Process., 2008

A Sample and Feature Selection Scheme for GMM-SVM Based Language Recognition.

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

The Adaptation Schemes In PR-SVM Based Language Recognition.

Xu Bing