Jun Du

Orcid: 0000-0002-2387-0389

Affiliations:
  • University of Science and Technology of China, USTC, School of Information Science and Technology, Hefei, Anhui, China (PhD 2009)
  • Microsoft Research Asia, Department of handwriting recognition, OCR, China (2010-2013)
  • iFlytek Research, Department of speech recognition, China (2009-2010)


According to our database, Jun Du authored at least 332 papers between 2006 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Three-stage modular speaker diarization collaborating with front-end techniques in the CHiME-8 NOTSOFAR-1 challenge.
Comput. Speech Lang., 2026

2025
Unsupervised Low-Light Image Enhancement Based on Curve Estimation and Illumination Perception.
Signal Image Video Process., August, 2025

An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models.
CoRR, August, 2025

Cross-Modal Knowledge Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection.
CoRR, August, 2025

READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation.
CoRR, August, 2025

Lightweight Audio-Visual Wake Word Spotting With Diverse Acoustic Knowledge Distillation.
IEEE Trans. Circuits Syst. Video Technol., July, 2025

Exploring Speaker Diarization with Mixture of Experts.
CoRR, June, 2025

HPCNet: Hybrid Pixel and Contour Network for Audio-Visual Speech Enhancement With Low-Quality Video.
IEEE J. Sel. Top. Signal Process., May, 2025

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition.
CoRR, May, 2025

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge.
CoRR, May, 2025

Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration.
CoRR, April, 2025

MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique.
CoRR, April, 2025

PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search.
CoRR, April, 2025

StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation.
CoRR, March, 2025

Latent Swap Joint Diffusion for Long-Form Audio Generation.
CoRR, February, 2025

Skeleton and Font Generation Network for Zero-shot Chinese Character Generation.
CoRR, January, 2025

Dual-Branch Codec With Orthogonality Constraint and Knowledge Distillation for Noisy Environment.
IEEE Signal Process. Lett., 2025

Controllable Conformer for Speech Enhancement and Recognition.
IEEE Signal Process. Lett., 2025

Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis.
Pattern Recognit. Lett., 2025

Count, decompose and correct: A new approach to handwritten Chinese character error correction.
Pattern Recognit., 2025

Bidirectional trained tree-structured decoder for Handwritten Mathematical Expression Recognition.
Pattern Recognit., 2025

Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement.
Inf. Fusion, 2025

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Phoneme-Level Contrastive Learning for User-Defined Keyword Spotting with Flexible Enrollment.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Projection Valued-based Quantum Machine Learning Adapting to Differential Privacy Algorithm for Word-level Lipreading.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

DocMamba: Efficient Document Pre-training with State Space Model.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

RFL: Simplifying Chemical Structure Recognition with Ring-Free Language.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Generate, transform, and clean: the role of GANs and transformers in palm leaf manuscript generation and enhancement.
Int. J. Document Anal. Recognit., September, 2024

Collaborative Viseme Subword and End-to-End Modeling for Word-Level Lip Reading.
IEEE Trans. Multim., 2024

A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SEMv2: Table separation line detection based on instance segmentation.
Pattern Recognit., 2024

High-order dilated nested arrays with increased degrees of freedom and reduced mutual coupling.
Digit. Signal Process., 2024

DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions.
CoRR, 2024

Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization.
CoRR, 2024

See then Tell: Enhancing Key Information Extraction with Vision Grounding.
CoRR, 2024

The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge.
CoRR, 2024

Quality-aware Masked Diffusion Transformer for Enhanced Music Generation.
CoRR, 2024

Multitask frame-level learning for few-shot sound event detection.
CoRR, 2024

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
CoRR, 2024

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Summary of Low-Resource Dysarthria Wake-Up Word Spotting Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Layer-Adaptive Low-Rank Adaptation of Large ASR Model for Low-Resource Multilingual Scenarios.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Online Neural Speaker Diarization with Spectral Clustering for Meeting Scenarios.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Representation Learning Using Machine Attribute Information for Anomalous Sound Detection in Real Scenarios.
Proceedings of the International Joint Conference on Neural Networks, 2024

SEMv3: A Fast and Robust Approach to Table Separation Line Detection.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Summary on the Chat-Scenario Chinese Lipreading (ChatCLR) Challenge.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

The NERCSLIP-USTC System for Semi-Supervised Acoustic Scene Classification of ICME 2024 Grand Challenge.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Maths: Multimodal Transformer-Based Human-Readable Solver.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Radical Similarity Based Model Optimization and Post-correction for Chinese Character Recognition.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

ICDAR 2024 Competition on Recognition of Chemical Structures.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2024

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Implicit Enhancement of Target Speaker in Speaker-Adaptive ASR through Efficient Joint Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024

A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

The USTC System for Cadenza 2024 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

Viewing Writing as Video: Optical Flow based Multi-Modal Handwritten Mathematical Expression Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

NAMER: Non-autoregressive Modeling for Handwritten Mathematical Expression Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

KhmerFormer: Multi-Scale CNNs-Transformer with External Attention for Ancient Khmer Palm Leaf Isolated Glyph Classification.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
Speech Commun., September, 2023

Joint optimization for attention-based generation and recognition of Chinese characters using tree position embedding.
Pattern Recognit., August, 2023

Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
Speech Commun., July, 2023

Multimodal Pre-Training Based on Graph Attention Network for Document Understanding.
IEEE Trans. Multim., 2023

A Tree-Structure Analysis Network on Handwritten Chinese Character Error Correction.
IEEE Trans. Multim., 2023

SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
CoRR, 2023

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge.
CoRR, 2023

Count, Decode and Fetch: A New Approach to Handwritten Chinese Character Error Correction.
CoRR, 2023

SEMv2: Table Separation Line Detection Based on Conditional Convolution.
CoRR, 2023

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Frame-Level Embedding Learning for Few-shot Bioacoustic Event Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Vision-Language Adaptive Mutual Decoder for OOV-STR.
Proceedings of the Image and Graphics - 12th International Conference, 2023

Group, Contrast and Recognize: A Self-supervised Method for Chinese Character Recognition.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Loss Function Design for DNN-Based Sound Event Localization and Detection on Low-Resource Realistic Data.
Proceedings of the IEEE International Conference on Acoustics, 2023

Quantum Transfer Learning Using the Large-Scale Unsupervised Pre-Trained Model WavLM-Large for Synthetic Speech Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

Super Dilated Nested Arrays with Ideal Critical Weights and Increased Degrees of Freedom.
Proceedings of the IEEE International Conference on Acoustics, 2023

An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Incorporating Lip Features into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

Enhancing Math Word Problem Solving Through Salient Clue Prioritization: A Joint Token-Phrase-Level Feature Integration Approach.
Proceedings of the International Conference on Asian Language Processing, 2023

USTC-iFLYTEK at DocILE: A Multi-modal Approach Using Domain-specific GraphDoc.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Enhancing Privacy Preservation with Quantum Computing for Word-Level Audio-Visual Speech Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Improving Sound Event Localization and Detection with Class-Dependent Sound Separation for Real-World Scenarios.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

HRDoc: Dataset and Baseline Method toward Hierarchical Reconstruction of Document Structures.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Dilated Nested Arrays With More Degrees of Freedom (DOFs) and Less Mutual Coupling - Part I: The Fundamental Geometry.
IEEE Trans. Signal Process., 2022

Split, Embed and Merge: An accurate table structure recognizer.
Pattern Recognit., 2022

Tree-based data augmentation and mutual learning for offline handwritten mathematical expression recognition.
Pattern Recognit., 2022

A multimodal attention fusion network with a dynamic vocabulary for TextVQA.
Pattern Recognit., 2022

Fast writer adaptation with style extractor network for handwritten text recognition.
Neural Networks, 2022

A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Multi-Task Joint Learning for Embedding Aware Audio-Visual Speech Enhancement.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Online Speaker Diarization with Core Samples Selection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Deep Segment Model for Acoustic Scene Classification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Audio-Visual Neural Speaker Diarization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Scene Text Recognition with Self-supervised Contrastive Predictive Coding.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Multimodal Tree Decoder for Table of Contents Extraction in Document Images.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Improving Isolated Glyph Classification Task for Palm Leaf Manuscripts.
Proceedings of the Frontiers in Handwriting Recognition - 18th International Conference, 2022

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning.
Proceedings of the IEEE International Conference on Acoustics, 2022

The Prototype Co-Prime Array with a Robust Difference Co-Array.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Separation-Based Speaker Diarization via Iterative Model Refinement and Speaker Embedding Based Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results.
Proceedings of the IEEE International Conference on Acoustics, 2022

Online Neural Speaker Diarization with Core Samples.
Proceedings of the Biometric Recognition - 16th Chinese Conference, 2022

Multi-branch Network with Circle Loss Using Voice Conversion and Channel Robust Data Augmentation for Synthetic Speech Detection.
Proceedings of the Biometric Recognition - 16th Chinese Conference, 2022

TDv2: A Novel Tree-Structured Decoder for Offline Mathematical Expression Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2021

Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

TextMountain: Accurate scene text detection via instance segmentation.
Pattern Recognit., 2021

Stroke constrained attention network for online handwritten mathematical expression recognition.
Pattern Recognit., 2021

Joint architecture and knowledge distillation in CNN for Chinese text recognition.
Pattern Recognit., 2021

Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021

Split, embed and merge: An accurate table structure recognizer.
CoRR, 2021

Separation Guided Speaker Diarization in Realistic Mismatched Conditions.
CoRR, 2021

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification.
CoRR, 2021

USTC-NELSLIP System Description for DIHARD-III Challenge.
CoRR, 2021

The practice of speech and language processing in China.
Commun. ACM, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Modeling for Multi-Array Conversational Speech Recognition in the CHiME-6 Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Speech Emotion Recognition Based on Acoustic Segment Model.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

A Model Ensemble Approach for Sound Event Localization and Detection.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Scenario-Dependent Speaker Diarization for DIHARD-III Challenge.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The Third DIHARD Diarization Challenge.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Accurate Oriented Instance Segmentation in Aerial Images.
Proceedings of the Image and Graphics - 11th International Conference, 2021

An Open-Source Library of 2D-GMM-HMM Based on Kaldi Toolkit and Its Application to Handwritten Chinese Character Recognition.
Proceedings of the Image and Graphics - 11th International Conference, 2021

Radical Composition Network for Chinese Character Generation.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

MRD: A Memory Relation Decoder for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

TCLA Array: A New Sparse Array Design with Less Mutual Coupling.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Enhancement Autoencoder with Hierarchical Latent Structure.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Two-Stage Approach to Device-Robust Acoustic Scene Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

HMM-based Lip Reading with Stingy Residual 3D Convolution.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

A Deep Analysis of Speech Separation Guided Diarization Under Realistic Conditions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network-Based Vector-to-Vector Regression.
IEEE Trans. Signal Process., 2020

Adaptive Period Embedding for Representing Oriented Objects in Aerial Images.
IEEE Trans. Geosci. Remote. Sens., 2020

A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression.
IEEE Signal Process. Lett., 2020

Radical analysis network for learning hierarchies of Chinese characters.
Pattern Recognit., 2020

Writer-aware CNN for parsimonious HMM-based offline handwritten Chinese text recognition.
Pattern Recognit., 2020

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention.
CoRR, 2020

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation.
CoRR, 2020

Third DIHARD Challenge Evaluation Plan.
CoRR, 2020

Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition.
CoRR, 2020

Attentive batch normalization for LSTM-based acoustic modeling of speech recognition.
CoRR, 2020


Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Adaptive X-Vector Model for Text-Independent Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Adaptive Speaker Normalization for CTC-Based Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Regularization-Based Adaptive Training for Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Transformer-based Radical Analysis Network for Chinese Character Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Radical Counter Network for Robust Chinese Character Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

A Tree-Structured Decoder for Image-to-Markup Generation.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Geometry Constrained Progressive Learning for LSTM-Based Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Progressive Multi-Target Network Based Speech Enhancement with SNR-Preselection for Robust Speaker Diarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for DNN-Based Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Performance Analysis for Tensor-Train Decomposition to Deep Neural Network Based Vector-to-Vector Regression.
Proceedings of the 54th Annual Conference on Information Sciences and Systems, 2020

Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2019

Speech Enhancement Based on Teacher-Student Deep Learning Using Improved Speech Presence Probability for Noise-Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Mixed-Bandwidth Cross-Channel Speech Recognition via Joint Optimization of DNN-Based Bandwidth Expansion and Acoustic Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

An iterative mask estimation approach to deep learning based multi-channel speech recognition.
Speech Commun., 2019

Rotated cascade R-CNN: A shape robust detector with coordinate regression.
Pattern Recognit., 2019

A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge.
IEEE J. Sel. Top. Signal Process., 2019

Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition.
CoRR, 2019

Deep Neural Network Embedding Learning with High-Order Statistics for Text-Independent Speaker Verification.
CoRR, 2019

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition.
Proceedings of the International Joint Conference on Neural Networks, 2019

Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition.
Proceedings of the International Conference on Multimodal Interaction, 2019

Joint Spatial and Radical Analysis Network For Distorted Chinese Character Recognition.
Proceedings of the Second International Workshop on Machine Learning, 2019

Multi-modal Attention Network for Handwritten Mathematical Expression Recognition.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

DNN Training Based on Classic Gain Function for Single-channel Speech Enhancement and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Two-stage Single-channel Speaker-dependent Speech Separation Approach for CHiME-5 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Speech Enhancement Neural Network Architecture with SNR-Progressive Multi-Target Learning for Robust Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

A LSTM-Based Joint Progressive Learning Framework for Simultaneous Speech Dereverberation and Denoising.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech.
J. Signal Process. Syst., 2018

Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition.
J. Signal Process. Syst., 2018

A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition.
Int. J. Document Anal. Recognit., 2018

Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR.
CoRR, 2018

Attention Based Fully Convolutional Network for Speech Emotion Recognition.
CoRR, 2018

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Progressive Deep Learning Approach to Child Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Speaker Diarization with Enhancing Speech for the First DIHARD Challenge.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Sliding Line Point Regression for Shape Robust Scene Text Detection.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Radical Analysis Network for Zero-Shot Learning in Printed Chinese Character Recognition.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

DenseRAN for Offline Handwritten Chinese Character Recognition.
Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition, 2018

Parsimonious HMMs for Offline Handwritten Chinese Text Recognition.
Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition, 2018

A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Enhancement and Analysis of Conversational Speech: JSALT 2017.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Densely Connected Progressive Learning for LSTM-Based Speech Enhancement.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Fast and Robust Detection of Anatomical Landmarks Using Cascaded 3D Convolutional Networks Guided by Linear Square Regression.
Proceedings of the Biometric Recognition - 13th Chinese Conference, 2018

Attention Based Fully Convolutional Network for Speech Emotion Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

An Analysis of Speaker Diarization Fusion Methods For The First DIHARD Challenge.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Online LSTM-based Iterative Mask Estimation for Multi-Channel Speech Enhancement and ASR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Online Speaker Adaptation for LVCSR Based on Attention Mechanism.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments.
Speech Commun., 2017

Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition.
Pattern Recognit., 2017

Hierarchical deep neural network for multivariate regression.
Pattern Recognit., 2017

Writer adaptation via deeply learned features for online Chinese handwriting recognition.
Int. J. Document Anal. Recognit., 2017

An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech.
Comput. Speech Lang., 2017

RAN: Radical analysis networks for zero-shot learning of Chinese characters.
CoRR, 2017

On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Gaussian density guided deep neural network for single-channel speech enhancement.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An investigation of high-resolution modeling units of deep neural networks for acoustic scene classification.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

Joint noise and mask aware training for DNN-based speech enhancement with sub-band features.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Multiple-target deep learning for LSTM-RNN based speech enhancement.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

A maximum likelihood approach to deep neural network based speech dereverberation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

LSTM-based iterative mask estimation and post-processing for multi-channel speech enhancement.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Dual Learning of the Generator and Recognizer for Chinese Characters.
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017

Deep Convolutional Neural Network Based Hidden Markov Model for Offline Handwritten Chinese Text Recognition.
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017

2016
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016

Deep neural network for robust speech recognition with auxiliary features from laser-Doppler vibrometer sensor.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A regression approach to binaural speech segregation via deep neural network.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An experimental study on joint modeling of mixed-bandwidth data via deep neural networks for robust speech recognition.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Deep neural network based hidden Markov model for offline handwritten Chinese text recognition.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Writer Code Based Adaptation of Deep Neural Network for Offline Handwritten Chinese Text Recognition.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Recognition of Social Touch Gestures Using 3D Convolutional Neural Networks.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Unsupervised single-channel speech separation via deep neural network for different gender mixtures.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Boosting DNN-based speech enhancement via explicit transformations.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
A Regression Approach to Speech Enhancement Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A universal VAD based on jointly trained deep neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Writer adaptive feature extraction based on convolutional neural networks for online handwritten Chinese character recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Joint training of front-end and back-end deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
An Improved VTS Feature Compensation using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

An Experimental Study on Speech Enhancement Based on Deep Neural Networks.
IEEE Signal Process. Lett., 2014

An irrelevant variability normalization approach to discriminative training of multi-prototype based classifiers and its applications for online handwritten Chinese character recognition.
Pattern Recognit., 2014

Cross-language transfer learning for deep neural network based speech enhancement.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Dynamic noise aware training for speech enhancement based on deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Robust speech recognition with speech enhanced deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Writer Adaptation Using Bottleneck Features and Discriminative Linear Regression for Online Handwritten Chinese Character Recognition.
Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014

Synthesized stereo mapping via deep neural networks for noisy speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

2013
A discriminative linear regression approach to adaptation of multi-prototype based classifiers and its applications for Chinese OCR.
Pattern Recognit., 2013

An Irrelevant Variability Normalization Based Discriminative Training Approach for Online Handwritten Chinese Character Recognition.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

A VTS-based feature compensation approach to noisy speech recognition using mixture models of distortion.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Synthesized stereo-based stochastic mapping with data selection for robust speech recognition.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A discriminative linear regression approach to OCR adaptation.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

Designing compact classifiers for rotation-free recognition of large vocabulary online handwritten Chinese characters.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Boosted Mixture Learning of Gaussian Mixture Hidden Markov Models Based on Maximum Likelihood for Speech Recognition.
IEEE Trans. Speech Audio Process., 2011

A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition.
IEEE Trans. Speech Audio Process., 2011

Snap and Translate Using Windows Phone.
Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

2010
Boosted mixture learning of Gaussian mixture HMMs for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based pseudo-clean speech synthesis for SPLICE algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2010

2008
Evaluation of a Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model on Aurora2, Aurora3, and Aurora4 Tasks.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Cepstral shape normalization (CSN) for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

A feature compensation approach using piecewise linear approximation of an explicit distortion model for noisy speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Performance of Discriminative HMM Training in Noise.
Int. J. Comput. Linguistics Chin. Lang. Process., 2007

A New Minimum Divergence Approach to Discriminative Training.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Training Discriminative HMM by Optimal Allocation of Gaussian Kernels.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Noisy Speech Recognition Performance of Discriminative HMMs.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Minimum divergence based discriminative training.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

