Jun Du
ORCID: 0000-0002-2387-0389
Affiliations:
- University of Science and Technology of China, USTC, School of Information Science and Technology, Hefei, Anhui, China (PhD 2009)
- Microsoft Research Asia, Department of handwriting recognition, OCR, China (2010-2013)
- iFlytek Research, Department of speech recognition, China (2009-2010)
According to our database, Jun Du authored at least 332 papers between 2006 and 2026.
Bibliography
2026
Three-stage modular speaker diarization collaborating with front-end techniques in the CHiME-8 NOTSOFAR-1 challenge.
Comput. Speech Lang., 2026
2025
Unsupervised Low-Light Image Enhancement Based on Curve Estimation and Illumination Perception.
Signal Image Video Process., August, 2025
An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models.
CoRR, August, 2025
Cross-Modal Knowledge Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection.
CoRR, August, 2025
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation.
CoRR, August, 2025
Lightweight Audio-Visual Wake Word Spotting With Diverse Acoustic Knowledge Distillation.
IEEE Trans. Circuits Syst. Video Technol., July, 2025
HPCNet: Hybrid Pixel and Contour Network for Audio-Visual Speech Enhancement With Low-Quality Video.
IEEE J. Sel. Top. Signal Process., May, 2025
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition.
CoRR, May, 2025
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge.
CoRR, May, 2025
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration.
CoRR, April, 2025
CoRR, April, 2025
CoRR, April, 2025
StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation.
CoRR, March, 2025
CoRR, January, 2025
Dual-Branch Codec With Orthogonality Constraint and Knowledge Distillation for Noisy Environment.
IEEE Signal Process. Lett., 2025
IEEE Signal Process. Lett., 2025
Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis.
Pattern Recognit. Lett., 2025
Count, decompose and correct: A new approach to handwritten Chinese character error correction.
Pattern Recognit., 2025
Bidirectional trained tree-structured decoder for Handwritten Mathematical Expression Recognition.
Pattern Recognit., 2025
Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement.
Inf. Fusion, 2025
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Phoneme-Level Contrastive Learning for User-Defined Keyword Spotting with Flexible Enrollment.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Projection Valued-based Quantum Machine Learning Adapting to Differential Privacy Algorithm for Word-level Lipreading.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
Generate, transform, and clean: the role of GANs and transformers in palm leaf manuscript generation and enhancement.
Int. J. Document Anal. Recognit., September, 2024
IEEE Trans. Multim., 2024
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Pattern Recognit., 2024
High-order dilated nested arrays with increased degrees of freedom and reduced mutual coupling.
Digit. Signal Process., 2024
DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions.
CoRR, 2024
CoRR, 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
CoRR, 2024
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Layer-Adaptive Low-Rank Adaptation of Large ASR Model for Low-Resource Multilingual Scenarios.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Representation Learning Using Machine Attribute Information for Anomalous Sound Detection in Real Scenarios.
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
The NERCSLIP-USTC System for Semi-Supervised Acoustic Scene Classification of ICME 2024 Grand Challenge.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Radical Similarity Based Model Optimization and Post-correction for Chinese Character Recognition.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2024
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Implicit Enhancement of Target Speaker in Speaker-Adaptive ASR through Efficient Joint Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024
A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Viewing Writing as Video: Optical Flow based Multi-Modal Handwritten Mathematical Expression Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
NAMER: Non-autoregressive Modeling for Handwritten Mathematical Expression Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
KhmerFormer: Multi-Scale CNNs-Transformer with External Attention for Ancient Khmer Palm Leaf Isolated Glyph Classification.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024
2023
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
Speech Commun., September, 2023
Joint optimization for attention-based generation and recognition of chinese characters using tree position embedding.
Pattern Recognit., August, 2023
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
Speech Commun., July, 2023
IEEE Trans. Multim., 2023
IEEE Trans. Multim., 2023
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
CoRR, 2023
Count, Decode and Fetch: A New Approach to Handwritten Chinese Character Error Correction.
CoRR, 2023
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Proceedings of the Image and Graphics - 12th International Conference, 2023
Group, Contrast and Recognize: A Self-supervised Method for Chinese Character Recognition.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023
Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing audio-visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023
The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Loss Function Design for DNN-Based Sound Event Localization and Detection on Low-Resource Realistic Data.
Proceedings of the IEEE International Conference on Acoustics, 2023
Quantum Transfer Learning Using the Large-Scale Unsupervised Pre-Trained Model Wavlm-Large for Synthetic Speech Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Super Dilated Nested Arrays with Ideal Critical Weights and Increased Degrees of Freedom.
Proceedings of the IEEE International Conference on Acoustics, 2023
An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2023
Incorporating Lip Features into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Enhancing Math Word Problem Solving Through Salient Clue Prioritization: A Joint Token-Phrase-Level Feature Integration Approach.
Proceedings of the International Conference on Asian Language Processing, 2023
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Enhancing Privacy Preservation with Quantum Computing for Word-Level Audio-Visual Speech Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Improving Sound Event Localization and Detection with Class-Dependent Sound Separation for Real-World Scenarios.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
HRDoc: Dataset and Baseline Method toward Hierarchical Reconstruction of Document Structures.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Dilated Nested Arrays With More Degrees of Freedom (DOFs) and Less Mutual Coupling - Part I: The Fundamental Geometry.
IEEE Trans. Signal Process., 2022
Pattern Recognit., 2022
Tree-based data augmentation and mutual learning for offline handwritten mathematical expression recognition.
Pattern Recognit., 2022
Pattern Recognit., 2022
Fast writer adaptation with style extractor network for handwritten text recognition.
Neural Networks, 2022
A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 26th International Conference on Pattern Recognition, 2022
Proceedings of the 26th International Conference on Pattern Recognition, 2022
Proceedings of the Frontiers in Handwriting Recognition - 18th International Conference, 2022
A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Separation-Based Speaker Diarization Via Iterative Model Refinement And Speaker Embedding Based Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2022
A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the Biometric Recognition - 16th Chinese Conference, 2022
Multi-branch Network with Circle Loss Using Voice Conversion and Channel Robust Data Augmentation for Synthetic Speech Detection.
Proceedings of the Biometric Recognition - 16th Chinese Conference, 2022
TDv2: A Novel Tree-Structured Decoder for Offline Mathematical Expression Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2021
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Pattern Recognit., 2021
Stroke constrained attention network for online handwritten mathematical expression recognition.
Pattern Recognit., 2021
Pattern Recognit., 2021
Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification.
CoRR, 2021
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Acoustic Modeling for Multi-Array Conversational Speech Recognition in the Chime-6 Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the Image and Graphics - 11th International Conference, 2021
An Open-Source Library of 2D-GMM-HMM Based on Kaldi Toolkit and Its Application to Handwritten Chinese Character Recognition.
Proceedings of the Image and Graphics - 11th International Conference, 2021
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021
MRD: A Memory Relation Decoder for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network-Based Vector-to-Vector Regression.
IEEE Trans. Signal Process., 2020
IEEE Trans. Geosci. Remote. Sens., 2020
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE Signal Process. Lett., 2020
Pattern Recognit., 2020
Writer-aware CNN for parsimonious HMM-based offline handwritten Chinese text recognition.
Pattern Recognit., 2020
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation.
CoRR, 2020
Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition.
CoRR, 2020
Attentive batch normalization for lstm-based acoustic modeling of speech recognition.
CoRR, 2020
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the 37th International Conference on Machine Learning, 2020
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Progressive Multi-Target Network Based Speech Enhancement with Snr-Preselection for Robust Speaker Diarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for Dnn-Based Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Performance Analysis for Tensor-Train Decomposition to Deep Neural Network Based Vector-to-Vector Regression.
Proceedings of the 54th Annual Conference on Information Sciences and Systems, 2020
Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition.
IEEE Trans. Multim., 2019
Speech Enhancement Based on Teacher-Student Deep Learning Using Improved Speech Presence Probability for Noise-Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Mixed-Bandwidth Cross-Channel Speech Recognition via Joint Optimization of DNN-Based Bandwidth Expansion and Acoustic Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
An iterative mask estimation approach to deep learning based multi-channel speech recognition.
Speech Commun., 2019
Pattern Recognit., 2019
A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge.
IEEE J. Sel. Top. Signal Process., 2019
Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition.
CoRR, 2019
Deep Neural Network Embedding Learning with High-Order Statistics for Text-Independent Speaker Verification.
CoRR, 2019
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition.
Proceedings of the International Joint Conference on Neural Networks, 2019
Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition.
Proceedings of the International Conference on Multimodal Interaction, 2019
Joint Spatial and Radical Analysis Network For Distorted Chinese Character Recognition.
Proceedings of the Second International Workshop on Machine Learning, 2019
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019
DNN Training Based on Classic Gain Function for Single-channel Speech Enhancement and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
A Two-stage Single-channel Speaker-dependent Speech Separation Approach for Chime-5 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2019
A Speech Enhancement Neural Network Architecture with SNR-Progressive Multi-Target Learning for Robust Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
A LSTM-Based Joint Progressive Learning Framework for Simultaneous Speech Dereverberation and Denoising.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech.
J. Signal Process. Syst., 2018
Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition.
J. Signal Process. Syst., 2018
A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition.
Int. J. Document Anal. Recognit., 2018
Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR.
CoRR, 2018
CoRR, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Radical Analysis Network for Zero-Shot Learning in Printed Chinese Character Recognition.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition, 2018
Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition, 2018
A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Fast and Robust Detection of Anatomical Landmarks Using Cascaded 3D Convolutional Networks Guided by Linear Square Regression.
Proceedings of the Biometric Recognition - 13th Chinese Conference, 2018
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Online LSTM-based Iterative Mask Estimation for Multi-Channel Speech Enhancement and ASR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments.
Speech Commun., 2017
Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition.
Pattern Recognit., 2017
Pattern Recognit., 2017
Writer adaptation via deeply learned features for online Chinese handwriting recognition.
Int. J. Document Anal. Recognit., 2017
An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech.
Comput. Speech Lang., 2017
CoRR, 2017
On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
An investigation of high-resolution modeling units of deep neural networks for acoustic scene classification.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017
A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017
Joint noise and mask aware training for DNN-based speech enhancement with sub-band features.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
LSTM-based iterative mask estimation and post-processing for multi-channel speech enhancement.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017
Deep Convolutional Neural Network Based Hidden Markov Model for Offline Handwritten Chinese Text Recognition.
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017
2016
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016
Deep neural network for robust speech recognition with auxiliary features from laser-Doppler vibrometer sensor.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
An experimental study on joint modeling of mixed-bandwidth data via deep neural networks for robust speech recognition.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016
Deep neural network based hidden Markov model for offline handwritten Chinese text recognition.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016
Writer Code Based Adaptation of Deep Neural Network for Offline Handwritten Chinese Text Recognition.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016
Unsupervised single-channel speech separation via deep neural network for different gender mixtures.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Writer adaptive feature extraction based on convolutional neural networks for online handwritten Chinese character recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015
Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Joint training of front-end and back-end deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the Latent Variable Analysis and Signal Separation, 2015
A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015
An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
An Improved VTS Feature Compensation using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014
IEEE Signal Process. Lett., 2014
An irrelevant variability normalization approach to discriminative training of multi-prototype based classifiers and its applications for online handwritten Chinese character recognition.
Pattern Recognit., 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014
Writer Adaptation Using Bottleneck Features and Discriminative Linear Regression for Online Handwritten Chinese Character Recognition.
Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
2013
A discriminative linear regression approach to adaptation of multi-prototype based classifiers and its applications for Chinese OCR.
Pattern Recognit., 2013
An Irrelevant Variability Normalization Based Discriminative Training Approach for Online Handwritten Chinese Character Recognition.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013
A VTS-based feature compensation approach to noisy speech recognition using mixture models of distortion.
Proceedings of the IEEE International Conference on Acoustics, 2013
2012
Synthesized stereo-based stochastic mapping with data selection for robust speech recognition.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 21st International Conference on Pattern Recognition, 2012
Designing compact classifiers for rotation-free recognition of large vocabulary online handwritten Chinese characters.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
Boosted Mixture Learning of Gaussian Mixture Hidden Markov Models Based on Maximum Likelihood for Speech Recognition.
IEEE Trans. Speech Audio Process., 2011
A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition.
IEEE Trans. Speech Audio Process., 2011
Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011
2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
2008
Evaluation of a Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model on Aurora2, Aurora3, and Aurora4 Tasks.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
A feature compensation approach using piecewise linear approximation of an explicit distortion model for noisy speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
Int. J. Comput. Linguistics Chin. Lang. Process., 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006