Jia Pan

This is a disambiguation page: it lists multiple papers by people with the same or a similar name.

Bibliography

2026
Three-stage modular speaker diarization collaborating with front-end techniques in the CHiME-8 NOTSOFAR-1 challenge.
Comput. Speech Lang., 2026

2025
CollaBot: Vision-Language Guided Simultaneous Collaborative Manipulation.
CoRR, August, 2025

iPad: Iterative Proposal-centric End-to-End Autonomous Driving.
CoRR, May, 2025

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation.
CoRR, April, 2025

Controllable Conformer for Speech Enhancement and Recognition.
IEEE Signal Process. Lett., 2025

Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement.
Inf. Fusion, 2025

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Collaborative Viseme Subword and End-to-End Modeling for Word-Level Lip Reading.
IEEE Trans. Multim., 2024

PE-Wav2vec: A Prosody-Enhanced Speech Model for Self-Supervised Prosody Learning in TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions.
CoRR, 2024

The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge.
CoRR, 2024

Voice Attribute Editing with Text Prompt.
CoRR, 2024

Efficient Reproduction of Fault-Induced Failures in Distributed Systems with Feedback-Driven Fault Injection.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

Efficient Exposure of Partial Failure Bugs in Distributed Systems with Inferred Abstract States.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

HHD-GP: Incorporating Helmholtz-Hodge Decomposition into Gaussian Processes for Learning Dynamical Systems.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Layer-Adaptive Low-Rank Adaptation of Large ASR Model for Low-Resource Multilingual Scenarios.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Implicit Enhancement of Target Speaker in Speaker-Adaptive ASR through Efficient Joint Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024

A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

NAMER: Non-autoregressive Modeling for Handwritten Mathematical Expression Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
CoRR, 2023

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge.
CoRR, 2023

A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics.
CoRR, 2023

Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Frame-Level Embedding Learning for Few-shot Bioacoustic Event Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing audio-visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Loss Function Design for DNN-Based Sound Event Localization and Detection on Low-Resource Realistic Data.
Proceedings of the IEEE International Conference on Acoustics, 2023

Reducing the GAP Between Streaming and Non-Streaming Transducer-Based ASR by Adaptive Two-Stage Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

Progressive Multi-scale Self-supervised Learning for Speech Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Improved Data2vec with Soft Supervised Hidden Unit for Mandarin Speech Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Processing Method of Civil Radar Echo Signal Based on Kalman Filter Algorithm.
Proceedings of the Advanced Hybrid Information Processing, 2023

Acquisition Method of Direct Sequence Spread Spectrum Signal Based on Deep Residual Network.
Proceedings of the Advanced Hybrid Information Processing, 2023

2022
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit.
CoRR, 2022

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition.
CoRR, 2022

Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information.
CoRR, 2022

Multi-Task Joint Learning for Embedding Aware Audio-Visual Speech Enhancement.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Is Lip Region-of-Interest Sufficient for Lipreading?
Proceedings of the International Conference on Multimodal Interaction, 2022

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
USTC-NELSLIP System Description for DIHARD-III Challenge.
CoRR, 2021

A Model Ensemble Approach for Sound Event Localization and Detection.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

2020
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker Adaptive Training for Speech Recognition Based on Attention-Over-Attention Mechanism.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Local Kernel Distance-Support Vector Data Description (LKD-SVDD)-based Process Monitoring Method for Multiphase Batch Processes.
Proceedings of the 15th IEEE International Conference on Control and Automation, 2019

A Two-stage Single-channel Speaker-dependent Speech Separation Approach for CHiME-5 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Online Speaker Adaptation for LVCSR Based on Attention Mechanism.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2015
Evaluation of objective measures applied on the noise suppressed speech signals with Chinese content.
Proceedings of the IEEE International Conference on Information and Automation, 2015

2013
The Analysis on the Assimilation of the Model of Corporate Governance.
Int. J. Asian Bus. Inf. Manag., 2013

2012
Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMS in acoustic modeling.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

2008
Medical Image Categorization Based on Gaussian Mixture Model.
Proceedings of the 2008 International Conference on BioMedical Engineering and Informatics, 2008

