Jiaen Liang

Orcid: 0009-0001-8309-1301

According to our database, Jiaen Liang authored at least 60 papers between 2006 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning.

CoRR, May, 2026

LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification.

CoRR, May, 2026

Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing.

CoRR, April, 2026

Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition.

CoRR, March, 2026

Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation.

CoRR, March, 2026

Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout.

CoRR, March, 2026

PARL: Position-Aware Relation Learning Network for Document Layout Analysis.

CoRR, January, 2026

Semi-Supervised Facial Expression Recognition based on Dynamic Threshold and Negative Learning.

CoRR, January, 2026

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR.

CoRR, January, 2026

MetaDB: Metadata-Guided Diffusion Bridge Model for High-Fidelity Medical Image Synthesis.

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

FocalOrder: Focal Preference Optimization for Reading Order Detection.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

Heterogeneous Encoder Fusion with KAN Decoder for Group Engagement Modeling via 8× Sliding Pipelines.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Optimization of Multimodal Inputs Based on Diffusion Models: Zero-Shot Semantic Image Generation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

2024

RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Corrector.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

End-to-end Spatio-Temporal Information Aggregation For Micro-Action Detection.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Building Robust Video-Level Deepfake Detection via Audio-Visual Local-Global Interactions.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Micro-Expression Spotting Based on Optical Flow Feature with Boundary Calibration.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Reducing Speech Distortion and Artifacts for Speech Enhancement by Loss Function.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Multi Model Ensemble for Compound Expression Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Improving Valence-Arousal Estimation with Spatiotemporal Relationship Learning and Multimodal Fusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Exploring Facial Expression Recognition through Semi-Supervised Pre-training and Temporal Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Dual-model self-regularization and fusion for domain adaptation of robust speaker verification.

Speech Commun., November, 2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.

CoRR, 2023

MMT-GD: Multi-Modal Transformer with Graph Distillation for Cross-Cultural Humor Detection.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Sliding Window Seq2seq Modeling for Engagement Estimation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Answer-Based Entity Extraction and Alignment for Visual Text Question Answering.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Acoustic domain mismatch compensation in bird audio detection.

Int. J. Speech Technol., 2022

Exploring single channel speech separation for short-time text-dependent speaker verification.

Int. J. Speech Technol., 2022

Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection.

Digit. Signal Process., 2022

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Joint Weakly Supervised AT and AED Using Deep Feature Distillation and Adaptive Focal Loss.

CoRR, 2021

Attention-Based Scaling Adaptation for Target Speech Extraction.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Mask-based blind source separation and MVDR beamforming in ASR.

Int. J. Speech Technol., 2020

Attention-based scaling adaptation for target speech extraction.

CoRR, 2020

Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speech Driven Talking Head Generation via Attentional Landmarks Based Representation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The SHNU System for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Speaker Direction-of-Arrival Estimation Based on Orthogonal Dipoles.

Zhaoqiong Huang

Circuits Syst. Signal Process., 2019

2018

Active Learning for LF-MMI Trained Neural Networks in ASR.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Frequency-invariant differential microphone array design in the STFT domain.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2011

Exploring nuisance attribute projection and score normalization for GLDS-SVM based automatic mispronunciation detection method.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Exploring goodness of prosody by diverse matching templates.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Automatic reference independent evaluation of prosody quality using multiple knowledge fusions.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

High performance automatic mispronunciation detection method based on neural network and TRAP features.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

An efficient mispronunciation detection method using GLDS-SVM and formant enhanced features.

Proceedings of the IEEE International Conference on Acoustics, 2009

Context Dependent Feature Based Bottom-up Rescoring SVM Classifier in Children's English Stress Mis-pronunciation Detection.

Proceedings of the 9th IEEE International Conference on Advanced Learning Technologies, 2009

2008

Improving searching speed and accuracy of query by humming system based on three methods: feature fusion, candidates set reduction and multiple similarity measurement rescoring.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Music Genre Classification Based on Multiple Classifier Fusion.

Proceedings of the Fourth International Conference on Natural Computation, 2008

Improved phonotactic language identification using random forest language models.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

A Novel Phone-State Matrix Based Vocabulary-Independent Keyword Spotting Method for Spontaneous Speech.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Full Utilization of Closed-captions in Broadcast News Recognition.

Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

An Improved Mandarin Keyword Spotting System Using MCE Training and Context-Enhanced Verification.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
