Ziqiang Shi

Neural Networks, 2026

Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

2025

Bayesian Optimal Latent Projection for Noisy Image Restoration.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

AugCount: Test-Time Semantic Augmentation via Diffusion for General Open-World Object Counting.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2025

Selective-SAM: Memory Optimization for Segment Anything Model 2 with Application in Self-Checkout Product Counting.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2025

TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Attribute Conditional Diffusion-Augmented Person Re-Identification.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

RealSinger: Ultra-realistic singing voice generation via stochastic differential equations.

Neurocomputing, 2024

Generative Modelling with High-Order Langevin Dynamics.

CoRR, 2024

Conditional Velocity Score Estimation for Image Restoration.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Self-Checkout Product Detection with Occlusion Layer Prediction and Intersection Weighting.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2024

Multimedia Generative Modelling with High-Order Langevin Dynamics.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Project, Skate, and Refresh: Improved Schrödinger Bridge Sampler for Image Restoration.

Proceedings of the IEEE International Conference on Image Processing, 2024

Langwave: Realistic Voice Generation Based on High-Order Langevin Dynamics.

Proceedings of the IEEE International Conference on Acoustics, 2024

Noisy Image Restoration Based on Conditional Acceleration Score Approximation.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

SchröWave: Realistic voice generation by solving two-stage conditional Schrödinger bridge problems.

Digit. Signal Process., September, 2023

Semi-Supervised Contrastive Learning with Soft Mask Attention for Facial Action Unit Detection.

Proceedings of the IEEE International Conference on Acoustics, 2023

CheckSORT: Refined Synthetic Data Combination and Optimized SORT for Automatic Retail Checkout.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

ITÔN: End-to-end audio generation with Itô stochastic differential equations.

Digit. Signal Process., 2022

ItôWave: Itô Stochastic Differential Equation is all You Need for Wave Generation.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Multi-modal Affect Analysis using standardized data within subjects in the Wild.

CoRR, 2021

ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation.

CoRR, 2021

2020

Link Prediction Adversarial Attack Via Iterative Gradient Attack.

IEEE Trans. Comput. Soc. Syst., 2020

Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification.

Liwen Zhang

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Learning Temporal Relations from Semantic Neighbors for Acoustic Scene Classification.

Liwen Zhang

IEEE Signal Process. Lett., 2020

LoRRaL: Facial Action Unit Detection Based on Local Region Relation Learning.

CoRR, 2020

Toward the pre-cocktail party problem with TasTas+.

Anyan Shi

CoRR, 2020

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training.

CoRR, 2020

La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention.

CoRR, 2020

FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks.

Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification.

Liwen Zhang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training.

Liu Liu

Proceedings of the 28th European Signal Processing Conference, 2020

2019

Learning from Adversarial Features for Few-Shot Classification.

Wei Shen

Jun Sun

CoRR, 2019

FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks.

CoRR, 2019

FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation.

CoRR, 2019

Is CQT more suitable for monaural speech separation than STFT? an empirical study.

CoRR, 2019

Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Robustness Evaluation of Deep Learning Models Based on Local Prediction Consistency.

Proceedings of the 18th IEEE International Conference On Machine Learning And Applications, 2019

Furcax: End-to-end Monaural Speech Separation Based on Deep Gated (De)convolutional Neural Networks with Adversarial Example Training.

Proceedings of the IEEE International Conference on Acoustics, 2019

HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019

2018

Link Prediction Adversarial Attack.

CoRR, 2018

A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Multi-view Probability Linear Discrimination Analysis for Multi-view Vector Based Text Dependent Speaker Verification.

Liu Liu

CoRR, 2017

Better Worst-Case Complexity Analysis of the Block Coordinate Descent Method for Large Scale Machine Learning.

Proceedings of the 16th IEEE International Conference on Machine Learning and Applications, 2017

Multi-view (Joint) probability linear discrimination analysis for J-vector based text dependent speaker verification.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Empirical study of PROXTONE and PROXTONE+ for Fast Learning of Large Scale Sparse Models.

CoRR, 2016

2015

Soft Margin Based Low-Rank Audio Signal Classification.

Neural Process. Lett., 2015

Large Scale Optimization with Proximal Stochastic Newton-Type Gradient Descent.

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2015

Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums in Machine Learning.

Proceedings of the Advances in Knowledge Discovery and Data Mining, 2015

2013

Audio classification with low-rank matrix representation features.

ACM Trans. Intell. Syst. Technol., 2013

Identification of Objectionable Audio Segments Based on Pseudo and Heterogeneous Mixture Models.

IEEE Trans. Speech Audio Process., 2013

Audio Segment Classification Using Online Learning Based Tensor Representation Feature Discrimination.

IEEE Trans. Speech Audio Process., 2013

Online Douglas-Rachford splitting method.

CoRR, 2013

Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes.

Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Guarantees of Augmented Trace Norm Models in Tensor Recovery.

Proceedings of the IJCAI 2013, 2013

2012

Identifiability of multivariate logistic mixture models.

CoRR, 2012

Guarantees of Augmented Trace Norm Models in Tensor Recovery.

CoRR, 2012

Low-rank Audio Signal Classification Under Soft Margin and Trace Norm Constraints.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Online Learning for Classification of Low-rank Representation Features and Its Applications in Audio Segment Classification.

CoRR, 2011

Trace Norm Regularized Tensor Classification and Its Online Learning Approaches.

CoRR, 2011

Heterogeneous mixture models using sparse representation features for applause and laugh detection.

Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Real-World Speech/Non-Speech Audio Classification Based on Sparse Representation Features and GPCs.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Novel Framework Based on Trace Norm Minimization for Audio Event Detection.
