Siqi Zheng

This is a disambiguation page: it lists papers by multiple people with the same or a similar name.

Bibliography

2026

X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs.

CoRR, March, 2026

BCUA: A UAV Group Authentication Protocol Based on the CVMerkle Tree Structure.

IEEE Trans. Veh. Technol., February, 2026

Reconstructing the Subterranean Canvas: Digital Re-Contextualization of the Dingjiazha M5 Muraled Tomb in Jiuquan.

ISPRS Int. J. Geo Inf., 2026

Blockchain for message dissemination in VANETs based on approval voting.

Alwyn Jakobus Hoffman

Ad Hoc Networks, 2026

LSGRS: A geolocation and reputation-aware dynamic dual-layer sharding scheme for scalable vehicular blockchain networks.

Ad Hoc Networks, 2026

BernO: A Breath-Driven Odor Display for Spatial Olfactory Interaction in VR.

Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, 2026

PlayScent: Exploring Olfactory Texture Across Scent Delivery Methods.

Proceedings of the 2026 Designing Interactive Systems Conference, 2026

2025

Identifying flood-prone zones and their geographic drivers in Northwest China under a changing climate.

Int. J. Digit. Earth, December, 2025

AuthGlass: Enhancing Voice Authentication on Smart Glasses via Air-Bone Acoustic Features.

CoRR, September, 2025

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators.

CoRR, May, 2025

Trailer-referenced autonomous navigation of agricultural tractor-trailer systems.

Siqi Zheng

Shengli Xu

Rahul Rai

Comput. Electron. Agric., 2025

Dense object detection based canopy characteristics encoding for precise spraying in peach orchards.

Shengli Xu

Siqi Zheng

Rahul Rai

Comput. Electron. Agric., 2025

Understanding Users' Perceptions and Expectations toward a Social Balloon Robot via an Exploratory Study.

Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, 2025

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Exploring Text-Queried Sound Event Detection with Audio Source Separation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language.

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., March, 2024

Intercity connectivity and urban innovation.

Xiaofan Liang

César A. Hidalgo

Pierre-Alexandre Balland

Siqi Zheng

Jianghao Wang

Comput. Environ. Urban Syst., 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.

CoRR, 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.

CoRR, 2024

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization.

CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.

CoRR, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization.

CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody.

CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.

CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.

CoRR, 2024

AudioLCM: Text-to-Audio Generation with Latent Consistency Models.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

Extending Multi-modal Contrastive Representations.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec.

Proceedings of the IEEE International Conference on Acoustics, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.

Proceedings of the IEEE International Conference on Acoustics, 2024

PepperPose: Full-Body Pose Estimation with a Companion Robot.

Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation.

CoRR, 2023

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision.

Yafeng Chen

Siqi Zheng

Qian Chen

CoRR, 2023

Improving BERT with Hybrid Pooling Network and Drop Mask.

CoRR, 2023

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement.

CoRR, 2023

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework.

Proceedings of the IEEE International Conference on Acoustics, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Two-Layer Human-in-the-Loop Optimization Framework for Customizing Lower-Limb Exoskeleton Assistance.

Siqi Zheng

Ge Lv

Proceedings of the American Control Conference, 2023

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Multi-Source Time Series Remote Sensing Feature Selection and Urban Forest Extraction Based on Improved Artificial Bee Colony.

Remote. Sens., 2022

Contextual Expressive Text-to-Speech.

CoRR, 2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.

CoRR, 2022

Deep Representation Decomposition for Rate-Invariant Speaker Verification.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification.

Siqi Zheng

Hongbin Suo

Qian Chen

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Label-Dividing Gated Graph Neural Network for Hierarchical Text Classification.

Proceedings of the International Joint Conference on Neural Networks, 2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Reformulating Speaker Diarization As Community Detection With Emphasis On Topological Structure.

Siqi Zheng

Hongbin Suo

Proceedings of the IEEE International Conference on Acoustics, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

M2Met: The Icassp 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data.

Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information.

CoRR, 2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection.

CoRR, 2021

Measuring daily-life fear perception change: a computational study in the context of COVID-19.

CoRR, 2021

Estimating air quality co-benefits of energy transition using machine learning.

CoRR, 2021

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Real-Time Speaker Diarization System Based on Spatial Spectrum.

Proceedings of the IEEE International Conference on Acoustics, 2021

Cam: Context-Aware Masking for Robust Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification.

Siqi Zheng

Yun Lei

Hongbin Suo

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Time-resolved protein activation by proximal decaging in living systems.

Nat., 2019

Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Factors Influencing University Students' Intention to Redeem Digital Takeaway Coupons - Analysis Based on A Survey in China.

Guihang Guo

Ying Li

Siqi Zheng

Proceedings of the ICIT 2019, 2019

2018

A Noise-Robust Self-Adaptive Multitarget Speaker Detection System.

Proceedings of the 24th International Conference on Pattern Recognition, 2018
