Zhijie Yan

2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

Large Language Models Powered Context-aware Motion Prediction.

CoRR, 2024

Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes.

Proceedings of the Computer Vision - ECCV 2024, 2024

ManufVisSGG: A Vision-Language-Model Approach for Cognitive Scene Graph Generation in Manufacturing Systems.

Proceedings of the 20th IEEE International Conference on Automation Science and Engineering, 2024

Large Language Model for Humanoid Cognition in Proactive Human-Robot Collaboration.

Proceedings of the 20th IEEE International Conference on Automation Science and Engineering, 2024

2023

Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures.

CoRR, 2023

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models.

CoRR, 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model.

CoRR, 2023

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

INT2: Interactive Trajectory Prediction at Intersections.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MUG: A General Meeting Understanding and Generation Benchmark.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Long-Term Interactive Driving Simulation: MPC to the Rescue.

Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

M²Sim: A Long-Term Interactive Driving Simulator.

Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

Exploiting Patent Documents for Cross-Domain Knowledge Transfer in Innovative Engineering Design: A Doc2Vec-GAT-Based Approach.

Proceedings of the 19th IEEE International Conference on Automation Science and Engineering, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.

CoRR, 2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Surface Defect Detection and Classification Based on Fusing Multiple Computer Vision Techniques.

Proceedings of the Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection.

CoRR, 2021

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Real-Time Speaker Diarization System Based on Spatial Spectrum.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Dynamic Thermal Rating of Transmission Line Based on Environmental Parameter Estimation.

J. Inf. Process. Syst., 2019

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition.

Shiliang Zhang

Ming Lei

CoRR, 2019

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition.

Shiliang Zhang

Ming Lei

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Uncertainty analysis of dynamic thermal rating based on environmental parameter estimation.

EURASIP J. Wirel. Commun. Netw., 2018

A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition.

Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Linear Networks Based Speaker Adaptation for Speech Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Deep Feed-Forward Sequential Memory Networks for Speech Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Analysis on Ampacity of Overhead Transmission Lines Being Operated.

Yanling Wang

Likai Liang

J. Inf. Process. Syst., 2017

Improving latency-controlled BLSTM acoustic models for online speech recognition.

Shaofei Xue

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016

Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

2015

Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach.

Kai Chen

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A context-sensitive-chunk BPTT approach to training deep LSTM/BLSTM recurrent neural networks for offline handwriting recognition.

Kai Chen

Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

2014

An Unsupervised Adaptation Approach to Leveraging Feedback Loop Data by Using i-Vector for Data Clustering and Selection.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

2013

A Unified Trajectory Tiling Approach to High Quality Speech Rendering.

Yao Qian

IEEE Trans. Speech Audio Process., 2013

A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Tied-state based discriminative training of context-expanded region-dependent feature transforms for LVCSR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Tip tap tones: mobile microtraining of Mandarin sounds.

Proceedings of the Mobile HCI '12, 2012

A feature-transform based approach to unsupervised task adaptation and personalization.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A comparative study of fMPE and RDLT approaches to LVCSR.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

A new i-vector approach and its application to irrelevant variability normalization based acoustic model training.

Yu Zhang

Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

An i-vector Based Approach to Training Data Clustering for Improved Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

An i-vector Based Approach to Acoustic Sniffing for Irrelevant Variability Normalization Based Acoustic Model Training and Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A study of an irrelevant variability normalization based discriminative training approach for LVCSR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010

An HMM trajectory tiling (HTT) approach to high quality TTS.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A perceptual study of acceleration parameters in HMM-based TTS.

Yining Chen

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Cross-validation based decision tree clustering for HMM-based TTS.

Yu Zhang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Improved modeling for F0 generation and V/U decision in HMM-based TTS.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Rich-context Unit Selection (RUS) approach to high quality TTS.

Yao Qian

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

An HMM Trajectory Tiling (HTT) Approach to High Quality TTS - Microsoft Entry to Blizzard Challenge 2010.

Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

2009

Rich context modeling for high quality HMM-based TTS.

Yao Qian

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A trust region based optimization for maximum mutual information estimation of HMMs in speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2008

Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP.

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Soft margin estimation with various separation levels for LVCSR.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Minimum word classification error training of HMMs for automatic speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007

Word Graph Based Feature Enhancement for Noisy Speech Recognition.
