Zhijie Yan

According to our database, Zhijie Yan authored at least 68 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes.
CoRR, 2024

Large Language Models Powered Context-aware Motion Prediction.
CoRR, 2024

2023
Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures.
CoRR, 2023

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models.
CoRR, 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model.
CoRR, 2023

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

INT2: Interactive Trajectory Prediction at Intersections.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Long-Term Interactive Driving Simulation: MPC to the Rescue.
Proceedings of the Third CAAI International Conference on Artificial Intelligence, 2023

M²Sim: A Long-Term Interactive Driving Simulator.
Proceedings of the Third CAAI International Conference on Artificial Intelligence, 2023

Exploiting Patent Documents for Cross-Domain Knowledge Transfer in Innovative Engineering Design: A Doc2Vec-GAT-Based Approach.
Proceedings of the 19th IEEE International Conference on Automation Science and Engineering, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.
CoRR, 2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Surface Defect Detection and Classification Based on Fusing Multiple Computer Vision Techniques.
Proceedings of the Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
BeamTransformer: Microphone Array-based Overlapping Speech Detection.
CoRR, 2021

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, 2021

A Real-Time Speaker Diarization System Based on Spatial Spectrum.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Dynamic Thermal Rating of Transmission Line Based on Environmental Parameter Estimation.
J. Inf. Process. Syst., 2019

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition.
CoRR, 2019

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Uncertainty analysis of dynamic thermal rating based on environmental parameter estimation.
EURASIP J. Wirel. Commun. Netw., 2018

A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition.
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Linear Networks Based Speaker Adaptation for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Deep Feed-Forward Sequential Memory Networks for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Analysis on Ampacity of Overhead Transmission Lines Being Operated.
J. Inf. Process. Syst., 2017

Improving latency-controlled BLSTM acoustic models for online speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

2015
Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A context-sensitive-chunk BPTT approach to training deep LSTM/BLSTM recurrent neural networks for offline handwriting recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

2014
An Unsupervised Adaptation Approach to Leveraging Feedback Loop Data by Using i-Vector for Data Clustering and Selection.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

2013
A Unified Trajectory Tiling Approach to High Quality Speech Rendering.
IEEE Trans. Speech Audio Process., 2013

A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Tied-state based discriminative training of context-expanded region-dependent feature transforms for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012
Tip tap tones: mobile microtraining of mandarin sounds.
Proceedings of the Mobile HCI '12, 2012

A feature-transform based approach to unsupervised task adaptation and personalization.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A comparative study of fMPE and RDLT approaches to LVCSR.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
A new i-vector approach and its application to irrelevant variability normalization based acoustic model training.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

An i-vector Based Approach to Training Data Clustering for Improved Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

An i-vector Based Approach to Acoustic Sniffing for Irrelevant Variability Normalization Based Acoustic Model Training and Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A study of an irrelevant variability normalization based discriminative training approach for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010
An HMM trajectory tiling (HTT) approach to high quality TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A perceptual study of acceleration parameters in HMM-based TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Cross-validation based decision tree clustering for HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Improved modeling for F0 generation and V/U decision in HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Rich-context Unit Selection (RUS) approach to high quality TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009
Rich context modeling for high quality HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A trust region based optimization for maximum mutual information estimation of HMMs in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2008
Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Soft margin estimation with various separation levels for LVCSR.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Minimum word classification error training of HMMs for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007
Word Graph Based Feature Enhancement for Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

A study on soft margin estimation for LVCSR.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Signal Trajectory Based Noise Compensation for Robust Speech Recognition.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
