Zhijie Yan

According to our database, Zhijie Yan authored at least 41 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes.
CoRR, 2024

Large Language Models Powered Context-aware Motion Prediction.
CoRR, 2024

2023
Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures.
CoRR, 2023

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models.
CoRR, 2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.
CoRR, 2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model.
CoRR, 2023

INT2: Interactive Trajectory Prediction at Intersections.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, 2023

Long-Term Interactive Driving Simulation: MPC to the Rescue.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

M²Sim: A Long-Term Interactive Driving Simulator.
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

Exploiting Patent Documents for Cross-Domain Knowledge Transfer in Innovative Engineering Design: A Doc2Vec-GAT-Based Approach.
Proceedings of the 19th IEEE International Conference on Automation Science and Engineering, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition.
CoRR, 2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis.
CoRR, 2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios.
CoRR, 2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Proceedings of the Interspeech 2022, 2022

Surface Defect Detection and Classification Based on Fusing Multiple Computer Vision Techniques.
Proceedings of the Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence, 2022

Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
BeamTransformer: Microphone Array-based Overlapping Speech Detection.
CoRR, 2021

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

A Real-Time Speaker Diarization System Based on Spatial Spectrum.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System.
Proceedings of the Interspeech 2020, 2020

2019
Dynamic Thermal Rating of Transmission Line Based on Environmental Parameter Estimation.
J. Inf. Process. Syst., 2019

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition.
CoRR, 2019

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition.
Proceedings of the Interspeech 2019, 2019

2018
Uncertainty analysis of dynamic thermal rating based on environmental parameter estimation.
EURASIP J. Wirel. Commun. Netw., 2018

A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition.
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Linear Networks Based Speaker Adaptation for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Feed-Forward Sequential Memory Networks for Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Analysis on Ampacity of Overhead Transmission Lines Being Operated.
J. Inf. Process. Syst., 2017

Improving latency-controlled BLSTM acoustic models for online speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016
Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

2012
Tip tap tones: mobile microtraining of Mandarin sounds.
Proceedings of the Mobile HCI '12, 2012

2010
Improved modeling for F0 generation and V/U decision in HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010

