Kai Yu

ORCID: 0000-0002-7102-9826

Affiliations:
  • Shanghai Jiao Tong University, Computer Science and Engineering Department, China
  • Cambridge University, Engineering Department, UK (PhD 2006)


According to our database, Kai Yu authored at least 277 papers between 2004 and 2024.

Bibliography

2024
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback.
CoRR, 2024

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds.
CoRR, 2024

ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary.
CoRR, 2024

Enhancing Audio Generation Diversity with Visual Information.
CoRR, 2024

A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames.
CoRR, 2024

Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality.
CoRR, 2024

MULTI: Multimodal Understanding Leaderboard with Text and Images.
CoRR, 2024

ChemDFM: Dialogue Foundation Model for Chemistry.
CoRR, 2024

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
CoRR, 2024

Towards Weakly Supervised Text-to-Audio Grounding.
CoRR, 2024

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue.
Trans. Assoc. Comput. Linguistics, 2023

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.
CoRR, 2023

DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder.
CoRR, 2023

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.
CoRR, 2023

ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL.
CoRR, 2023

Acoustic BPE for Speech Generation with Discrete Tokens.
CoRR, 2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
CoRR, 2023

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching.
CoRR, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
CoRR, 2023

Improving Audio Caption Fluency with Automatic Error Correction.
CoRR, 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR, 2023

Large Language Model Is Semi-Parametric Reinforcement Learning Agent.
CoRR, 2023

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
CoRR, 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection.
CoRR, 2023

Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction.
CoRR, 2023

Large Language Models Are Semi-Parametric Reinforcement Learning Agents.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

DiffVoice: Text-to-Speech with Latent Diffusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

Diverse and Vivid Sound Generation from Text Descriptions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2023

ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploring Schema Generalizability of Text-to-SQL.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, 2023

TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Neural Fusion for Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Data augmentation based non-parallel voice conversion with frame-level speaker disentangler.
Speech Commun., 2022

DialogZoo: Large-Scale Dialog-Oriented Task Learning.
CoRR, 2022

A Comprehensive Survey of Automated Audio Captioning.
CoRR, 2022

The AISP-SJTU Translation System for WMT 2022.
Proceedings of the Seventh Conference on Machine Translation, 2022

UniDU: Towards A Unified Generative Dialogue Understanding Framework.
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

The AISP-SJTU Simultaneous Translation System for IWSLT 2022.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022

The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.
Proceedings of the Interspeech 2022, 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
Proceedings of the Interspeech 2022, 2022

Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Text Adaptive Detection for Customizable Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2022

Climate and Weather: Inspecting Depression Detection via Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Audio-Text Retrieval in Context.
Proceedings of the IEEE International Conference on Acoustics, 2022

Navigating Audio-Visual Event Detection Across Mismatched Modalities.
Proceedings of the IEEE International Conference on Acoustics, 2022

Category-Adapted Sound Event Enhancement with Weakly Labeled Data.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speech Enhancement with Neural Homomorphic Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

AdapterShare: Task Correlation Modeling with Adapter Differentiation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards Duration Robust Weakly Supervised Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling.
CoRR, 2021

Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis.
CoRR, 2021

Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF.
Proceedings of the Natural Language Processing and Chinese Computing, 2021

Relation-Aware Multi-hop Reasoning for Visual Dialog.
Proceedings of the Natural Language Processing and Chinese Computing, 2021

ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

DEPA: Self-Supervised Audio Embedding for Depression Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Audio Caption in a Car Setting with a Sentence-Level Loss.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

A Lightweight Framework for Online Voice Activity Detection in the Wild.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
Proceedings of the IEEE International Conference on Acoustics, 2021

SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

WebSRC: A Dataset for Web-Based Structural Reading Comprehension.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Dual Learning for Semi-Supervised Natural Language Understanding.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Neural Network Language Model Compression With Product Quantization and Soft Binarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Towards a new generation of artificial intelligence in China.
Nat. Mach. Intell., 2020

CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking.
CoRR, 2020

Dual Learning for Dialogue State Tracking.
CoRR, 2020

Structured Hierarchical Dialogue Policy with Graph Neural Networks.
CoRR, 2020

Deep Reinforcement Learning for On-line Dialogue State Tracking.
CoRR, 2020

End-to-End Speaker-Dependent Voice Activity Detection.
CoRR, 2020

Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding.
CoRR, 2020

An Investigation on Deep Learning with Beta Stabilizer.
CoRR, 2020

GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection.
CoRR, 2020

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models.
Proceedings of the Natural Language Processing and Chinese Computing, 2020

Memory Attention Neural Network for Multi-domain Dialogue State Tracking.
Proceedings of the Natural Language Processing and Chinese Computing, 2020

Robust Spoken Language Understanding with RL-Based Value Error Recovery.
Proceedings of the Natural Language Processing and Chinese Computing, 2020

Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.
Proceedings of the Interspeech 2020, 2020

Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding.
Proceedings of the Interspeech 2020, 2020

Neural Homomorphic Vocoder.
Proceedings of the Interspeech 2020, 2020

Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection.
Proceedings of the Interspeech 2020, 2020

CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Neural Lattice Search for Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Hierarchical Tracker for Multi-Domain Dialogue State Tracking.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Augmentation for Low Resource Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Duration Robust Weakly Supervised Sound Event Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Investigation of Specaugment for Deep Speaker Embedding Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Neural Graph Matching Networks for Chinese Short Text Matching.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

What does a Car-ssette tape tell?
CoRR, 2019

AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning.
CoRR, 2019

Text-based Depression Detection: What Triggers An Alert.
CoRR, 2019

Duration robust sound event detection.
CoRR, 2019

Cross Aggregation of Multi-head Attention for Neural Machine Translation.
Proceedings of the Natural Language Processing and Chinese Computing, 2019

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge.
Proceedings of the Interspeech 2019, 2019

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.
Proceedings of the Interspeech 2019, 2019

On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.
Proceedings of the Interspeech 2019, 2019

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.
Proceedings of the Interspeech 2019, 2019

Joint Decoding of CTC Based Systems for Speech Recognition.
Proceedings of the Interspeech 2019, 2019

CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding Challenge.
Proceedings of the International Conference on Multimodal Interaction, 2019

Robust Spoken Language Understanding with Acoustic and Domain Knowledge.
Proceedings of the International Conference on Multimodal Interaction, 2019

A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned Data.
Proceedings of the IEEE International Conference on Acoustics, 2019

Audio Caption: Listen and Tell.
Proceedings of the IEEE International Conference on Acoustics, 2019

Knowledge Distillation for Small Foot-print Deep Speaker Embedding.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Monaural Multi-speaker ASR System without Pretraining.
Proceedings of the IEEE International Conference on Acoustics, 2019

Data Augmentation with Atomic Templates for Spoken Language Understanding.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Exploring Model Units and Training Strategies for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Semantic Parsing with Dual Learning.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Rich Short Text Conversation Using Semantic-Key-Controlled Sequence Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.
Speech Commun., 2018

Concept Transfer Learning for Adaptive Language Understanding.
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Cost-Sensitive Active Learning for Dialogue State Tracking.
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Binarized LSTM Language Model.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Joint Spoken Language Understanding and Domain Adaptive Language Modeling.
Proceedings of the Intelligence Science and Big Data Engineering, 2018

Covariance Based Deep Feature for Text-Dependent Speaker Verification.
Proceedings of the Intelligence Science and Big Data Engineering, 2018

Knowledge Distillation for Sequence Model.
Proceedings of the Interspeech 2018, 2018

Angular Softmax for Short-Duration Text-independent Speaker Verification.
Proceedings of the Interspeech 2018, 2018

MLN: Moment localization Network and Samples Selection for Moment Retrieval.
Proceedings of the 2nd International Conference on Video and Image Processing, 2018

Robust Spoken Language Understanding with Unsupervised ASR-Error Adaptation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language Understanding.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Towards Universal Dialogue State Tracking.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Structured Dialogue Policy with Graph Neural Networks.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

2017
Phone Synchronous Speech Recognition With CTC Lattices.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

splab at the NTCIR-13 STC-2 Task.
Proceedings of the 13th NTCIR Conference, 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.
Proceedings of the Intelligence Science and Big Data Engineering, 2017

Deep Attentive Structured Language Model Based on LSTM.
Proceedings of the Intelligence Science and Big Data Engineering, 2017

Binary Deep Neural Networks for Speech Recognition.
Proceedings of the Interspeech 2017, 2017

What Does the Speaker Embedding Encode?
Proceedings of the Interspeech 2017, 2017

Discrete Duration Model for Speech Synthesis.
Proceedings of the Interspeech 2017, 2017

Small-footprint convolutional neural network for spoofing detection.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

End-to-end spoofing detection with raw waveform CLDNNs.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Confidence measures for CTC-based phone synchronous decoding.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Affordable On-line Dialogue Policy Learning.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

On-line Dialogue Policy Learning with Companion Teaching.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

Future vector enhanced LSTM language model for LVCSR.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Cluster Adaptive Training for Deep Neural Network Based Acoustic Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Deep features for automatic spoofing detection.
Speech Commun., 2016

Evolvable dialogue state tracking for statistical dialogue management.
Frontiers Comput. Sci., 2016

The splab at the NTCIR-12 Short Text Conversation Task.
Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, 2016

Multi-task joint-learning for robust voice activity detection.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Directed automatic speech transcription error correction using bidirectional LSTM.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Rich punctuations prediction using large-scale deep learning.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

On training bi-directional neural network language model with noise contrastive estimation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC.
Proceedings of the Interspeech 2016, 2016

Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues.
Proceedings of the Interspeech 2016, 2016

Phone Synchronous Decoding with CTC Lattice.
Proceedings of the Interspeech 2016, 2016

Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A comparative study of robustness of deep learning approaches for VAD.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Deep feature for text-dependent speaker verification.
Speech Commun., 2015

Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers.
Proceedings of the SIGDIAL 2015 Conference, 2015

Paragraph vector based topic model for language model adaptation.
Proceedings of the INTERSPEECH 2015, 2015

Multi-task learning for text-dependent speaker verification.
Proceedings of the INTERSPEECH 2015, 2015

Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge.
Proceedings of the INTERSPEECH 2015, 2015

An investigation of context clustering for statistical speech synthesis with deep neural network.
Proceedings of the INTERSPEECH 2015, 2015

Very deep convolutional neural networks for LVCSR.
Proceedings of the INTERSPEECH 2015, 2015

Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition.
Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

Cluster adaptive training for deep neural network.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Recurrent neural network language model with structured word embeddings for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A novel static parameter calculation method for model compensation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Local trajectory based speech enhancement for robust speech recognition with deep neural network.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Multi-task joint-learning of deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Semantic parser enhancement for dialogue domain extension with little data.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A generalized rule based tracker for dialogue state tracking.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

The SJTU System for Dialog State Tracking Challenge 2.
Proceedings of the SIGDIAL 2014 Conference, 2014

Acoustic emotion recognition using deep neural network.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Tandem deep features for text-dependent speaker verification.
Proceedings of the INTERSPEECH 2014, 2014

A novel dynamic parameters calculation approach for model compensation.
Proceedings of the INTERSPEECH 2014, 2014

Speaker verification with deep features.
Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Reshaping deep neural network for fast decoding by node-pruning.
Proceedings of the IEEE International Conference on Acoustics, 2014

Stochastic data sweeping for fast DNN training.
Proceedings of the IEEE International Conference on Acoustics, 2014

Second order vector taylor series based robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
A New Word Language Model Evaluation Metric for Character Based Languages.
Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2013

Combination of data borrowing strategies for low-resource LVCSR.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Introduction to the Issue on Advances in Spoken Dialogue Systems and Mobile Interface.
IEEE J. Sel. Top. Signal Process., 2012

Discriminative spoken language understanding using word confusion networks.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

The Effect of Cognitive Load on a Statistical Dialogue System.
Proceedings of the SIGDIAL 2012 Conference, 2012

Development of the 2012 SJTU HVR system.
Proceedings of the International Conference on Multimodal Interaction, 2012

ICMI'12 grand challenge: haptic voice recognition.
Proceedings of the International Conference on Multimodal Interaction, 2012

2011
Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2011

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis.
Speech Commun., 2011

Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results.
Proceedings of the SIGDIAL 2011 Conference, 2011

Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk.
Proceedings of the INTERSPEECH 2011, 2011

Joint modelling of voicing label and continuous F0 for HMM based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011

On-line policy optimisation of spoken dialogue systems via live interaction with human subjects.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Unsupervised training and directed manual transcription for LVCSR.
Speech Commun., 2010

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management.
Comput. Speech Lang., 2010

From discontinuous to continuous F0 modelling in HMM-based speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Bayesian dialogue system for the Let's Go Spoken Dialogue Challenge.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Parameter learning for POMDP spoken dialogue models.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Parameter estimation for agenda-based user simulation.
Proceedings of the SIGDIAL 2010 Conference, 2010

Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers.
Proceedings of the SIGDIAL 2010 Conference, 2010

Context adaptive training with factorized decision trees for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems.
Proceedings of the INTERSPEECH 2010, 2010

Canonical state models for automatic speech recognition.
Proceedings of the INTERSPEECH 2010, 2010

Word-level emphasis modelling in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2010

Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning.
Proceedings of the ACL 2010, 2010

2009
Unsupervised Adaptation With Discriminative Mapping Transforms.
IEEE Trans. Speech Audio Process., 2009

k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-Based Dialogue Systems.
Proceedings of the SIGDIAL 2009 Conference, 2009

Transformation-based learning for semantic parsing.
Proceedings of the INTERSPEECH 2009, 2009

Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2009

Spoken language understanding from unaligned data using discriminative classification models.
Proceedings of the IEEE International Conference on Acoustics, 2009

Back-off action selection in summary space-based POMDP dialogue systems.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Modelling user behaviour in the HIS-POMDP dialogue manager.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Training and Evaluation of the HIS POMDP Dialogue System in Noise.
Proceedings of the SIGDIAL 2008 Workshop, 2008

Evaluating semantic-level confidence scores with multiple hypotheses.
Proceedings of the INTERSPEECH 2008, 2008

User study of the Bayesian update of dialogue state approach to dialogue management.
Proceedings of the INTERSPEECH 2008, 2008

Adaptive training using discriminative mapping transforms.
Proceedings of the INTERSPEECH 2008, 2008

Unsupervised discriminative adaptation using discriminative mapping transforms.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Bayesian Adaptive Inference and Adaptive Training.
IEEE Trans. Speech Audio Process., 2007

Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio.
Proceedings of the INTERSPEECH 2007, 2007

Improving Speech Transcription for Mandarin-English Translation.
Proceedings of the IEEE International Conference on Acoustics, 2007

Speech Recognition System Combination for Machine Translation.
Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminative language model adaptation for Mandarin broadcast speech transcription and translation.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Development of a phonetic system for large vocabulary Arabic speech recognition.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Discriminative cluster adaptive training.
IEEE Trans. Speech Audio Process., 2006

Incremental Adaptation using Bayesian Inference.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Investigation of Acoustic Modeling Techniques for LVCSR Systems.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Development of the CUHTK 2004 Mandarin Conversational Telephone Speech Transcription System.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Training LVCSR Systems on Thousands of Hours of Data.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Adaptive training using structured transforms.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

