Kai Yu

Fei Wen

IEEE ACM Trans. Audio Speech Lang. Process., 2024

AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures.

CoRR, 2024

Neural Directed Speech Enhancement with Dual Microphone Array in High Noise Scenario.

CoRR, 2024

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective.

CoRR, 2024

Reducing Tool Hallucination via Reliability Alignment.

CoRR, 2024

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity.

CoRR, 2024

MobA: A Two-Level Agent System for Efficient Mobile Task Automation.

CoRR, 2024

SciDFM: A Large Language Model with Mixture-of-Experts for Science.

CoRR, 2024

ChemDFM-X: Towards Large Multimodal Model for Chemistry.

CoRR, 2024

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders.

CoRR, 2024

FakeSound: Deepfake General Audio Detection.

CoRR, 2024

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

CoRR, 2024

Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback.

CoRR, 2024

Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality.

CoRR, 2024

ChemDFM: Dialogue Foundation Model for Chemistry.

CoRR, 2024

ChemDFM-X: towards large multimodal model for chemistry.

Sci. China Inf. Sci., 2024

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding.

Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

Semi-Supervised Learning For Code-Switching ASR With Large Language Model Filter.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Attention-Constrained Inference For Robust Decoder-Only Text-to-Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Text-aware Speech Separation for Multi-talker Keyword Spotting.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

On the Effectiveness of Acoustic BPE in Decoder-Only TTS.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Evolving Subnetwork Training for Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Semantic-Enhanced Supervised Contrastive Learning.

Pingyue Zhang

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.

Proceedings of the IEEE International Conference on Acoustics, 2024

A BiRGAT Model for Multi-Intent Spoken Language Understanding with Hierarchical Semantic Frames.

Proceedings of the IEEE International Conference on Acoustics, 2024

A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.

Proceedings of the IEEE International Conference on Acoustics, 2024

Enhancing Audio Generation Diversity with Visual Information.

Proceedings of the IEEE International Conference on Acoustics, 2024

Acoustic BPE for Speech Generation with Discrete Tokens.

Proceedings of the IEEE International Conference on Acoustics, 2024

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations.

Proceedings of the IEEE International Conference on Acoustics, 2024

DiffDub: Person-Generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-Encoder.

Proceedings of the IEEE International Conference on Acoustics, 2024

Label-Aware Auxiliary Learning for Dialogue State Tracking.

Yuncong Liu

Proceedings of the IEEE International Conference on Acoustics, 2024

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.

Proceedings of the IEEE International Conference on Acoustics, 2024

VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching.

Proceedings of the IEEE International Conference on Acoustics, 2024

AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks.

Ruiyang Zhou

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Sparsity-Accelerated Training for Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking.

Wenbin Jiang

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue.

Trans. Assoc. Comput. Linguistics, 2023

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.

CoRR, 2023

ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL.

CoRR, 2023

Improving Audio Caption Fluency with Automatic Error Correction.

CoRR, 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.

CoRR, 2023

Large Language Model Is Semi-Parametric Reinforcement Learning Agent.

CoRR, 2023

Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction.

Danyang Zhang

CoRR, 2023

Large Language Models Are Semi-Parametric Reinforcement Learning Agents.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection.

Pingyue Zhang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How ChatGPT is Robust for Spoken Language Understanding?

Guangpeng Li

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

UnSE: Unsupervised Speech Enhancement Using Optimal Transport.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.

Xuenan Xu

Proceedings of the IEEE International Conference on Acoustics, 2023

DiffVoice: Text-to-Speech with Latent Diffusion.

Zhijun Liu

Yiwei Guo

Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2023

Diverse and Vivid Sound Generation from Text Descriptions.

Proceedings of the IEEE International Conference on Acoustics, 2023

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.

Proceedings of the IEEE International Conference on Acoustics, 2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.

Proceedings of the IEEE International Conference on Acoustics, 2023

ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploring Schema Generalizability of Text-to-SQL.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), 2023

TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022

Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Neural Fusion for Voice Cloning.

Bo Chen

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Data augmentation based non-parallel voice conversion with frame-level speaker disentangler.

Bo Chen

Zhihang Xu

Speech Commun., 2022

BER: Balanced Error Rate For Speaker Diarization.

Tao Liu

CoRR, 2022

DialogZoo: Large-Scale Dialog-Oriented Task Learning.

CoRR, 2022

A Comprehensive Survey of Automated Audio Captioning.

Xuenan Xu

CoRR, 2022

The AISP-SJTU Translation System for WMT 2022.

Proceedings of the Seventh Conference on Machine Translation, 2022

UniDU: Towards A Unified Generative Dialogue Understanding Framework.

Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

The AISP-SJTU Simultaneous Translation System for IWSLT 2022.

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Efficient Speech Enhancement with Neural Homomorphic Synthesis.

Wenbin Jiang

Tao Liu

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.

Xuenan Xu

Proceedings of the IEEE International Conference on Acoustics, 2022

Text Adaptive Detection for Customizable Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, 2022

Climate and Weather: Inspecting Depression Detection via Emotion Recognition.

Wen Wu

Proceedings of the IEEE International Conference on Acoustics, 2022

Audio-Text Retrieval in Context.

Proceedings of the IEEE International Conference on Acoustics, 2022

Navigating Audio-Visual Event Detection Across Mismatched Modalities.

Proceedings of the IEEE International Conference on Acoustics, 2022

Category-Adapted Sound Event Enhancement with Weakly Labeled Data.

Proceedings of the IEEE International Conference on Acoustics, 2022

Speech Enhancement with Neural Homomorphic Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2022

Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.

Yiwei Guo

Proceedings of the IEEE International Conference on Acoustics, 2022

LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

AdapterShare: Task Correlation Modeling with Adapter Differentiation.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards Duration Robust Weakly Supervised Sound Event Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling.

CoRR, 2021

Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis.

CoRR, 2021

Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF.

Proceedings of the Natural Language Processing and Chinese Computing, 2021

Relation-Aware Multi-hop Reasoning for Visual Dialog.

Yao Zhao

Proceedings of the Natural Language Processing and Chinese Computing, 2021

ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

DEPA: Self-Supervised Audio Embedding for Depression Detection.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Audio Caption in a Car Setting with a Sentence-Level Loss.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

A Lightweight Framework for Online Voice Activity Detection in the Wild.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR.

Lingfeng Dai

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.

Proceedings of the IEEE International Conference on Acoustics, 2021

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.

Proceedings of the IEEE International Conference on Acoustics, 2021

SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2021

Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction.

Boer Lyu

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

WebSRC: A Dataset for Web-Based Structural Reading Comprehension.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Dual Learning for Semi-Supervised Natural Language Understanding.

Ruisheng Cao

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Neural Network Language Model Compression With Product Quantization and Soft Binarization.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Towards a new generation of artificial intelligence in China.

Nat. Mach. Intell., 2020

CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking.

CoRR, 2020

Dual Learning for Dialogue State Tracking.

CoRR, 2020

Structured Hierarchical Dialogue Policy with Graph Neural Networks.

CoRR, 2020

Deep Reinforcement Learning for On-line Dialogue State Tracking.

CoRR, 2020

End-to-End Speaker-Dependent Voice Activity Detection.

CoRR, 2020

Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding.

CoRR, 2020

An Investigation on Deep Learning with Beta Stabilizer.

Tian Tan

CoRR, 2020

GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection.

CoRR, 2020

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models.

Proceedings of the Natural Language Processing and Chinese Computing, 2020

Memory Attention Neural Network for Multi-domain Dialogue State Tracking.

Proceedings of the Natural Language Processing and Chinese Computing, 2020

Robust Spoken Language Understanding with RL-Based Value Error Recovery.

Proceedings of the Natural Language Processing and Chinese Computing, 2020

Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Homomorphic Vocoder.

Zhijun Liu

Kuan Chen

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs.

Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Neural Lattice Search for Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Hierarchical Tracker for Multi-Domain Dialogue State Tracking.

Jieyu Li

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Augmentation for Low Resource Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Duration Robust Weakly Supervised Sound Event Detection.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Investigation of SpecAugment for Deep Speaker Embedding Learning.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Neural Graph Matching Networks for Chinese Short Text Matching.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

What does a Car-ssette tape tell?

CoRR, 2019

AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning.

CoRR, 2019

Text-based Depression Detection: What Triggers An Alert.

CoRR, 2019

Duration robust sound event detection.

CoRR, 2019

Cross Aggregation of Multi-head Attention for Neural Machine Translation.

Juncheng Cao

Hai Zhao

Proceedings of the Natural Language Processing and Chinese Computing, 2019

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Decoding of CTC Based Systems for Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding Challenge.

Proceedings of the International Conference on Multimodal Interaction, 2019

Robust Spoken Language Understanding with Acoustic and Domain Knowledge.

Proceedings of the International Conference on Multimodal Interaction, 2019

A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned Data.

Zijian Zhao

Proceedings of the IEEE International Conference on Acoustics, 2019

Audio Caption: Listen and Tell.

Proceedings of the IEEE International Conference on Acoustics, 2019

Knowledge Distillation for Small Foot-print Deep Speaker Embedding.

Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Monaural Multi-speaker ASR System without Pretraining.

Proceedings of the IEEE International Conference on Acoustics, 2019

Data Augmentation with Atomic Templates for Spoken Language Understanding.

Zijian Zhao

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

SJTU Entry in Blizzard Challenge 2019.

Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training.

Rao Ma

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Exploring Model Units and Training Strategies for End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Semantic Parsing with Dual Learning.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Rich Short Text Conversation Using Semantic-Key-Controlled Sequence Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Sequence discriminative training for deep learning based acoustic keyword spotting.

Zhehuai Chen

Speech Commun., 2018

Concept Transfer Learning for Adaptive Language Understanding.

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Cost-Sensitive Active Learning for Dialogue State Tracking.

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Binarized LSTM Language Model.

Xuan Liu

Di Cao

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition.

[BibT_eX]

[DOI]

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Joint Spoken Language Understanding and Domain Adaptive Language Modeling.

[BibT_eX]

[DOI]

Proceedings of the Intelligence Science and Big Data Engineering, 2018

Covariance Based Deep Feature for Text-Dependent Speaker Verification.

[BibT_eX]

[DOI]

Proceedings of the Intelligence Science and Big Data Engineering, 2018

Knowledge Distillation for Sequence Model.

[BibT_eX]

[DOI]

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Angular Softmax for Short-Duration Text-independent Speaker Verification.

[BibT_eX]

[DOI]

Zili Huang

Shuai Wang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

MLN: Moment Localization Network and Samples Selection for Moment Retrieval.

[BibT_eX]

[DOI]

Bo Huang

Ya Zhang

Proceedings of the 2nd International Conference on Video and Image Processing, 2018

Robust Spoken Language Understanding with Unsupervised ASR-Error Adaptation.

[BibT_eX]

[DOI]

Ouyu Lan

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification.

[BibT_eX]

[DOI]

Shuai Wang

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language Understanding.

[BibT_eX]

[DOI]

Ouyu Lan

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Towards Universal Dialogue State Tracking.

[BibT_eX]

[DOI]

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Structured Dialogue Policy with Graph Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 27th International Conference on Computational Linguistics, 2018

2017

Phone Synchronous Speech Recognition With CTC Lattices.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2017

splab at the NTCIR-13 STC-2 Task.

[BibT_eX]

[DOI]

Proceedings of the 13th NTCIR Conference, 2017

A Unified Confidence Measure Framework Using Auxiliary Normalization Graph.

[BibT_eX]

[DOI]

Zhehuai Chen

Proceedings of the Intelligence Science and Big Data Engineering, 2017

Deep Attentive Structured Language Model Based on LSTM.

[BibT_eX]

[DOI]

Di Cao

Proceedings of the Intelligence Science and Big Data Engineering, 2017

Binary Deep Neural Networks for Speech Recognition.

[BibT_eX]

[DOI]

Xu Xiang

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

What Does the Speaker Embedding Encode?

[BibT_eX]

[DOI]

Shuai Wang

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Discrete Duration Model for Speech Synthesis.

[BibT_eX]

[DOI]

Bo Chen

Tianling Bian

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Small-footprint convolutional neural network for spoofing detection.

[BibT_eX]

[DOI]

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

End-to-end spoofing detection with raw waveform CLDNNs.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Confidence measures for CTC-based phone synchronous decoding.

[BibT_eX]

[DOI]

Zhehuai Chen

Yimeng Zhuang

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning.

[BibT_eX]

[DOI]

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Affordable On-line Dialogue Policy Learning.

[BibT_eX]

[DOI]

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

On-line Dialogue Policy Learning with Companion Teaching.

[BibT_eX]

[DOI]

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR.

[BibT_eX]

[DOI]

Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017

Future vector enhanced LSTM language model for LVCSR.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Cluster Adaptive Training for Deep Neural Network Based Acoustic Model.

[BibT_eX]

[DOI]

Tian Tan

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Deep features for automatic spoofing detection.

[BibT_eX]

[DOI]

Nanxin Chen

Speech Commun., 2016

Evolvable dialogue state tracking for statistical dialogue management.

[BibT_eX]

[DOI]

Frontiers Comput. Sci., 2016

Recurrent Polynomial Network for Dialogue State Tracking.

[BibT_eX]

[DOI]

Kai Sun

Qizhe Xie

Dialogue Discourse, 2016

The splab at the NTCIR-12 Short Text Conversation Task.

[BibT_eX]

[DOI]

Ke Wu

Xuan Liu

Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, 2016

Multi-task joint-learning for robust voice activity detection.

[BibT_eX]

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Directed automatic speech transcription error correction using bidirectional LSTM.

[BibT_eX]

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Rich punctuations prediction using large-scale deep learning.

[BibT_eX]

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

On training bi-directional neural network language model with noise contrastive estimation.

[BibT_eX]

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC.

[BibT_eX]

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues.

[BibT_eX]

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phone Synchronous Decoding with CTC Lattice.

[BibT_eX]

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A comparative study of robustness of deep learning approaches for VAD.

[BibT_eX]

[DOI]

Sibo Tong

Hao Gu

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Deep feature for text-dependent speaker verification.

[BibT_eX]

[DOI]

Speech Commun., 2015

Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2015 Conference, 2015

Paragraph vector based topic model for language model adaptation.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multi-task learning for text-dependent speaker verification.

[BibT_eX]

[DOI]

Nanxin Chen

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An investigation of context clustering for statistical speech synthesis with deep neural network.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Very deep convolutional neural networks for LVCSR.

[BibT_eX]

[DOI]

Mengxiao Bi

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2015 International Joint Conference on Neural Networks, 2015

Cluster adaptive training for deep neural network.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Recurrent neural network language model with structured word embeddings for speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A novel static parameter calculation method for model compensation.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Local trajectory based speech enhancement for robust speech recognition with deep neural network.

[BibT_eX]

[DOI]

Yongbin You

Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Multi-task joint-learning of deep neural networks for robust speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Semantic parser enhancement for dialogue domain extension with little data.

[BibT_eX]

[DOI]

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A generalized rule based tracker for dialogue state tracking.

[BibT_eX]

[DOI]

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

The SJTU System for Dialog State Tracking Challenge 2.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2014 Conference, 2014

Acoustic emotion recognition using deep neural network.

[BibT_eX]

[DOI]

Jianwei Niu

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Tandem deep features for text-dependent speaker verification.

[BibT_eX]

[DOI]

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A novel dynamic parameters calculation approach for model compensation.

[BibT_eX]

[DOI]

Suliang Bu

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Speaker verification with deep features.

[BibT_eX]

[DOI]

Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Reshaping deep neural network for fast decoding by node-pruning.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Stochastic data sweeping for fast DNN training.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Second order vector taylor series based robust speech recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

A New Word Language Model Evaluation Metric for Character Based Languages.

[BibT_eX]

[DOI]

Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2013

Combination of data borrowing strategies for low-resource LVCSR.

[BibT_eX]

[DOI]

Jia Liu

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Introduction to the Issue on Advances in Spoken Dialogue Systems and Mobile Interface.

[BibT_eX]

[DOI]

IEEE J. Sel. Top. Signal Process., 2012

Discriminative spoken language understanding using word confusion networks.

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

The Effect of Cognitive Load on a Statistical Dialogue System.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2012 Conference, 2012

Development of the 2012 SJTU HVR system.

[BibT_eX]

[DOI]

Hainan Xu

Yuchen Fan

Proceedings of the International Conference on Multimodal Interaction, 2012

ICMI'12 grand challenge: haptic voice recognition.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2012

2011

Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 2011

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis.

[BibT_eX]

[DOI]

Speech Commun., 2011

Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2011 Conference, 2011

Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk.

[BibT_eX]

[DOI]

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Joint modelling of voicing label and continuous F0 for HMM based speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2011

On-line policy optimisation of spoken dialogue systems via live interaction with human subjects.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Unsupervised training and directed manual transcription for LVCSR.

[BibT_eX]

[DOI]

Speech Commun., 2010

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2010

From discontinuous to continuous F0 modelling in HMM-based speech synthesis.

[BibT_eX]

[DOI]

Blaise Thomson

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Bayesian dialogue system for the Let's Go Spoken Dialogue Challenge.

[BibT_eX]

[DOI]

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Parameter learning for POMDP spoken dialogue models.

[BibT_eX]

[DOI]

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Parameter estimation for agenda-based user simulation.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2010 Conference, 2010

Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2010 Conference, 2010

Context adaptive training with factorized decision trees for HMM-based speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Canonical state models for automatic speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Word-level emphasis modelling in HMM-based speech synthesis.

[BibT_eX]

[DOI]

François Mairesse

Proceedings of the IEEE International Conference on Acoustics, 2010

Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning.

[BibT_eX]

[DOI]

Proceedings of the ACL 2010, 2010

2009

Unsupervised Adaptation With Discriminative Mapping Transforms.

[BibT_eX]

[DOI]

Philip C. Woodland

IEEE Trans. Speech Audio Process., 2009

k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-Based Dialogue Systems.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2009 Conference, 2009

Transformation-based learning for semantic parsing.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

Spoken language understanding from unaligned data using discriminative classification models.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

Back-off action selection in summary space-based POMDP dialogue systems.

[BibT_eX]

[DOI]

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Modelling user behaviour in the HIS-POMDP dialogue manager.

[BibT_eX]

[DOI]

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Training and Evaluation of the HIS POMDP Dialogue System in Noise.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2008 Workshop, 2008

Evaluating semantic-level confidence scores with multiple hypotheses.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

User study of the Bayesian update of dialogue state approach to dialogue management.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Adaptive training using discriminative mapping transforms.

[BibT_eX]

[DOI]

Chandra Kant Raut

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Unsupervised discriminative adaptation using discriminative mapping transforms.

[BibT_eX]

[DOI]

Philip C. Woodland

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Bayesian Adaptive Inference and Adaptive Training.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 2007

Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio.

[BibT_eX]

[DOI]

Philip C. Woodland

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Improving Speech Transcription for Mandarin-English Translation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2007

Speech Recognition System Combination for Machine Translation.

[BibT_eX]

[DOI]

Abdelkhalek Messaoudi

Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminative language model adaptation for Mandarin broadcast speech transcription and translation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Development of a phonetic system for large vocabulary Arabic speech recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Discriminative cluster adaptive training.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 2006

Incremental Adaptation using Bayesian Inference.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Investigation of Acoustic Modeling Techniques for LVCSR Systems.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Development of the CUHTK 2004 Mandarin Conversational Telephone Speech Transcription System.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Training LVCSR Systems on Thousands of Hours of Data.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Adaptive training using structured transforms.

[BibT_eX]

[DOI]