Kaitao Song

Orcid: 0000-0002-4046-8594

According to our database, Kaitao Song authored at least 42 papers between 2018 and 2024.

Bibliography

2024
Learning Domain Invariant Prompt for Vision-Language Models.
IEEE Trans. Image Process., 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024

EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model.
CoRR, 2024

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction.
CoRR, 2024

2023
TaskBench: Benchmarking Large Language Models for Task Automation.
CoRR, 2023

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models.
CoRR, 2023

Learning To Teach Large Language Models Logical Reasoning.
CoRR, 2023

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers.
CoRR, 2023

PromptTTS 2: Describing and Generating Voices with Text Prompt.
CoRR, 2023

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training.
CoRR, 2023

Deliberate then Generate: Enhanced Prompting Framework for Text Generation.
CoRR, 2023

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace.
CoRR, 2023

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling.
Proceedings of the International Conference on Machine Learning, 2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Towards Understanding Omission in Dialogue Summarization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

DiffusionNER: Boundary Diffusion for Named Entity Recognition.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
PVT v2: Improved baselines with Pyramid Vision Transformer.
Comput. Vis. Media, 2022

Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One.
CoRR, 2022

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
Proceedings of the Interspeech 2022, 2022

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech.
Proceedings of the Interspeech 2022, 2022

Analyzing and Mitigating Interference in Neural Architecture Search.
Proceedings of the International Conference on Machine Learning, 2022

A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Coarse-to-fine: A dual-view attention network for click-through rate prediction.
Knowl. Based Syst., 2021

PVTv2: Improved Baselines with Pyramid Vision Transformer.
CoRR, 2021

MPN: Multi-scale Progressive Restoration Network for Unsupervised Defect Detection.
Proceedings of the Pattern Recognition and Computer Vision - 4th Chinese Conference, 2021

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Bi-Modal Progressive Mask Attention for Fine-Grained Recognition.
IEEE Trans. Image Process., 2020

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning.
CoRR, 2020

MPNet: Masked and Permuted Pre-training for Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Neural Machine Translation with Error Correction.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
Hybrid Self-Attention Network for Machine Translation.
CoRR, 2018

Generating Adversarial Examples With Conditional Generative Adversarial Net.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Double Path Networks for Sequence to Sequence Learning.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

