Naoki Makishima

Orcid: 0000-0002-7065-315X

According to our database1, Naoki Makishima authored at least 28 papers between 2019 and 2023.

Collaborative distances:
  • Dijkstra number2 of five.
  • Erdős number3 of five.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2023
End-to-End Joint Target and Non-Target Speakers ASR.
CoRR, 2023

Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.
Proceedings of the ACM Multimedia Asia 2023, 2023

OnDA-DETR: Online Domain Adaptation for Detection Transformers with Self-Training Framework.
Proceedings of the IEEE International Conference on Image Processing, 2023

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Text-to-Text Pre-Training with Paraphrasing for Improving Transformer-Based Image Captioning.
Proceedings of the 31st European Signal Processing Conference, 2023

2022
Knowledge Transferred Fine-Tuning: Convolutional Neural Network Is Born Again With Anti-Aliasing Even in Data-Limited Situations.
IEEE Access, 2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Proceedings of the Interspeech 2022, 2022

End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Proceedings of the Interspeech 2022, 2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Proceedings of the Interspeech 2022, 2022

Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Independent deeply learned matrix analysis with automatic selection of stable microphone-wise update and fast sourcewise update of demixing matrix.
Signal Process., 2021

Large-Context Conversational Representation Learning: Self-Supervised Learning For Conversational Documents.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages.
Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Enrollment-Less Training for Personalized Voice Activity Detection.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
Proceedings of the IEEE International Conference on Acoustics, 2021

MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition.
Proceedings of the Interspeech 2020, 2020

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model.
Proceedings of the 13th International Conference on Natural Language Generation, 2020

Unsupervised Domain Adversarial Training in Angular Space for Facial Expression Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Robust Demixing Filter Update Algorithm Based on Microphone-wise Coordinate Descent for Independent Deeply Learned Matrix Analysis.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019


  Loading...