Koichi Shinoda

ORCID: 0000-0003-1095-3203

According to our database, Koichi Shinoda authored at least 144 papers between 1990 and 2024.

Bibliography

2024
CAMOT: Camera Angle-aware Multi-Object Tracking.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Co-speech Gesture Generation with Variational Auto Encoder.
Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

2023
Text-Guided Object Detector for Multi-modal Video Question Answering.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

EvIs-Kitchen: Egocentric Human Activities Recognition with Video and Inertial Sensor Data.
Proceedings of the MultiMedia Modeling - 29th International Conference, 2023

Synthesizing Speech from ECoG with a Combination of Transformer-Based Encoder and Neural Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2023

Sensor Data Representation with Transformer-Based Contrastive Learning for Human Action Recognition and Detection.
Proceedings of the 31st European Signal Processing Conference, 2023

Multimodal recognition of speech and electrocorticogram.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
MSR-DARTS: Minimum Stable Rank of Differentiable Architecture Search.
Proceedings of the International Joint Conference on Neural Networks, 2022

RI-DC: Rotation-Invariant Detection and Classification for Wheat Head Detection.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022

Transformer-Based Estimation of Spoken Sentences Using Electrocorticography.
Proceedings of the IEEE International Conference on Acoustics, 2022

Implicit Neural Representations for Variable Length Human Motion Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network.
IEICE Trans. Inf. Syst., 2021

Multimodal Emotion Recognition with High-Level Speech and Text Features.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Noise-Tolerant Time-Domain Speech Separation with Noise Bases.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition.
Comput. Speech Lang., 2020

Neural Architecture Search Using Stable Rank of Convolutional Layers.
CoRR, 2020

Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection.
Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

NEC-TT Speaker Verification System for SRE'19 CTS Challenge.
Proceedings of the Interspeech 2020, 2020

Estimation of Leaf Angle Distribution Based on Statistical Properties of Leaf Shading Distribution.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2020

Deep Video Understanding of Character Relationships in Movies.
Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

2019
Recurrent out-of-vocabulary word detection based on distribution of features.
Comput. Speech Lang., 2019

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
CoRR, 2019

Multimodal Fusion of BERT-CNN and Gated CNN Representations for Depression Detection.
Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019

A Modified Algorithm for Multiple Input Spectrogram Inversion.
Proceedings of the Interspeech 2019, 2019

The NEC-TT 2018 Speaker Verification System.
Proceedings of the Interspeech 2019, 2019

Estimation of Diffuse Component of Global Radiation Based on Leaf-Scale Crop Images.
Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019

Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
VANT at TRECVID 2018.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Few-Shot Adaptation for Multimedia Semantic Indexing.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification.
Proceedings of the Interspeech 2018, 2018

Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data.
Proceedings of the Interspeech 2018, 2018

Attentive Statistics Pooling for Deep Speaker Embedding.
Proceedings of the Interspeech 2018, 2018

Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Multi-Task Autoencoder for Noise-Robust Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition.
Proceedings of the British Machine Vision Conference 2018, 2018

2017
Cross-view human action recognition from depth maps using spectral graph sequences.
Comput. Vis. Image Underst., 2017

TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Boredom Recognition Based on Users' Spontaneous Behaviors in Multiparty Human-Robot Interactions.
Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos.
Proceedings of the on Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

User adaptation of convolutional neural network for human activity recognition.
Proceedings of the 25th European Signal Processing Conference, 2017

Multimodal speech recognition using mouth images from depth camera.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A unified network for multi-speaker speech recognition with multi-channel recordings.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

Wise teachers train better DNN acoustic models.
EURASIP J. Audio Speech Music. Process., 2016

Robust discriminative training against data insufficiency in PLDA-based speaker verification.
Comput. Speech Lang., 2016

TokyoTech at TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Adaptation of Word Vectors using Tree Structure for Visual Semantics.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Tokyo Tech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task.
Proceedings of the Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features.
Proceedings of the Interspeech 2016, 2016

Graph regularized implicit pose for 3D human action recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Autonomous selection of i-vectors for PLDA modelling in speaker verification.
Speech Commun., 2015

Error Correction Using Long Context Match for Smartphone Speech Recognition.
IEICE Trans. Inf. Syst., 2015

TokyoTech at TRECVID 2015.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Vocabulary Expansion Using Word Vectors for Video Semantic Indexing.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Combining Audio Features and Visual I-Vector @ MediaEval 2015 Multimodal Person Discovery in Broadcast TV.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

2014
TokyoTech-Waseda at TRECVID 2014.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Speaker adaptation of deep neural networks using a hierarchy of output layers.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

An efficient error correction interface for speech recognition on mobile touchscreen devices.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

i-Vector Selection for Effective PLDA Modeling in Speaker Recognition.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Discriminative PLDA training with application-specific loss functions for speaker verification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Event Detection by Velocity Pyramid.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

n-gram Models for Video Semantic Indexing.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Simple gesture-based error correction interface for smartphone speech recognition.
Proceedings of the INTERSPEECH 2014, 2014

Constrained discriminative PLDA training for speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2014

Semantics for Large-Scale Multimedia: New Challenges for NLP.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Spectral Graph Skeletons for 3D Action Recognition.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
Reusing Speech Techniques for Video Semantic Indexing [Applications Corner].
IEEE Signal Process. Mag., 2013

Detection of overlapped speech using lapel microphones in meeting.
Speech Commun., 2013

Feature normalization based on non-extensive statistics for speech recognition.
Speech Commun., 2013

q-Gaussian mixture models for image and video semantic indexing.
J. Vis. Commun. Image Represent., 2013

Spectral Subtraction Based on Non-extensive Statistics for Speech Recognition.
IEICE Trans. Inf. Syst., 2013

Event detection in consumer videos using GMM supervectors and SVMs.
EURASIP J. Image Video Process., 2013

A statistical approach for person verification using human behavioral patterns.
EURASIP J. Image Video Process., 2013

TokyoTechCanon at TRECVID 2013.
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Combining deep speaker specific representations with GMM-SVM for speaker verification.
Proceedings of the INTERSPEECH 2013, 2013

Statistical Person Verification Using Behavioral Patterns from Complex Human Motion.
Proceedings of the New Trends in Image Analysis and Processing - ICIAP 2013, 2013

Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors.
IEEE Trans. Multim., 2012

Active Learning Using Phone-Error Distribution for Speech Modeling.
IEICE Trans. Inf. Syst., 2012

Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model.
IEICE Trans. Inf. Syst., 2012

Robust Gait-Based Person Identification against Walking Speed Variations.
IEICE Trans. Inf. Syst., 2012

TokyoTechCanon at TRECVID 2012.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity.
Proceedings of the INTERSPEECH 2012, 2012

Q-Gaussian based spectral subtraction for robust speech recognition.
Proceedings of the INTERSPEECH 2012, 2012

Multimedia event detection using GMM supervectors and SVMs.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Acoustic model training using committee-based active and semi-supervised learning for speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Efficient model training for HMM-based person identification by gait.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

q-Gaussian Mixture Models Based on Non-extensive Statistics for Image and Video Semantic Indexing.
Proceedings of the Computer Vision, 2012

2011
Semi-synchronous speech and pen input for mobile user interfaces.
Speech Commun., 2011

Committee-Based Active Learning for Speech Recognition.
IEICE Trans. Inf. Syst., 2011

TokyoTech+Canon at TRECVID 2011.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

A fast MAP adaptation technique for GMM-supervector-based video semantic indexing systems.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Person authentication using 3D human motion.
Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding, 2011

Generalized-Log Spectral Mean Normalization for Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Structural Joint Factor Analysis for Speaker Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Acoustic Forest for SMAP-Based Speaker Verification.
Proceedings of the INTERSPEECH 2011, 2011

Cross-Channel Spectral Subtraction for meeting speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011

Structural MAP adaptation in GMM-supervector based speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011

Designing text corpus using phone-error distribution for acoustic modeling.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Acoustic Model Adaptation for Speech Recognition.
IEICE Trans. Inf. Syst., 2010

TT+GT at TRECVID 2010 Workshop.
Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

Dynamic language model adaptation using keyword category classification.
Proceedings of the INTERSPEECH 2010, 2010

High-Level Feature Extraction Using SIFT GMMs and Audio Models.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Robust Gait Recognition Against Speed Variation.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Speech modeling based on committee-based active learning.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
TITGT at TRECVID 2009 Workshop.
Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

Robust Speech Recognition in the Car Environment.
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2009

Speaker adaptation based on two-step active learning.
Proceedings of the INTERSPEECH 2009, 2009

Independent component analysis for noisy speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Tokyo Tech at TRECVID 2008.
Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

Automatically estimating number of scenes for rushes summarization.
Proceedings of the 2nd ACM Workshop on Video Summarization, 2008

Automatic Score Scene Detection for Baseball Video.
Proceedings of the Large-Scale Knowledge Resources. Construction and Application, 2008

Time-lag adaptation for semi-synchronous speech and pen input.
Proceedings of the INTERSPEECH 2008, 2008

Improvement of eigenvoice-based speaker adaptation by parameter space clustering.
Proceedings of the INTERSPEECH 2008, 2008

Robust spoken term detection using combination of phone-based and word-based recognition.
Proceedings of the INTERSPEECH 2008, 2008

2007
Robust Speech Recognition Using Factorial HMMs for Home Environments.
EURASIP J. Adv. Signal Process., 2007

TokyoTech's TRECVID2007 Notebook.
Proceedings of the TRECVID 2007 workshop participants notebook papers, 2007

Dynamic language model adaptation using presentation slides for lecture speech recognition.
Proceedings of the INTERSPEECH 2007, 2007

Automatic estimation of scaling factors among probabilistic models in speech recognition.
Proceedings of the INTERSPEECH 2007, 2007

Predictive minimum Bayes risk classification for robust speech recognition.
Proceedings of the INTERSPEECH 2007, 2007

Semi-Synchronous Speech and Pen Input.
Proceedings of the IEEE International Conference on Acoustics, 2007

Speech Recognition using FHMMs Robust Against Nonstationary Noise.
Proceedings of the IEEE International Conference on Acoustics, 2007

Home-environment adaptation of phoneme factorial hidden Markov models.
Proceedings of the 15th European Signal Processing Conference, 2007

A robust scene recognition system for baseball broadcast using data-driven approach.
Proceedings of the 6th ACM International Conference on Image and Video Retrieval, 2007

2006
Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast.
IEICE Trans. Inf. Syst., 2006

TokyoTech's TRECVID2006 Notebook.
Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

Robust scene recognition using language models for scene contexts.
Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006

Towards Optimal Bayes Decision for Speech Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Robust highlight extraction using multi-stream hidden Markov models for baseball video.
Proceedings of the 2005 International Conference on Image Processing, 2005

2002
Vocal tract length normalization using rapid maximum-likelihood estimation for speech recognition.
Syst. Comput. Jpn., 2002

Efficient reduction of Gaussian components using MDL criterion for HMM-based speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
A structural Bayes approach to speaker adaptation.
IEEE Trans. Speech Audio Process., 2001

Rapid vocal tract length normalization using maximum likelihood estimation.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
A family of Hadamard matrices of dihedral group type.
Discret. Appl. Math., 2000

1998
Unsupervised adaptation using structural Bayes approach.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997
Acoustic modeling based on the MDL principle for speech recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Unsupervised and incremental speaker adaptation under adverse environmental conditions.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Speaker adaptation with autonomous model complexity control by MDL principle.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1995
Speaker adaptation with autonomous control using tree structure.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

High speed speech recognition using tree-structured probability density function.
Proceedings of the 1995 International Conference on Acoustics, 1995

1994
Speech recognition using tree-structured probability density function.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Unsupervised speaker adaptation for speech recognition using demi-syllable HMM.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

1991
Speaker adaptation for demi-syllable based continuous density HMM.
Proceedings of the 1991 International Conference on Acoustics, 1991

1990
Speaker adaptation for demi-syllable based speech recognition using continuous HMM.
Proceedings of the First International Conference on Spoken Language Processing, 1990

