Koichi Shinoda

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Toward Designing a Reduced Phone Set Using Text Decoding Accuracy Estimates in Speech BCI.

Shuji Komeiji

Toshihisa Tanaka

Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies, 2025

2024

EvIs-Kitchen.

Yuzhe Hao

Dataset, July, 2024

CAMOT: Camera Angle-aware Multi-Object Tracking.

Felix Limanta

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Co-speech Gesture Generation with Variational Auto Encoder.

Shinichi Ka

Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

Domain-Specific Adaptation for Enhanced Gait Recognition in Practical Scenarios.

Proceedings of the 2024 6th International Conference on Image, Video and Signal Processing, 2024

MSDET: Multitask Speaker Separation and Direction-of-Arrival Estimation Training.

Roland Hartanto

Sakriani Sakti

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering.

Ruoyue Shen

Proceedings of the IEEE International Conference on Image Processing, 2024

LDMSE: Low Computational Cost Generative Diffusion Model for Speech Enhancement.

Yuki Nishi

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

Text-Guided Object Detector for Multi-modal Video Question Answering.

Ruoyue Shen

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

EvIs-Kitchen: Egocentric Human Activities Recognition with Video and Inertial Sensor Data.

Proceedings of the MultiMedia Modeling - 29th International Conference, 2023

Synthesizing Speech from ECoG with a Combination of Transformer-Based Encoder and Neural Vocoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Sensor Data Representation with Transformer-Based Contrastive Learning for Human Action Recognition and Detection.

Lei Yang

Yuzhe Hao

Proceedings of the 31st European Signal Processing Conference, 2023

Multimodal recognition of speech and electrocorticogram.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

MSR-DARTS: Minimum Stable Rank of Differentiable Architecture Search.

Proceedings of the International Joint Conference on Neural Networks, 2022

RI-DC: Rotation-Invariant Detection and Classification for Wheat Head Detection.

Takeru Ito

Mariana Rodrigues Makiuchi

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2022

Transformer-Based Estimation of Spoken Sentences Using Electrocorticography.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Implicit Neural Representations for Variable Length Human Motion Generation.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network.

IEICE Trans. Inf. Syst., 2021

Multimodal Emotion Recognition with High-Level Speech and Text Features.

Mariana Rodrigues Makiuchi

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Noise-Tolerant Time-Domain Speech Separation with Noise Bases.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition.

Comput. Speech Lang., 2020

Neural Architecture Search Using Stable Rank of Convolutional Layers.

CoRR, 2020

Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection.

Ronaldo Prata Amorim

Mariana Rodrigues Makiuchi

Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

NEC-TT Speaker Verification System for SRE'19 CTS Challenge.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Estimation of Leaf Angle Distribution Based on Statistical Properties of Leaf Shading Distribution.

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2020

Deep Video Understanding of Character Relationships in Movies.

Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

2019

Recurrent out-of-vocabulary word detection based on distribution of features.

Comput. Speech Lang., 2019

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.

CoRR, 2019

Multimodal Fusion of BERT-CNN and Gated CNN Representations for Depression Detection.

Tifani Warnita

Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019

A Modified Algorithm for Multiple Input Spectrogram Inversion.

Dongxiao Wang

Hirokazu Kameoka

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The NEC-TT 2018 Speaker Verification System.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Estimation of Diffuse Component of Global Radiation Based on Leaf-Scale Crop Images.

Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019

Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition.

Raden Mu'az Mun'im

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

VANT at TRECVID 2018.

Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Few-Shot Adaptation for Multimedia Semantic Indexing.

Proceedings of the 2018 ACM Multimedia Conference, 2018

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification.

Jiacen Zhang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data.

Tifani Warnita

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Attentive Statistics Pooling for Deep Speaker Embedding.

Koji Okabe

Takafumi Koshinaka

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Multi-Task Autoencoder for Noise-Robust Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition.

Thao Le Minh

Proceedings of the British Machine Vision Conference 2018, 2018

2017

Cross-view human action recognition from depth maps using spectral graph sequences.

Tommi Kerola

Comput. Vis. Image Underst., 2017

TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers.

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Boredom Recognition Based on Users' Spontaneous Behaviors in Multiparty Human-Robot Interactions.

Yasuhiro Shibasaki

Kotaro Funakoshi

Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos.

Mengxi Lin

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

User adaptation of convolutional neural network for human activity recognition.

Proceedings of the 25th European Signal Processing Conference, 2017

Multimodal speech recognition using mouth images from depth camera.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A unified network for multi-speaker speech recognition with multi-channel recordings.

Conggui Liu

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search.

IEEE Trans. Pattern Anal. Mach. Intell., 2016

Wise teachers train better DNN acoustic models.

Ryan Price

EURASIP J. Audio Speech Music. Process., 2016

Robust discriminative training against data insufficiency in PLDA-based speaker verification.

Comput. Speech Lang., 2016

TokyoTech at TRECVID 2016.

Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Adaptation of Word Vectors using Tree Structure for Visual Semantics.

Proceedings of the 2016 ACM Conference on Multimedia, 2016

Tokyo Tech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task.

Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Graph regularized implicit pose for 3D human action recognition.

Tommi Kerola

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Autonomous selection of i-vectors for PLDA modelling in speaker verification.

Speech Commun., 2015

Error Correction Using Long Context Match for Smartphone Speech Recognition.

Yuan Liang

IEICE Trans. Inf. Syst., 2015

TokyoTech at TRECVID 2015.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Vocabulary Expansion Using Word Vectors for Video Semantic Indexing.

Proceedings of the 23rd Annual ACM Conference on Multimedia, MM '15, Brisbane, Australia, October 26, 2015

Combining Audio Features and Visual I-Vector @ MediaEval 2015 Multimodal Person Discovery in Broadcast TV.

Fumito Nishi

Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

2014

TokyoTech-Waseda at TRECVID 2014.

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Speaker adaptation of deep neural networks using a hierarchy of output layers.

Ryan Price

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

An efficient error correction interface for speech recognition on mobile touchscreen devices.

Yuan Liang

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

i-Vector Selection for Effective PLDA Modeling in Speaker Recognition.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Discriminative PLDA training with application-specific loss functions for speaker verification.

Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Event Detection by Velocity Pyramid.

Zhuolin Liang

Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

n-gram Models for Video Semantic Indexing.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Simple gesture-based error correction interface for smartphone speech recognition.

Yuan Liang

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Constrained discriminative PLDA training for speaker verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Semantics for Large-Scale Multimedia: New Challenges for NLP.

Florian Metze

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Spectral Graph Skeletons for 3D Action Recognition.

Tommi Kerola

Proceedings of the Computer Vision - ACCV 2014, 2014

2013

Reusing Speech Techniques for Video Semantic Indexing [Applications Corner].

IEEE Signal Process. Mag., 2013

Detection of overlapped speech using lapel microphones in meeting.

Speech Commun., 2013

Feature normalization based on non-extensive statistics for speech recognition.

Speech Commun., 2013

q-Gaussian mixture models for image and video semantic indexing.

J. Vis. Commun. Image Represent., 2013

Spectral Subtraction Based on Non-extensive Statistics for Speech Recognition.

IEICE Trans. Inf. Syst., 2013

Event detection in consumer videos using GMM supervectors and SVMs.

Yusuke Kamishima

EURASIP J. Image Video Process., 2013

A statistical approach for person verification using human behavioral patterns.

Felipe Gómez-Caballero

Takahiro Shinozaki

EURASIP J. Image Video Process., 2013

TokyoTechCanon at TRECVID 2013.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Combining deep speaker specific representations with GMM-SVM for speaker verification.

Ryan Price

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Statistical Person Verification Using Behavioral Patterns from Complex Human Motion.

Felipe Gómez-Caballero

Takahiro Shinozaki

Proceedings of the New Trends in Image Analysis and Processing - ICIAP 2013, 2013

Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors.

Proceedings of the IEEE International Conference on Computer Vision, 2013

2012

A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors.

IEEE Trans. Multim., 2012

Active Learning Using Phone-Error Distribution for Speech Modeling.

Hiroko Murakami

IEICE Trans. Inf. Syst., 2012

Robust Gait-Based Person Identification against Walking Speed Variations.

Muhammad Rasyid Aqmar

IEICE Trans. Inf. Syst., 2012

TokyoTechCanon at TRECVID 2012.

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Q-Gaussian based spectral subtraction for robust speech recognition.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Multimedia event detection using GMM supervectors and SVMS.

Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Acoustic model training using committee-based active and semi-supervised learning for speech recognition.

Takuya Tsutaoka

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Efficient model training for HMM-based person identification by gait.

Muhammad Rasyid Aqmar

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

q-Gaussian Mixture Models Based on Non-extensive Statistics for Image and Video Semantic Indexing.

Proceedings of the Computer Vision, 2012

2011

Semi-synchronous speech and pen input for mobile user interfaces.

Speech Commun., 2011

Committee-Based Active Learning for Speech Recognition.

IEICE Trans. Inf. Syst., 2011

TokyoTech+Canon at TRECVID 2011.

Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

A fast MAP adaptation technique for gmm-supervector-based video semantic indexing systems.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Person authentication using 3D human motion.

Felipe Gómez-Caballero

Takahiro Shinozaki

Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding, 2011

Generalized-Log Spectral Mean Normalization for Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Structural Joint Factor Analysis for Speaker Recognition.

Marc Ferras

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Acoustic Forest for SMAP-Based Speaker Verification.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Cross-Channel Spectral Subtraction for meeting speech recognition.

Yu Nasu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Structural MAP adaptation in GMM-supervector based speaker recognition.

Marc Ferras

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Designing text corpus using phone-error distribution for acoustic modeling.

Hiroko Murakami

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Acoustic Model Adaptation for Speech Recognition.

IEICE Trans. Inf. Syst., 2010

TT+GT at TRECVID 2010 Workshop.

Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

Dynamic language model adaptation using keyword category classification.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

High-Level Feature Extraction Using SIFT GMMs and Audio Models.

Proceedings of the 20th International Conference on Pattern Recognition, 2010

Robust Gait Recognition Against Speed Variation.

Muhammad Rasyid Aqmar

Agnieszka Betkowska Cavalcante

Proceedings of the 20th International Conference on Pattern Recognition, 2010

Speech modeling based on committee-based active learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009

TITGT at TRECVID 2009 Workshop.

Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

Robust Speech Recognition in the Car Environment.

Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2009

Speaker adaptation based on two-step active learning.

Hiroko Murakami

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Online speaker clustering using incremental learning of an ergodic hidden Markov model.

Takafumi Koshinaka

Kentaro Nagatomo

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Independent component analysis for noisy speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2008

Tokyo Tech at TRECVID 2008.

Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

Automatically estimating number of scenes for rushes summarization.

Koji Yamasaki

Proceedings of the 2nd ACM Workshop on Video Summarization, 2008

Automatic Score Scene Detection for Baseball Video.

Proceedings of the Large-Scale Knowledge Resources. Construction and Application, 2008

Time-lag adaptation for semi-synchronous speech and pen input.

Yasushi Watanabe

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Improvement of eigenvoice-based speaker adaptation by parameter space clustering.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Robust spoken term detection using combination of phone-based and word-based recognition.

Kenji Iwata

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

Robust Speech Recognition Using Factorial HMMs for Home Environments.

Agnieszka Betkowska

EURASIP J. Adv. Signal Process., 2007

TokyoTech's TRECVID2007 Notebook.

Taichi Nakamura

Proceedings of the TRECVID 2007 workshop participants notebook papers, 2007

Dynamic language model adaptation using presentation slides for lecture speech recognition.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Automatic estimation of scaling factors among probabilistic models in speech recognition.

Tadashi Emori

Yoshifumi Onishi

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Predictive minimum Bayes risk classification for robust speech recognition.

Jen-Tzung Chien

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Semi-Synchronous Speech and Pen Input.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Speech Recognition using FHMMS Robust Against Nonstationary Noise.

Agnieszka Betkowska

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Home-environment adaptation of phoneme factorial hidden Markov models.

Agnieszka Betkowska

Proceedings of the 15th European Signal Processing Conference, 2007

A robust scene recognition system for baseball broadcast using data-driven approach.

Proceedings of the 6th ACM International Conference on Image and Video Retrieval, 2007

2006

Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast.

Nguyen Huu Bach

IEICE Trans. Inf. Syst., 2006

TokyoTech's TRECVID2006 Notebook.

Proceedings of the 2006 TREC Video Retrieval Evaluation, 2006

Robust scene recognition using language models for scene contexts.

Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006

Towards Optimal Bayes Decision for Speech Recognition.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Robust highlight extraction using multi-stream hidden Markov models for baseball video.

Nguyen Huu Bach

Proceedings of the 2005 International Conference on Image Processing, 2005

2002

Vocal tract length normalization using rapid maximum-likelihood estimation for speech recognition.

Tadashi Emori

Syst. Comput. Jpn., 2002

Efficient reduction of Gaussian components using MDL criterion for HMM-based speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2002

2001

A structural Bayes approach to speaker adaptation.

Chin-Hui Lee

IEEE Trans. Speech Audio Process., 2001

Rapid vocal tract length normalization using maximum likelihood estimation.

Tadashi Emori

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

A family of Hadamard matrices of dihedral group type.

Mieko Yamada

Discret. Appl. Math., 2000

1998

Unsupervised adaptation using structural Bayes approach.

Chin-Hui Lee

Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, 1998

1997

Acoustic modeling based on the MDL principle for speech recognition.

Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996

Unsupervised and incremental speaker adaptation under adverse environmental conditions.

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Speaker adaptation with autonomous model complexity control by MDL principle.

Proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing, 1996

1995

Speaker adaptation with autonomous control using tree structure.

Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

High speed speech recognition using tree-structured probability density function.

Proceedings of the 1995 International Conference on Acoustics, Speech and Signal Processing, 1995

1994

Speech recognition using tree-structured probability density function.

Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Unsupervised speaker adaptation for speech recognition using demi-syllable HMM.

Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

1991

Speaker adaptation for demi-syllable based continuous density HMM.

Proceedings of the 1991 International Conference on Acoustics, Speech and Signal Processing, 1991

1990

Speaker adaptation for demi-syllable based speech recognition using continuous HMM.
