Nakamasa Inoue

CoRR, April, 2025

On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process.

CoRR, March, 2025

ContextualCoder: Adaptive In-Context Prompting for Programmatic Visual Question Answering.

IEEE Trans. Multim., 2025

Diffusion-Based Generative Regularization for Supervised Discriminative Learning.

Takuya Asakura

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Masked Gated Linear Unit.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Contracted Gram Tensor Distillation for Object Detection.

Takumi Karasawa

Proceedings of the 7th ACM International Conference on Multimedia in Asia, 2025

STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models.

Mahiro Ukai

Shuhei Kurita

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Collision Avoidance with Differentiable Occupancy Functions in Object Rearrangement.

Roma Satoh

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

AnimalClue: Recognizing Animals by their Traces.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

AgroBench: Vision-Language Model Benchmark in Agriculture.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Referring Expression Comprehension for Small Objects.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Binary Stochastic Flip Optimization for Training Binary Neural Networks.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Multi-Point Positional Insertion Tuning for Small Object Detection.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Rectified Lagrangian for Out-of-Distribution Detection in Modern Hopfield Networks.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

HarmonicEval: Multi-modal, Multi-task, Multi-criteria Automatic Evaluation Using a Vision Language Model.

CoRR, 2024

CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information.

CoRR, 2024

AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Locally Aligned Rectified Flow Model for Speech Enhancement Towards Single-Step Diffusion.

Zhengxiao Li

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering.

Ruoyue Shen

Proceedings of the IEEE International Conference on Image Processing, 2024

Spatiality-Aware Prompt Tuning for Few-Shot Small Object Detection.

Takumi Karasawa

Proceedings of the IEEE International Conference on Image Processing, 2024

Pseudo-Outlier Synthesis Using Q-Gaussian Distributions for Out-of-Distribution Detection.

Proceedings of the IEEE International Conference on Acoustics, 2024

PolarDB: Formula-Driven Dataset for Pre-Training Trajectory Encoders.

Proceedings of the IEEE International Conference on Acoustics, 2024

Cubic Knowledge Distillation for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Formula-Supervised Visual-Geometric Pre-training.

Proceedings of the Computer Vision - ECCV 2024, 2024

Rethinking Image Super-Resolution from Training Data Perspectives.

Proceedings of the Computer Vision - ECCV 2024, 2024

Scaling Backwards: Minimal Synthetic Pre-Training?

Proceedings of the Computer Vision - ECCV 2024, 2024

Augmenting Pass Prediction via Imitation Learning in Soccer Simulations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A Simple Finetuning Strategy Based on Bias-Variance Ratios of Layer-Wise Gradients.

Proceedings of the Computer Vision - ACCV 2024, 2024

Efficient Target Propagation by Deriving Analytical Solution.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Text-Guided Object Detector for Multi-modal Video Question Answering.

Ruoyue Shen

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scale-space Tokenization for Improving the Robustness of Vision Transformers.

Lei Xu

Edgar Josafat Martinez-Noriega

Proceedings of the 31st ACM International Conference on Multimedia, 2023

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Pre-training Vision Transformers with Very Limited Synthesized Images.

Ryo Nakamura

Hirokatsu Kataoka

Sora Takashima

Rio Yokota

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Parameter Efficient Transfer Learning for Various Speech Processing Tasks.

Shinta Otake

Proceedings of the IEEE International Conference on Acoustics, 2023

Step restriction for improving adversarial attacks.

Proceedings of the IEEE International Conference on Acoustics, 2023

Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning with Partial Forgetting in Modern Hopfield Networks.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Fixed-Weight Difference Target Propagation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Spatiotemporal Initialization for 3D CNNs with Generated Motion Patterns.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

PoF: Post-Training of Feature Extractor for Improving Generalization.

Proceedings of the International Conference on Machine Learning, 2022

Downstream Augmentation Generation For Contrastive Learning.

Tomohiro Hayase

Suguru Yasutomi

Edgar Josafat Martinez-Noriega

Proceedings of the IEEE International Conference on Acoustics, 2022

Replacing Labeled Real-image Datasets with Auto-generated Contours.

Mariana Rodrigues Makiuchi

Rio Yokota

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Can Vision Transformers Learn without Natural Images?

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network.

IEICE Trans. Inf. Syst., 2021

Can Vision Transformers Learn without Natural Images?

CoRR, 2021

Learning VAE with Categorical Labels for Generating Conditional Handwritten Characters.

Proceedings of the 17th International Conference on Machine Vision and Applications, 2021

Disentangling Latent Groups Of Factors.

Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Formula-driven Supervised Learning with Recursive Tiling Patterns.

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Teacher-Assisted Mini-Batch Sampling for Blind Distillation Using Metric Learning.

Proceedings of the IEEE International Conference on Acoustics, 2021

Augmentation-Agnostic Regularization for Unsupervised Contrastive Learning with Its Application to Speaker Verification.

Tsubasa Maruyama

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Graph Grouping Loss for Metric Learning of Face Image Representations.

Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing, 2020

Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection.

Ronaldo Prata Amorim

Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data.

Eisuke Yamagata

Hirokatsu Kataoka

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Video Understanding of Character Relationships in Movies.

Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, 2020

Closed-Form Pre-Training for Small-Sample Environmental Sound Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Optimizing Speaker Embeddings using Meta-Training Sets.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Quasi-Newton Adversarial Attacks on Speaker Verification Systems.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Pre-training Without Natural Images.

Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019

Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition.

Raden Mu'az Mun'im

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

VANT at TRECVID 2018.

Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Few-Shot Adaptation for Multimedia Semantic Indexing.

Proceedings of the 2018 ACM Multimedia Conference, 2018

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification.

Jiacen Zhang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data.

Tifani Warnita

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Task Autoencoder for Noise-Robust Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition.

Thao Le Minh

Proceedings of the British Machine Vision Conference 2018, 2018

2017

Cross-view human action recognition from depth maps using spectral graph sequences.

Tommi Kerola

Comput. Vis. Image Underst., 2017

TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers.

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos.

Mengxi Lin

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

User adaptation of convolutional neural network for human activity recognition.

Proceedings of the 25th European Signal Processing Conference, 2017

Multimodal speech recognition using mouth images from depth camera.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A unified network for multi-speaker speech recognition with multi-channel recordings.

Conggui Liu

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search.

IEEE Trans. Pattern Anal. Mach. Intell., 2016

TokyoTech at TRECVID 2016.

Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Adaptation of Word Vectors using Tree Structure for Visual Semantics.

Proceedings of the 2016 ACM Conference on Multimedia, 2016

Tokyo Tech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task.

Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Graph regularized implicit pose for 3D human action recognition.

Tommi Kerola

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

TokyoTech at TRECVID 2015.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Vocabulary Expansion Using Word Vectors for Video Semantic Indexing.

Proceedings of the 23rd Annual ACM Conference on Multimedia, MM '15, Brisbane, Australia, October 26, 2015

Combining Audio Features and Visual I-Vector @ MediaEval 2015 Multimodal Person Discovery in Broadcast TV.

Fumito Nishi

Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

2014

TokyoTech-Waseda at TRECVID 2014.

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Event Detection by Velocity Pyramid.

Zhuolin Liang

Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

n-gram Models for Video Semantic Indexing.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Spectral Graph Skeletons for 3D Action Recognition.

Tommi Kerola

Proceedings of the Computer Vision - ACCV 2014, 2014

2013

Reusing Speech Techniques for Video Semantic Indexing [Applications Corner].

IEEE Signal Process. Mag., 2013

q-Gaussian mixture models for image and video semantic indexing.

J. Vis. Commun. Image Represent., 2013

Event detection in consumer videos using GMM supervectors and SVMs.

Yusuke Kamishima

EURASIP J. Image Video Process., 2013

TokyoTechCanon at TRECVID 2013.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors.

Proceedings of the IEEE International Conference on Computer Vision, 2013

2012

A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors.

IEEE Trans. Multim., 2012

TokyoTechCanon at TRECVID 2012.

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Multimedia event detection using GMM supervectors and SVMs.

Proceedings of the 19th IEEE International Conference on Image Processing, 2012

q-Gaussian Mixture Models Based on Non-extensive Statistics for Image and Video Semantic Indexing.

Proceedings of the Computer Vision, 2012

2011

TokyoTech+Canon at TRECVID 2011.

Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

A fast MAP adaptation technique for GMM-supervector-based video semantic indexing systems.
