Kazuhito Koishida

Orcid: 0000-0002-3111-5375

According to our database, Kazuhito Koishida authored at least 59 papers between 1994 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

CUA-Skill: Develop Skills for Computer Using Agent.

[DOI]

,

,

Michael Solodko

,

,

,

,

,

,

,

,

,

,

Pashmina Cameron

,

,

Kazuhito Koishida

CoRR, January, 2026

Do GUI Grounders Truly Understand UI Elements?

[DOI]

,

,

,

Kazuhito Koishida

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

2025

AppSelectBench: Application-Level Tool Selection Benchmark.

[DOI]

,

Michael Solodko

,

,

,

,

Colby R. Banbury

,

,

,

,

,

,

Kamran Ghasedi Dizaji

,

,

,

,

Pashmina Cameron

,

Kazuhito Koishida

CoRR, November, 2025

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems.

[DOI]

,

,

,

Kazuhito Koishida

CoRR, September, 2025

Instruction Agent: Enhancing Agent with Expert Demonstration.

[DOI]

,

Hailey Hultquist

,

,

Kazuhito Koishida

CoRR, September, 2025

WinClick: GUI Grounding with Multimodal Large Language Models.

[DOI]

,

,

,

,

Colby R. Banbury

,

Kazuhito Koishida

CoRR, March, 2025

Self-reflecting Large Language Models: A Hegelian Dialectical Approach.

[DOI]

,

,

,

Kazuhito Koishida

CoRR, January, 2025

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale.

[DOI]

Rogerio Bonatti

,

,

Francesco Bonacci

,

,

,

,

,

,

Kazuhito Koishida

,

,

Lawrence Keunho Jang

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks.

[DOI]

Lawrence Keunho Jang

,

,

,

,

,

,

Rogerio Bonatti

,

Kazuhito Koishida

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CorrGAN: Simultaneous Learning of Speech Enhancement and Perceptual Quality Loss Functions.

[DOI]

Vasily Zadorozhnyy

,

,

,

Kazuhito Koishida

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression.

[DOI]

,

,

Colby R. Banbury

,

Daniel P. Robinson

,

,

Kazuhito Koishida

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models.

[DOI]

,

,

,

Colby R. Banbury

,

,

Kazuhito Koishida

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

2024

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation.

[DOI]

,

,

,

Kazuhito Koishida

,

Somayeh Sojoudi

Dataset, July, 2024

Automatic Disfluency Detection From Untranscribed Speech.

[DOI]

,

Kazuhito Koishida

,

Emily Mower Provost

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Zero-Shot Text-to-Speech from Continuous Text Streams.

[DOI]

,

,

,

,

Kazuhito Koishida

CoRR, 2024

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale.

[DOI]

Rogerio Bonatti

,

,

Francesco Bonacci

,

,

,

,

,

,

Kazuhito Koishida

,

,

,

CoRR, 2024

Data Generation Using Large Language Models for Text Classification: An Empirical Case Study.

[DOI]

,

Rogerio Bonatti

,

,

,

Kazuhito Koishida

CoRR, 2024

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation.

[DOI]

,

,

,

Kazuhito Koishida

,

Somayeh Sojoudi

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes.

[DOI]

,

,

,

Kazuhito Koishida

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Weakly-supervised Audio Separation via Bi-modal Semantic Similarity.

[DOI]

,

,

Kazuhito Koishida

,

Diana Marculescu

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learned Image Compression With Text Quality Enhancement.

[DOI]

,

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Image Processing, 2024

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures.

[DOI]

Afrina Tabassum

,

,

,

Ismini Lourentzou

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation.

[DOI]

,

,

,

Kazuhito Koishida

,

Somayeh Sojoudi

CoRR, 2023

Progressive Knowledge Distillation: Building Ensembles for Efficient Inference.

[DOI]

Don Kurian Dennis

,

Abhishek Shetty

,

,

Kazuhito Koishida

,

CoRR, 2023

Progressive Ensemble Distillation: Building Ensembles for Efficient Inference.

[DOI]

Don Kurian Dennis

,

Abhishek Shetty

,

Anish Prasad Sevekari

,

Kazuhito Koishida

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks.

[DOI]

Vasily Zadorozhnyy

,

,

Kazuhito Koishida

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Toward A Multimodal Approach for Disfluency Detection and Categorization.

[DOI]

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis.

[DOI]

,

,

Kazuhito Koishida

,

,

Proceedings of the Complex Networks & Their Applications XII, 2023

2022

A Training Framework for Stereo-Aware Speech Enhancement Using Deep Neural Networks.

[DOI]

Bahareh Tolooshams

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Acoustics, 2022

Training Robust Zero-Shot Voice Conversion Models with Self-Supervised Features.

[DOI]

,

,

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations.

[DOI]

Melikasadat Emami

,

,

Kazuhito Koishida

CoRR, 2021

INTERSPEECH 2021 Deep Noise Suppression Challenge.

[DOI]

Chandan K. A. Reddy

,

Harishchandra Dubey

,

Kazuhito Koishida

,

Arun Asokan Nair

,

,

,

Sebastian Braun

,

,

,

Sriram Srinivasan

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Single-Channel Speech Enhancement Using Learnable Loss Mixup.

[DOI]

,

,

Kazuhito Koishida

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cascaded Time + Time-Frequency Unet For Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps.

[DOI]

Arun Asokan Nair

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Single-Channel Speech Enhancement by Subspace Affinity Minimization.

[DOI]

,

Kazuhito Koishida

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments.

[DOI]

,

Uros Batricevic

,

Kazuhito Koishida

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural Networks.

[DOI]

Ahmet Emin Bulut

,

Kazuhito Koishida

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis.

[DOI]

,

Kazuhito Koishida

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning".

[DOI]

,

,

,

,

Kazuhito Koishida

Proceedings of the 37th International Conference on Machine Learning, 2020

Geometrically Constrained Independent Vector Analysis for Directional Speech Enhancement.

[DOI]

,

Kazuhito Koishida

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement.

[DOI]

Michael L. Iuzzolino

,

Kazuhito Koishida

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks.

[DOI]

Ahmet Emin Bulut

,

Kazuhito Koishida

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

MMTM: Multimodal Transfer Module for CNN Fusion.

[DOI]

Hamid Reza Vaezi Joze

,

Amirreza Shaban

,

Michael L. Iuzzolino

,

Kazuhito Koishida

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Improved Active Speaker Detection based on Optical Flow.

[DOI]

,

Kazuhito Koishida

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Adversarial Training for Speech Super-Resolution.

[DOI]

Sefik Emre Eskimez

,

Kazuhito Koishida

,

IEEE J. Sel. Top. Signal Process., 2019

Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation.

[DOI]

,

Kazuhito Koishida

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Super Resolution Generative Adversarial Network.

[DOI]

Sefik Emre Eskimez

,

Kazuhito Koishida

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings.

[DOI]

,

Kazuhito Koishida

,

John H. L. Hansen

IEEE ACM Trans. Audio Speech Lang. Process., 2018

2017

End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances.

[DOI]

,

Kazuhito Koishida

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

End-to-end text-independent speaker verification with flexibility in utterance duration.

[DOI]

,

Kazuhito Koishida

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2008

Hybrid low bitrate audio coding using adaptive gain shape vector quantization.

[DOI]

Sanjeev Mehrotra

,

,

Kazuhito Koishida

,

Naveen Thumpudi

Proceedings of the International Workshop on Multimedia Signal Processing, 2008

2000

A 1200 bps speech coder based on MELP.

[DOI]

,

Kazuhito Koishida

,

Vladimir Cuperman

,

,

John S. Collura

Proceedings of the IEEE International Conference on Acoustics, 2000

A 16-kbit/s bandwidth scalable audio coder based on the G.729 standard.

[DOI]

Kazuhito Koishida

,

Vladimir Cuperman

,

Proceedings of the IEEE International Conference on Acoustics, 2000

1998

A 16 kbit/s wideband CELP coder using MEL-generalized cepstral analysis and its subjective evaluation.

[DOI]

Kazuhito Koishida

,

Gou Hirabayashi

,

,

Takao Kobayashi

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

A wideband CELP speech coder at 16 kbit/s based on mel-generalized cepstral analysis.

[DOI]

Kazuhito Koishida

,

Gou Hirabayashi

,

,

Takao Kobayashi

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997

Efficient encoding of mel-generalized cepstrum for CELP coders.

[DOI]

Kazuhito Koishida

,

,

Takao Kobayashi

,

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996

CELP coding system based on mel-generalized cepstral analysis.

[DOI]

Kazuhito Koishida

,

,

Takao Kobayashi

,

Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995

CELP coding based on mel-cepstral analysis.

[DOI]

Kazuhito Koishida

,

,

Takao Kobayashi

,

Proceedings of the 1995 International Conference on Acoustics, 1995

1994

Speech coding based on adaptive MEL-cepstral analysis for noisy channels.

[DOI]

Kazuhito Koishida

,

,

Takao Kobayashi

,

Proceedings of the 3rd International Conference on Spoken Language Processing, 1994
