Kevin J. Shih

According to our database, Kevin J. Shih authored at least 36 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Benchmarking Single-Factor Physical Video-to-Audio Generation.

Gopala Anumanchipalli

Ming-Yu Liu

CoRR, May, 2026

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos.

CoRR, March, 2026

2025

A2SB: Audio-to-Audio Schrodinger Bridges.

CoRR, January, 2025

Fugatto 1: Foundational Generative Audio Transformer Opus 1.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling.

Nannan Li

Kevin J. Shih

Bryan A. Plummer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2023

Partial Convolution for Padding, Inpainting, and Image Synthesis.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Multilingual Multiaccented Multispeaker TTS with RADTTS.

CoRR, 2023

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures.

Nannan Li

Kevin J. Shih

Bryan A. Plummer

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

High-Acoustic Fidelity Text To Speech Synthesis With Fine-Grained Control Of Speech Attributes.

Proceedings of the IEEE International Conference on Acoustics, 2023

Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Revisiting Image-Language Networks for Open-Ended Phrase Detection.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows.

CoRR, 2022

One TTS Alignment to Rule Them All.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis.

Proceedings of the 9th International Conference on Learning Representations, 2021

2019

Video Interpolation and Prediction with Unsupervised Landmarks.

CoRR, 2019

Graphical Contrastive Losses for Scene Graph Generation.

CoRR, 2019

Unsupervised Video Interpolation Using Cycle Consistency.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Graphical Contrastive Losses for Scene Graph Parsing.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Improving Semantic Segmentation via Video Propagation and Label Relaxation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Partial Convolution based Padding.

CoRR, 2018

An Interpretable Model for Scene Graph Generation.

CoRR, 2018

Open-vocabulary Phrase Detection.

CoRR, 2018

SDCNet: Video Prediction Using Spatially-Displaced Convolution.

CoRR, 2018

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge.

CoRR, 2018

SDC-Net: Video Prediction Using Spatially-Displaced Convolution.

Proceedings of the Computer Vision - ECCV 2018, 2018

Image Inpainting for Irregular Holes Using Partial Convolutions.

Proceedings of the Computer Vision - ECCV 2018, 2018

Learning Interpretable Spatial Operations in a Rich 3D Blocks World.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Learning visual tasks with selective attention.

Kevin J. Shih

PhD thesis, 2017

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Where to Look: Focus Regions for Visual Question Answering.

Kevin J. Shih

Saurabh Singh

Derek Hoiem

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Learning Discriminative Collections of Part Detectors for Object Recognition.

Kevin J. Shih

Ian Endres

Derek Hoiem

IEEE Trans. Pattern Anal. Mach. Intell., 2015

Part Localization using Multi-Proposal Consensus for Fine-Grained Categorization.

Proceedings of the British Machine Vision Conference 2015, 2015

2013

Learning Collections of Part Models for Object Recognition.

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
