We stand with Ukraine

Yi Zhu

Orcid: 0000-0002-6482-6712

Affiliations:

Amazon
University of California, Merced, USA (PhD 2019)

According to our database, Yi Zhu authored at least 66 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Back to Basics: Revisiting ASR in the Age of Voice Agents.

CoRR, March, 2026

Multi-dimensional Assessment and Explainable Feedback for Counselor Responses to Client Resistance in Text-based Counseling with LLMs.

CoRR, February, 2026

2025

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation.

CoRR, February, 2025

RPGBENCH: Evaluating Large Language Models as Role-Playing Game Engines.

Andrea Yaoyun Cui

CoRR, February, 2025

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Improving Semantic Segmentation via Efficient Self-Training.

Alexander J. Smola

IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift.

J. Data-centric Mach. Learn. Res., 2024

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images.

CoRR, 2024

2023

What Makes for Good Tokenizers in Vision Transformer?

IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

GFM: Building Geospatial Foundation Models via Continual Pretraining.

Matías Mendieta

CoRR, 2023

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation.

Srikar Appalaraju

CoRR, 2023

ImpDet: Exploring Implicit Fields for 3D Object Detection.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

MixGen: A New Multi-Modal Data Augmentation.

Srikar Appalaraju

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2023

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition.

Alexander J. Smola

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PreDiff: Precipitation Nowcasting with Latent Diffusion Models.

Danielle C. Maddix

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AIM: Adapting Image Models for Efficient Video Action Recognition.

Taojiannan Yang

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Towards Geospatial Foundation Models via Continual Pretraining.

Matías Mendieta

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Motion-Guided Masking for Spatiotemporal Representation Learning.

Hector J. Santos-Villalobos

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

BiCSNet: A Bidirectional Cross-Scale Backbone for Recognition and Localization.

IEEE Trans. Circuits Syst. Video Technol., 2022

SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning.

CoRR, 2022

Are Multimodal Models Robust to Image and Text Perturbations?

CoRR, 2022

Visual Prompt Tuning for Test-time Domain Adaptation.

Dimitris N. Metaxas

CoRR, 2022

NUTA: Non-uniform Temporal Aggregation for Action Recognition.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Earthformer: Exploring Space-Time Transformers for Earth System Forecasting.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition.

Proceedings of the International Conference on Machine Learning, 2022

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

ResNeSt: Split-Attention Networks.

Alexander J. Smola

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021

AutoAdapt: Automated Segmentation Network Search for Unsupervised Domain Adaptation.

Shawn D. Newsam

CoRR, 2021

SelfNorm and CrossNorm for Out-of-Distribution Robustness.

Dimitris N. Metaxas

CoRR, 2021

Scale Aware Adaptation for Land-Cover Classification in Remote Sensing Imagery.

Shawn D. Newsam

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Progressive Coordinate Transforms for Monocular 3D Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Blending Anti-Aliasing into Vision Transformer.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Video Contrastive Learning with Global Context.

Sören Schwertfeger, Cyrill Stachniss

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

VidTr: Video Transformer Without Convolutions.

Biagio Brattoli

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CrossNorm and SelfNorm for Generalization under Distribution Shifts.

Dimitris N. Metaxas

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.

J. Mach. Learn. Res., 2020

A Comprehensive Study of Deep Video Action Recognition.

Mohammadreza Zolfaghari

CoRR, 2020

Improving Semantic Segmentation via Self-Training.

Alexander J. Smola

CoRR, 2020

Cross-Time and Orientation-Invariant Overhead Image Geolocalization Using Deep Local Features.

Shawn D. Newsam

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

2019

Exploring Temporal Information for Improved Video Understanding.

PhD thesis, 2019

Fine-Grained Land Use Classification at the City Scale Using Ground-Level Images.

Shawn D. Newsam

IEEE Trans. Multim., 2019

Generalizing Deep Models for Overhead Image Segmentation Through Getis-Ord Gi* Pooling.

Shawn D. Newsam

CoRR, 2019

Exploring Temporal Information for Improved Video Understanding.

CoRR, 2019

Using Conditional Generative Adversarial Networks to Generate Ground-Level Views From Overhead Imagery.

Shawn D. Newsam

CoRR, 2019

Improving Semantic Segmentation via Video Propagation and Label Relaxation.

Shawn D. Newsam, Bryan Catanzaro

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Motion-Aware Feature for Improved Video Anomaly Detection.

Shawn D. Newsam

Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018

Learning Optical Flow via Dilated Networks and Occlusion Reasoning.

Shawn D. Newsam

Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Spatial Morphing Kernel Regression for Feature Interpolation.

Shawn D. Newsam

Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

What is it like down there?: generating dense ground-level views and image features from overhead imagery using conditional generative adversarial networks.

Shawn D. Newsam

Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2018

Towards Universal Representation for Unseen Action Recognition.

Shawn D. Newsam

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Gated Transfer Network for Transfer Learning.

Shawn D. Newsam

Proceedings of the Computer Vision - ACCV 2018, 2018

Random Temporal Skipping for Multirate Video Analysis.

Shawn D. Newsam

Proceedings of the Computer Vision - ACCV 2018, 2018

Hidden Two-Stream Convolutional Networks for Action Recognition.

Shawn D. Newsam, Alexander G. Hauptmann

Proceedings of the Computer Vision - ACCV 2018, 2018

2017

UC Merced Submission to the ActivityNet Challenge 2016.

Shawn D. Newsam

CoRR, 2017

Guided Optical Flow Learning.

Shawn D. Newsam, Alexander G. Hauptmann

CoRR, 2017

Large-Scale Human Activity Mapping using Geo-Tagged Videos.

Shawn D. Newsam

CoRR, 2017

Deep Local Video Feature for Action Recognition.

Alexander G. Hauptmann

CoRR, 2017

Efficient Action Detection in Untrimmed Videos via Multi-task Learning.

Shawn D. Newsam

Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, 2017

DenseNet for dense flow.

Shawn D. Newsam

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Large-Scale Mapping of Human Activity using Geo-Tagged Videos.

Shawn D. Newsam

Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017

Deep Local Video Feature for Action Recognition.

Alexander G. Hauptmann, Shawn D. Newsam

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

2016

Spatio-temporal sentiment hotspot detection using geotagged photos.

Shawn D. Newsam

Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31, 2016

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition.

Shawn D. Newsam

Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

2015

Land use classification using convolutional neural networks applied to ground-level images.

Shawn D. Newsam

Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2015
