Pengchuan Zhang

According to our database¹, Pengchuan Zhang authored at least 62 papers between 2017 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

TLDR: Token-Level Detective Reward Model for Large Vision Language Models.

,

,

,

,

Pengchuan Zhang

,

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation.

,

,

,

,

,

,

,

,

Pengchuan Zhang

,

,

CoRR, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.

CoRR, 2024

Revisiting the Role of Language Priors in Vision-Language Models.

,

,

,

Pengchuan Zhang

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Learning Video Context as Interleaved Multimodal Sequences.

Kevin Qinghong Lin

,

Pengchuan Zhang

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Computer Vision - ECCV 2024, 2024

Evaluating Text-to-Visual Generation with Image-to-Text Generation.

,

,

,

,

,

,

Pengchuan Zhang

,

Proceedings of the Computer Vision - ECCV 2024, 2024

Evaluating and Improving Compositional Text-to-Visual Generation.

,

,

,

,

,

,

,

Pengchuan Zhang

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task.

,

Pengchuan Zhang

,

,

,

,

CoRR, 2023

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning.

,

,

,

,

,

Pengchuan Zhang

,

Raghuraman Krishnamoorthi

,

,

,

Mohamed Elhoseiny

CoRR, 2023

VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores.

,

,

,

Pengchuan Zhang

,

CoRR, 2023

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding.

,

Satya Narayan Shukla

,

,

Pengchuan Zhang

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DIME-FM : DIstilling Multimodal and Efficient Foundation Models.

,

Pengchuan Zhang

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.

Shraman Pramanick

,

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

,

,

Pengchuan Zhang

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.

Kevin Qinghong Lin

,

Pengchuan Zhang

,

,

Shraman Pramanick

,

,

Alex Jinpeng Wang

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality.

,

Pengchuan Zhang

,

,

,

,

,

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.

,

,

,

Pengchuan Zhang

,

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.

,

,

Pengchuan Zhang

,

,

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

A Unified Model for Tracking and Image-Video Detection Has More Power.

,

,

Pengchuan Zhang

,

,

,

,

Sreya Dutta Roy

,

,

CoRR, 2022

Parameter-efficient Fine-tuning for Vision Transformers.

,

,

Pengchuan Zhang

,

,

CoRR, 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models.

,

,

,

,

,

,

Pengchuan Zhang

,

CoRR, 2022

GLIPv2: Unifying Localization and Vision-Language Understanding.

,

Pengchuan Zhang

,

,

,

Liunian Harold Li

,

,

,

,

Jenq-Neng Hwang

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.

,

,

,

,

,

Pengchuan Zhang

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models.

,

,

Liunian Harold Li

,

Pengchuan Zhang

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

3DB: A Framework for Debugging Computer Vision Models.

Guillaume Leclerc

,

,

,

,

,

,

Kai Yuanqing Xiao

,

Pengchuan Zhang

,

Shibani Santurkar

,

,

,

Aleksander Madry

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone.

,

Aishwarya Kamath

,

,

Pengchuan Zhang

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Self-supervised Vision Transformers for Representation Learning.

,

,

Pengchuan Zhang

,

,

,

,

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

Missingness Bias in Model Debugging.

,

,

,

Pengchuan Zhang

,

,

,

Aleksander Madry

Proceedings of the Tenth International Conference on Learning Representations, 2022

RegionCLIP: Region-based Language-Image Pretraining.

,

,

Pengchuan Zhang

,

,

,

Liunian Harold Li

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unified Contrastive Learning in Image-Text-Label Space.

,

,

Pengchuan Zhang

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Grounded Language-Image Pre-training.

Liunian Harold Li

,

Pengchuan Zhang

,

,

,

,

,

,

,

,

Jenq-Neng Hwang

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Empirical Study of Training End-to-End Vision-and-Language Transformers.

,

,

,

,

,

,

,

Pengchuan Zhang

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Florence: A New Foundation Model for Computer Vision.

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Pengchuan Zhang

CoRR, 2021

Image Scene Graph Generation (SGG) Benchmark.

,

,

,

,

,

Pengchuan Zhang

CoRR, 2021

Focal Self-attention for Local-Global Interactions in Vision Transformers.

,

,

Pengchuan Zhang

,

,

,

,

CoRR, 2021

Out-of-distribution Prediction with Invariant Risk Minimization: The Limitation and An Effective Fix.

,

Pengchuan Zhang

,

,

CoRR, 2021

VinVL: Making Visual Representations Matter in Vision-Language Models.

Pengchuan Zhang

,

,

,

,

,

,

,

CoRR, 2021

Focal Attention for Long-Range Interactions in Vision Transformers.

,

,

Pengchuan Zhang

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multiscale Invertible Generative Networks for High-Dimensional Bayesian Inference.

,

Pengchuan Zhang

,

Proceedings of the 38th International Conference on Machine Learning, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding.

Pengchuan Zhang

,

,

,

,

,

,

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic DETR: End-to-End Object Detection with Dynamic Attention.

,

,

,

Pengchuan Zhang

,

,

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VinVL: Revisiting Visual Representations in Vision-Language Models.

Pengchuan Zhang

,

,

,

,

,

,

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Object-Centric Image Generation from Layouts.

Tristan Sylvain

,

Pengchuan Zhang

,

,

,

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

MiniVLM: A Smaller and Faster Vision-Language Model.

,

,

Pengchuan Zhang

,

,

,

,

,

CoRR, 2020

Novel Human-Object Interaction Detection via Adversarial Domain Generalization.

,

,

,

,

,

,

,

,

Pengchuan Zhang

CoRR, 2020

Statistical Adaptive Stochastic Gradient Methods.

Pengchuan Zhang

,

,

,

CoRR, 2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks.

,

,

,

Pengchuan Zhang

,

,

,

,

,

,

,

,

Proceedings of the Computer Vision - ECCV 2020, 2020

MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network.

,

,

,

,

Ming-Ching Chang

,

,

,

Pengchuan Zhang

Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019

A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks.

,

,

,

,

Pengchuan Zhang

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers.

,

,

Ilya P. Razenshteyn

,

Pengchuan Zhang

,

,

Sébastien Bubeck

,

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Using Statistics to Automate Stochastic Optimization.

,

,

Pengchuan Zhang

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Understanding the Role of Momentum in Stochastic Gradient Methods.

,

,

Pengchuan Zhang

,

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation.

,

,

,

,

Pengchuan Zhang

,

,

,

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Object-Driven Text-To-Image Synthesis via Adversarial Training.

,

Pengchuan Zhang

,

,

,

,

,

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications.

,

Pengchuan Zhang

,

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

An Adaptive Fast Solver for a General Class of Positive Definite Matrices Via Energy Decomposition.

,

,

,

Pengchuan Zhang

Multiscale Model. Simul., 2018

A bird's-eye view on coherence, and a worm's-eye view on cohesion.

,

Pengchuan Zhang

,

,

,

,

,

CoRR, 2018

Turbo Learning for CaptionBot and DrawingBot.

,

Pengchuan Zhang

,

Dapeng Oliver Wu

,

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On the Discrimination-Generalization Tradeoff in GANs.

Pengchuan Zhang

,

,

,

,

Proceedings of the 6th International Conference on Learning Representations, 2018

AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks.

,

Pengchuan Zhang

,

,

,

,

,

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Exploring the Locally Low Dimensional Structure in Solving Random Elliptic PDEs.

,

,

Pengchuan Zhang

Multiscale Model. Simul., 2017

A Sparse Decomposition of Low Rank Symmetric Positive Semidefinite Matrices.

,

,

Pengchuan Zhang

Multiscale Model. Simul., 2017
