Pengchuan Zhang

According to our database, Pengchuan Zhang authored at least 62 papers between 2017 and 2025.

Bibliography

2025
TLDR: Token-Level Detective Reward Model for Large Vision Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation.
CoRR, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.
CoRR, 2024

Revisiting the Role of Language Priors in Vision-Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Learning Video Context as Interleaved Multimodal Sequences.
Proceedings of the Computer Vision - ECCV 2024, 2024

Evaluating Text-to-Visual Generation with Image-to-Text Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Evaluating and Improving Compositional Text-to-Visual Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task.
CoRR, 2023

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning.
CoRR, 2023

VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores.
CoRR, 2023

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DIME-FM: DIstilling Multimodal and Efficient Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
A Unified Model for Tracking and Image-Video Detection Has More Power.
CoRR, 2022

Parameter-efficient Fine-tuning for Vision Transformers.
CoRR, 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models.
CoRR, 2022

GLIPv2: Unifying Localization and Vision-Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

3DB: A Framework for Debugging Computer Vision Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Self-supervised Vision Transformers for Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Missingness Bias in Model Debugging.
Proceedings of the Tenth International Conference on Learning Representations, 2022

RegionCLIP: Region-based Language-Image Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unified Contrastive Learning in Image-Text-Label Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Grounded Language-Image Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

An Empirical Study of Training End-to-End Vision-and-Language Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Florence: A New Foundation Model for Computer Vision.
CoRR, 2021

Image Scene Graph Generation (SGG) Benchmark.
CoRR, 2021

Focal Self-attention for Local-Global Interactions in Vision Transformers.
CoRR, 2021

Out-of-distribution Prediction with Invariant Risk Minimization: The Limitation and An Effective Fix.
CoRR, 2021

VinVL: Making Visual Representations Matter in Vision-Language Models.
CoRR, 2021

Focal Attention for Long-Range Interactions in Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multiscale Invertible Generative Networks for High-Dimensional Bayesian Inference.
Proceedings of the 38th International Conference on Machine Learning, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic DETR: End-to-End Object Detection with Dynamic Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

VinVL: Revisiting Visual Representations in Vision-Language Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Object-Centric Image Generation from Layouts.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
MiniVLM: A Smaller and Faster Vision-Language Model.
CoRR, 2020

Novel Human-Object Interaction Detection via Adversarial Domain Generalization.
CoRR, 2020

Statistical Adaptive Stochastic Gradient Methods.
CoRR, 2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks.
Proceedings of the Computer Vision - ECCV 2020, 2020

MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019
A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Using Statistics to Automate Stochastic Optimization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Understanding the Role of Momentum in Stochastic Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Object-Driven Text-To-Image Synthesis via Adversarial Training.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
An Adaptive Fast Solver for a General Class of Positive Definite Matrices Via Energy Decomposition.
Multiscale Model. Simul., 2018

A bird's-eye view on coherence, and a worm's-eye view on cohesion.
CoRR, 2018

Turbo Learning for CaptionBot and DrawingBot.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On the Discrimination-Generalization Tradeoff in GANs.
Proceedings of the 6th International Conference on Learning Representations, 2018

AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Exploring the Locally Low Dimensional Structure in Solving Random Elliptic PDEs.
Multiscale Model. Simul., 2017

A Sparse Decomposition of Low Rank Symmetric Positive Semidefinite Matrices.
Multiscale Model. Simul., 2017

