Xiangyu Zhang

ORCID: 0000-0003-2138-4608

Affiliations:
  • Megvii Inc., Beijing, China
  • Xi'an Jiaotong University, Department of Electrical Engineering, China (PhD 2017)
  • Microsoft Research Asia, China (former)


According to our database, Xiangyu Zhang authored at least 181 papers between 2012 and 2025.

Bibliography

2025
Bootstrap Masked Visual Modeling via Hard Patch Mining.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale.
CoRR, August, 2025

StepFun-Prover Preview: Let's Think and Verify Step by Step.
CoRR, July, 2025

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation.
CoRR, July, 2025

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning.
CoRR, July, 2025

Hita: Holistic Tokenizer for Autoregressive Image Generation.
CoRR, July, 2025

Anchor Attention, Small Cache: Code Generation With Large Language Models.
IEEE Trans. Software Eng., June, 2025

Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
CoRR, June, 2025

Farseer: A Refined Scaling Law in Large Language Models.
CoRR, June, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model.
CoRR, June, 2025

NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models.
CoRR, June, 2025

Is Compression Really Linear with Code Intelligence?
CoRR, May, 2025

Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets.
CoRR, May, 2025

PADriver: Towards Personalized Autonomous Driving.
CoRR, May, 2025

Step1X-Edit: A Practical Framework for General Image Editing.
CoRR, April, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning.
CoRR, April, 2025

Perception in Reflection.
CoRR, April, 2025

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness.
CoRR, April, 2025

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model.
CoRR, March, 2025

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
CoRR, March, 2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model.
CoRR, March, 2025

Predictable Scale: Part I - Optimal Hyperparameter Scaling Law in Large Language Model Pretraining.
CoRR, March, 2025

Unhackable Temporal Rewarding for Scalable Video MLLMs.
CoRR, February, 2025

PerPO: Perceptual Preference Optimization via Discriminative Rewarding.
CoRR, February, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.
CoRR, January, 2025

Assessing and improving syntactic adversarial robustness of pre-trained models for code translation.
Inf. Softw. Technol., 2025

Unhackable Temporal Reward for Scalable Video MLLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Glad: A Streaming Scene Generator for Autonomous Driving.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Reconstructive Visual Instruction Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Multi-matrix Factorization Attention.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Language Prompt for Autonomous Driving.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping.
IEEE Robotics Autom. Lett., November, 2024

Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models.
IEEE Trans. Software Eng., September, 2024

Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.
IEEE Robotics Autom. Lett., July, 2024

Context-aware code generation with synchronous bidirectional decoder.
J. Syst. Softw., 2024

Slow Perception: Let's Perceive Geometric Figures Step-by-step.
CoRR, 2024

Less is More: Towards Green Code Large Language Models via Unified Structural Pruning.
CoRR, 2024

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models.
CoRR, 2024

Less is More: DocString Compression in Code Generation.
CoRR, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.
CoRR, 2024

Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving.
CoRR, 2024

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks.
CoRR, 2024

CodeScore-R: An Automated Robustness Metric for Assessing the Functional Correctness of Code Synthesis.
CoRR, 2024

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?
CoRR, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding.
CoRR, 2024

Small Language Model Meets with Reinforced Vision Vocabulary.
CoRR, 2024

Stream Query Denoising for Vectorized HD Map Construction.
CoRR, 2024

Self-Supervised Visual Preference Alignment.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Merlin: Empowering Multimodal LLMs with Foresight Minds.
Proceedings of the Computer Vision - ECCV 2024, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Stream Query Denoising for Vectorized HD-Map Construction.
Proceedings of the Computer Vision - ECCV 2024, 2024

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss.
Proceedings of the 35th British Machine Vision Conference, 2024

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Far3D: Expanding the Horizon for Surround-View 3D Object Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DDAE: Towards Deep Dynamic Vision BERT Pretraining.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
A syntax-guided multi-task learning approach for Turducken-style code generation.
Empir. Softw. Eng., November, 2023

Scale-Aware Automatic Augmentations for Object Detection With Dynamic Training.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

ExploitGen: Template-augmented exploit code generation based on CodeBERT.
J. Syst. Softw., 2023

Bootstrap Masked Visual Modeling via Hard Patches Mining.
CoRR, 2023

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models.
CoRR, 2023

ADriver-I: A General World Model for Autonomous Driving.
CoRR, 2023

Language Prompt for Autonomous Driving.
CoRR, 2023

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking.
CoRR, 2023

Self-supervised Learning by View Synthesis.
CoRR, 2023

Align-DETR: Improving DETR with Simple IoU-aware BCE loss.
CoRR, 2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection.
CoRR, 2023

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining.
Proceedings of the International Conference on Machine Learning, 2023

Re-parameterizing Your Optimizers rather than Architectures.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reversible Column Networks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Syntax-Aware Retrieval Augmented Code Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Understanding Imbalanced Semantic Segmentation Through Neural Collapse.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Differentiable Architecture Search with Random Features.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Referring Multi-Object Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Understanding Masked Image Modeling via Learning Occlusion Invariant Feature.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Weight-Dependent Gates for Network Pruning.
IEEE Trans. Circuits Syst. Video Technol., 2022

PointINS: Point-Based Instance Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Towards 3D Object Detection with 2D Supervision.
CoRR, 2022

The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge.
CoRR, 2022

Scaling up Kernels in 3D CNNs.
CoRR, 2022

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.
CoRR, 2022

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs.
CoRR, 2022

Self-Supervised Visual Representation Learning with Semantic Grouping.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MOTR: End-to-End Multiple-Object Tracking with Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

PETR: Position Embedding Transformation for Multi-view 3D Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Simple Baselines for Image Restoration.
Proceedings of the Computer Vision - ECCV 2022, 2022

Progressive End-to-End Object Detection in Crowded Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Focal Sparse Convolutional Networks for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LGD: Label-Guided Self-Distillation for Object Detection.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Anchor DETR: Query Design for Transformer-Based Detector.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Joint Multi-Dimension Pruning via Numerical Gradient Update.
IEEE Trans. Image Process., 2021

On Efficient Transformer and Image Pre-training for Low-level Vision.
CoRR, 2021

Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better.
CoRR, 2021

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report.
CoRR, 2021

MOTR: End-to-End Multiple-Object Tracking with TRansformer.
CoRR, 2021

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition.
CoRR, 2021

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Instance-Conditional Knowledge Distillation for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SOLQ: Segmenting Objects by Learning Queries.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Implicit Feature Refinement for Instance Segmentation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Image Synthesis via Semantic Composition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Neural Architecture Search With Random Labels.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Activate or Not: Learning Customized Activation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

RepVGG: Making VGG-Style ConvNets Great Again.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Diverse Branch Block: Building a Convolution as an Inception-Like Unit.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

You Only Look One-Level Feature.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dynamic Region-Aware Convolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Points As Queries: Weakly Semi-Supervised Object Detection by Points.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Implicit Feature Pyramid Network for Object Detection.
CoRR, 2020

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track.
CoRR, 2020

EqCo: Equivalent Rules for Self-supervised Contrastive Learning.
CoRR, 2020

Activate or Not: Learning Customized Activation.
CoRR, 2020

Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay.
CoRR, 2020

Joint Multi-Dimension Pruning.
CoRR, 2020

Stitcher: Feedback-driven Data Provider for Object Detection.
CoRR, 2020

PointINS: Point-based Instance Segmentation.
CoRR, 2020

Rethinking Learnable Tree Filter for Generic Feature Transform.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization.
Proceedings of the 8th International Conference on Learning Representations, 2020

Funnel Activation for Visual Recognition.
Proceedings of the Computer Vision - ECCV 2020, 2020

WeightNet: Revisiting the Design Space of Weight Networks.
Proceedings of the Computer Vision - ECCV 2020, 2020

Weight-Dependent Gates for Differentiable Neural Network Pruning.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Angle-Based Search Space Shrinking for Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

LabelEnc: A New Intermediate Supervision Method for Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Single Path One-Shot Neural Architecture Search with Uniform Sampling.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Delicate Local Representations for Multi-person Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Human-Object Interaction Detection Using Interaction Points.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attentive Normalization for Conditional Image Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Dynamic Routing for Semantic Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Detection in Crowded Scenes: One Proposal, Multiple Predictions.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
DetNAS: Neural Architecture Search on Object Detection.
CoRR, 2019

DetNAS: Backbone Search for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Objects365: A Large-Scale, High-Quality Dataset for Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Meta-SR: A Magnification-Arbitrary Network for Super-Resolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Bounding Box Regression With Uncertainty for Accurate Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection.
CoRR, 2018

MetaAnchor: Learning to Detect Objects with Customized Anchors.
CoRR, 2018

CrowdHuman: A Benchmark for Detecting Human in a Crowd.
CoRR, 2018

DetNet: A Backbone network for Object Detection.
CoRR, 2018

ExFuse: Enhancing Feature Fusion for Semantic Segmentation.
CoRR, 2018

MetaAnchor: Learning to Detect Objects with Customized Anchors.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

ExFuse: Enhancing Feature Fusion for Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2018, 2018

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design.
Proceedings of the Computer Vision - ECCV 2018, 2018

DetNet: Design Backbone for Object Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

MegDet: A Large Mini-Batch Object Detector.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Object Detection Networks on Convolutional Feature Maps.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Light-Head R-CNN: In Defense of Two-Stage Object Detector.
CoRR, 2017

Channel Pruning for Accelerating Very Deep Neural Networks.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Large Kernel Matters - Improve Semantic Segmentation by Global Convolutional Network.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Accelerating Very Deep Convolutional Networks for Classification and Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

Identity Mappings in Deep Residual Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

Deep Residual Learning for Image Recognition.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Efficient and accurate approximations of nonlinear convolutional networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Toward Concurrent Lock-Free Queues on GPUs.
IEICE Trans. Inf. Syst., 2014

2012
Interconnection of wind farms with grid using a MTDC network.
Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012

