Shilei Wen

Orcid: 0009-0009-4746-6928

According to our database¹, Shilei Wen authored at least 68 papers between 2011 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation.

CoRR, April, 2026

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation.

CoRR, April, 2026

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images.

CoRR, March, 2026

2025

JoVA: Unified Multimodal Learning for Joint Video-Audio Generation.

CoRR, December, 2025

OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing.

CoRR, December, 2025

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents.

CoRR, August, 2025

Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks.

CoRR, June, 2025

Discriminator-Free Direct Preference Optimization for Video Diffusion.

CoRR, April, 2025

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

360-VIO: A Robust Visual-Inertial Odometry Using a 360° Camera.

IEEE Trans. Ind. Electron., September, 2024

DiffusionGPT: LLM-Driven Text-to-Image Generation System.

CoRR, 2024

UniFL: Improve Latent Diffusion Model via Unified Feedback Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

AffineQuant: Affine Transformation Quantization for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction.

Congyi Wang, Feida Zhu, Shilei Wen

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Semi-Supervised Temporal Action Proposal Generation via Exploiting 2-D Proposal Map.

IEEE Trans. Multim., 2022

Purely Attention Based Local Feature Integration for Video Classification.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

2021

VSRNet: End-to-end video segment retrieval with text query.

Pattern Recognit., 2021

Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones.

CoRR, 2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

TPM: Multiple object tracking with tracklet-plane matching.

Pattern Recognit., 2020

Coherent Loss: A Generic Framework for Stable Video Segmentation.

CoRR, 2020

PP-YOLO: An Effective and Efficient Implementation of Object Detector.

CoRR, 2020

PointTrack++ for Effective Online Multi-Object Tracking and Segmentation.

CoRR, 2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Modularized Framework with Category-Sensitive Abnormal Filter for City Anomaly Detection.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Deep Concept-wise Temporal Convolutional Networks for Action Localization.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

HANet: Hybrid Attention-aware Network for Crowd Counting.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Monocular 3D Object Detection via Feature Domain Adaptation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Segment as Points for Efficient Online Multi-Object Tracking and Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement.

Proceedings of the Computer Vision - ECCV 2020, 2020

AIM 2020 Challenge on Image Extreme Inpainting.

Pranjal Singh Chauhan

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Going Beyond Real Data: A Robust Visual Representation for Vehicle Re-identification.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Robust Movement-Specific Vehicle Counting at Crowded Intersections.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Multi-Granularity Tracking with Modularlized Components for Unsupervised Vehicles Anomaly Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Multi-Label Classification with Label Graph Superimposing.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Dynamic Instance Normalization for Arbitrary Style Transfer.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation.

CoRR, 2019

Perspective-Guided Convolution Networks for Crowd Counting.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Image Inpainting With Learnable Bidirectional Attention Maps.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

BMN: Boundary-Matching Network for Temporal Action Proposal Generation.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Multi-camera vehicle tracking and re-identification based on visual and spatial-temporal features.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results.

Rudrabha Mukhopadhyay

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Adapting Image Super-Resolution State-Of-The-Arts and Learning Multi-Model Ensemble for Video Super-Resolution.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

NTIRE 2019 Challenge on Real Image Super-Resolution: Methods and Results.

Pablo Navarrete Michelini

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance.

CoRR, 2018

Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition.

CoRR, 2018

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Multimodal Keyless Attention Fusion for Video Classification.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification.

CoRR, 2017

Dynamic Computational Time for Visual Attention.

CoRR, 2017

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding.

CoRR, 2017

Dynamic Computational Time for Visual Attention.

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Deep Metric Learning with Angular Loss.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2011

Compounded Face Image Retrieval Based on Vertical Web Image Retrieval.

Proceedings of the Sixth Chinagrid Annual Conference, ChinaGrid 2011, Dalian, Liaoning, 2011
