Wenhai Wang

Orcid: 0000-0002-1936-2840

According to our database, Wenhai Wang authored at least 217 papers between 1999 and 2025.

Bibliography

2025
A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook.
ACM Comput. Surv., November, 2025

Pose-Guided Transformer for Fine-Grained Action Quality Assessment.
IEEE Trans. Circuits Syst. Video Technol., August, 2025

ORFuzz: Fuzzing the "Other Side" of LLM Safety - Testing Over-Refusal.
CoRR, August, 2025

CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation.
CoRR, July, 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents.
CoRR, July, 2025

ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding.
CoRR, July, 2025

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning.
CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.
CoRR, July, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.
CoRR, June, 2025

CoMemo: LVLMs Need Image Context with Image Memory.
CoRR, June, 2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis.
CoRR, June, 2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces.
CoRR, June, 2025

Scuzer: A Scheduling Optimization Fuzzer for TVM.
ACM Trans. Softw. Eng. Methodol., May, 2025

KG4RecEval: Does Knowledge Graph Really Matter for Recommender Systems?
ACM Trans. Inf. Syst., May, 2025

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models.
CoRR, May, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.
CoRR, May, 2025

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings.
CoRR, May, 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows.
CoRR, May, 2025

Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity.
CoRR, May, 2025

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning.
CoRR, May, 2025

Recurrent Neural Unit With Frequency Attention for Specific Emitter Identification.
IEEE Trans. Cogn. Commun. Netw., April, 2025

Demystify Transformers & Convolutions in Modern Image Deep Networks.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models.
CoRR, April, 2025

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR.
CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025

BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

LuaTaint: A Static Analysis System for Web Configuration Interface Vulnerability of Internet of Things Devices.
IEEE Internet Things J., March, 2025

ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting.
CoRR, March, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.
CoRR, March, 2025

ModiGen: A Large Language Model-Based Workflow for Multi-Task Modelica Code Generation.
CoRR, March, 2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning.
CoRR, March, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning.
CoRR, March, 2025

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness.
CoRR, February, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
CoRR, February, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

Dynamic cross-layer security risk assessment and mitigation for cyber-physical power systems.
Reliab. Eng. Syst. Saf., 2025

S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models.
Proc. ACM Softw. Eng., 2025

Beyond Static Pattern Matching? Rethinking Automatic Cryptographic API Misuse Detection in the Era of LLMs.
Proc. ACM Softw. Eng., 2025

BinEGA: Enhancing DNN-based Binary Code Similarity Detection through Efficient Graph Alignment.
Proceedings of the IEEE International Conference on Software Analysis, 2025

MQueez: Specification-Driven Fuzzing for MQTT Broker (Registered Report).
Proceedings of the 34th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Denoising Diffusion Straightforward Models for Energy Conversion Monitoring Data Imputation.
IEEE Trans. Ind. Informatics, October, 2024

Causality Enhanced Global-Local Graph Neural Network for Bioprocess Factor Forecasting.
IEEE Trans. Ind. Informatics, October, 2024

VLG: General Video Recognition with Web Textual Knowledge.
Int. J. Comput. Vis., October, 2024

From Coarse to Fine: Hierarchical Zero-Shot Fault Diagnosis With Multigrained Attributes.
IEEE Trans. Fuzzy Syst., May, 2024

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Security-Enhanced Operational Architecture for Decentralized Industrial Internet of Things: A Blockchain-Based Approach.
IEEE Internet Things J., March, 2024

Gaussian dynamic recurrent unit for emitter classification.
Expert Syst. Appl., March, 2024

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.
Vis. Intell., 2024

Canonical Variate Analysis for Detecting False Data Injection Attacks in Alternating Current State Estimation.
IEEE Trans. Netw. Sci. Eng., 2024

Feature Selection Based on Intrusive Outliers Rather Than All Instances.
IEEE Trans. Image Process., 2024

Fast Fourier Transform With Multihead Attention for Specific Emitter Identification.
IEEE Trans. Instrum. Meas., 2024

DTIN: Dual Transformer-based Imputation Nets for multivariate time series emitter missing data.
Knowl. Based Syst., 2024

Bayesian and stochastic game joint approach for Cross-Layer optimal defensive Decision-Making in industrial Cyber-Physical systems.
Inf. Sci., 2024

DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction.
CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost.
CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.
CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.
CoRR, 2024

Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents.
CoRR, 2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding.
CoRR, 2024

Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori.
CoRR, 2024

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks.
CoRR, 2024

Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.
CoRR, 2024

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models.
CoRR, 2024

Does Knowledge Graph Really Matter for Recommender Systems?
CoRR, 2024

LuaTaint: A Static Taint Analysis System for Web Interface Framework Vulnerability of IoT Devices.
CoRR, 2024

FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution.
CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024

Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion.
CoRR, 2024

FAMCF: A few-shot Android malware family classification framework.
Comput. Secur., 2024

MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.
Sci. China Inf. Sci., 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024

Statistical knowledge and game-theoretic integrated model for cross-layer impact assessment in industrial cyber-physical systems.
Adv. Eng. Informatics, 2024

LCG-YOLO: A Real-Time Surface Defect Detection Method for Metal Components.
IEEE Access, 2024

Critical Code Guided Directed Greybox Fuzzing for Commits.
Proceedings of the 33rd USENIX Security Symposium, 2024

Exploring ChatGPT's Capabilities on Vulnerability Management.
Proceedings of the 33rd USENIX Security Symposium, 2024

SyzTrust: State-aware Fuzzing on Trusted OS Designed for IoT Devices.
Proceedings of the IEEE Symposium on Security and Privacy, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Needle In A Multimodal Haystack.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fault Diagnosis of Blast Furnace Throat Temperature Monitoring Device Based on Residual Analysis.
Proceedings of the IEEE International Instrumentation and Measurement Technology Conference, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.
Proceedings of the Computer Vision - ECCV 2024, 2024

ControlLLM: Augment Language Models with Tools by Searching on Graphs.
Proceedings of the Computer Vision - ECCV 2024, 2024

Distilling Knowledge from Large-Scale Image Models for Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AVSegFormer: Audio-Visual Segmentation with Transformer.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
A traffic anomaly detection approach based on unsupervised learning for industrial cyber-physical system.
Knowl. Based Syst., November, 2023

Cloud-edge coordinated traffic anomaly detection for industrial cyber-physical systems.
Expert Syst. Appl., November, 2023

A novel mesh discretization strategy for numerical solution of optimal control problems in aerospace engineering.
J. Frankl. Inst., September, 2023

A novel radar operating mode identification approach based on variational relevance vector machine with chaotic gravitational search optimization.
Trans. Inst. Meas. Control, May, 2023

Generalized Focal Loss: Towards Efficient Representation Learning for Dense Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

A Novel Time-Domain Graph Tensor Attention Network for Specific Emitter Identification.
IEEE Trans. Instrum. Meas., 2023

BSMD: A blockchain-based secure storage mechanism for big spatio-temporal data.
Future Gener. Comput. Syst., 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

A Survey of Reasoning with Foundation Models.
CoRR, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.
CoRR, 2023

Prompting Frameworks for Large Language Models: A Survey.
CoRR, 2023

How ChatGPT is Solving Vulnerability Management Problem.
CoRR, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs.
CoRR, 2023

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models.
CoRR, 2023

AVSegFormer: Audio-Visual Segmentation with Transformer.
CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

A Survey of Historical Learning: Learning Models with Learning History.
CoRR, 2023

Champion Solution for the WSDM2023 Toloka VQA Challenge.
CoRR, 2023

How IoT Re-using Threatens Your Sensitive Data: Exploring the User-Data Disposal in Used IoT Devices.
Proceedings of the 44th IEEE Symposium on Security and Privacy, 2023

RFT: Toward Highly Reliable Flow Data Transmission in Network Measurement.
Proceedings of the 20th Annual IEEE International Conference on Sensing, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Vision Transformer Adapter for Dense Predictions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

FB-BEV: BEV Representation from Forward-Backward View Transformations.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Applying Rely-Guarantee Reasoning on Concurrent Memory Management and Mailbox in μC/OS-II: A Case Study.
Proceedings of the Formal Methods for Industrial Critical Systems, 2023

Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection.
Proceedings of the Computer Security - ESORICS 2023, 2023

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Planning-oriented Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Optimal Stealthy Attack to Remote Estimator for Estimation Error Regulation.
Proceedings of the American Control Conference, 2023

2022
A Novel Aggregated Multipath Extreme Gradient Boosting Approach for Radar Emitter Classification.
IEEE Trans. Ind. Electron., 2022

Adversarial Malicious Encrypted Traffic Detection Based on Refined Session Analysis.
Symmetry, 2022

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Density Peak Clustering with connectivity estimation.
Knowl. Based Syst., 2022

On Efficient Reinforcement Learning for Full-length Game of StarCraft II.
J. Artif. Intell. Res., 2022

A novel locality-sensitive hashing relational graph matching network for semantic textual similarity measurement.
Expert Syst. Appl., 2022

PVT v2: Improved baselines with Pyramid Vision Transformer.
Comput. Vis. Media, 2022

Goal-oriented Autonomous Driving.
CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.
CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022

Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality.
CoRR, 2022

Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks.
CoRR, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers.
CoRR, 2022

WegFormer: Transformers for Weakly Supervised Semantic Segmentation.
CoRR, 2022

Deep weighted joint distribution adaption network for fault diagnosis of blast furnace ironmaking process.
Comput. Chem. Eng., 2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SLIME: program-sensitive energy allocation for fuzzing.
Proceedings of the ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18, 2022

Polygon-Free: Unconstrained Scene Text Detection with Box Annotations.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
CoRR, 2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.
CoRR, 2021

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter.
CoRR, 2021

Panoptic SegFormer.
CoRR, 2021

An empirical evaluation of attention-based multi-head models for improved turbofan engine remaining useful life prediction.
CoRR, 2021

Learning Class-level Prototypes for Few-shot Learning.
CoRR, 2021

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers.
CoRR, 2021

PVTv2: Improved Baselines with Pyramid Vision Transformer.
CoRR, 2021

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.
CoRR, 2021

An Introduction of mini-AlphaStar.
CoRR, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
CoRR, 2021

Trans2Seg: Transparent Object Segmentation with Transformer.
CoRR, 2021

A novel adaptive generic model control strategy for internal thermally coupled air separation columns with multivariable recursive estimation.
Comput. Chem. Eng., 2021

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

IFIZZ: Deep-State and Efficient Fault-Scenario Generation to Test IoT Firmware.
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 2021

Segmenting Transparent Objects in the Wild with Transformer.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine.
Knowl. Based Syst., 2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training.
CoRR, 2020

False Data Injection Attacks and Corresponding Countermeasure in DC Microgrid.
CoRR, 2020

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Guided Refine-Head for Object Detection.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

TK-Text: Multi-shaped Scene Text Detection via Instance Segmentation.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

Segmenting Transparent Objects in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

Scene Text Image Super-Resolution in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting.
Proceedings of the Computer Vision - ECCV 2020, 2020

Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

PolarMask: Single Shot Instance Segmentation With Polar Representation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
TextSR: Content-Aware Text Super-Resolution Guided by Recognition.
CoRR, 2019

Shape Robust Text Detection with Progressive Scale Expansion Network.
CoRR, 2019

Cropout: A General Mechanism for Reducing Overfitting on Convolutional Neural Networks.
Proceedings of the International Joint Conference on Neural Networks, 2019

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Shape Robust Text Detection With Progressive Scale Expansion Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Selective Kernel Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Nonzero-Dynamics Stealthy Attack and Its Impacts Analysis in DC Microgrids.
Proceedings of the 2019 American Control Conference, 2019

2018
Shape Robust Text Detection with Progressive Scale Expansion Network.
CoRR, 2018

Hand Pose Estimation with Attention-and-Sequence Network.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

A Novel 3D Human Action Recognition Framework for Video Content Analysis.
Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Cloud of Line Distribution and Random Forest Based Text Detection from Natural/Video Scene Images.
Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Mixed Link Networks.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

2017
Cloud of Line Distribution for Arbitrary Text Detection in Scene/Video/License Plate Images.
Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Visual Robotic Object Grasping Through Combining RGB-D Data and 3D Meshes.
Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

A Robust Symmetry-Based Method for Scene/Video Text Detection through Neural Network.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

2016
Biomechanics of high-grade spondylolisthesis with and without reduction.
Medical Biol. Eng. Comput., 2016

2015
Remaining Useful Life Prediction for a Nonlinear Heterogeneous Wiener Process Model With an Adaptive Drift.
IEEE Trans. Reliab., 2015

2013
A new fault detection method for computer networks.
Reliab. Eng. Syst. Saf., 2013

On soft fault diagnosis method based HHT for analog circuits.
Proceedings of the 10th IEEE International Conference on Control and Automation, 2013

A modular design approach for coal-fired power plant control system.
Proceedings of the 10th IEEE International Conference on Control and Automation, 2013

2012
Performance Degradation Monitoring for Onboard Speed Sensors of Trains.
IEEE Trans. Intell. Transp. Syst., 2012

2003
Sufficient conditions for the convergence of open-closed-loop PID-type iterative learning control for nonlinear time-varying systems.
Proceedings of the IEEE International Conference on Systems, 2003

Associative classifier modeling method based on rough set theory and factor analysis technology.
Proceedings of the IEEE International Conference on Systems, 2003

1999
Optimal robust digital control of systems with uncertainty.
Proceedings of the 5th European Control Conference, 1999

