Jianhua Han

ORCID: 0009-0004-1559-657X

According to our database, Jianhua Han authored at least 83 papers between 2008 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

AtomThink: Multimodal Slow Thinking With Atomic Step Reasoning.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2026

MagicSeg: Open-World Segmentation Pretraining via Counterfactual Diffusion-Based Auto-Generation.

CoRR, March, 2026

Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization.

CoRR, March, 2026

AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots.

CoRR, March, 2026

CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning.

CoRR, March, 2026

RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training.

CoRR, February, 2026

Thinking with Geometry: Active Geometry Integration for Spatial Reasoning.

CoRR, February, 2026

2025

Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving.

CoRR, November, 2025

RealignDiff: Boosting Text-to-Image Diffusion Model With Coarse-to-Fine Semantic Realignment.

IEEE Trans. Neural Networks Learn. Syst., October, 2025

Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI.

CoRR, October, 2025

HiLM-D: Enhancing MLLMs with Multi-scale High-Resolution Details for Autonomous Driving.

Int. J. Comput. Vis., August, 2025

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning.

CoRR, July, 2025

UniAdapter: All-in-One Control for Flexible Video Generation.

IEEE Trans. Circuits Syst. Video Technol., June, 2025

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs.

CoRR, June, 2025

EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation.

CoRR, June, 2025

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning.

CoRR, April, 2025

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement.

CoRR, April, 2025

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation.

CoRR, March, 2025

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

CoRR, March, 2025

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba.

CoRR, February, 2025

DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

SeePhys: Does Seeing Help Thinking? - Benchmarking Vision-Based Physics Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Fine-Grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection.

IEEE Trans. Neural Networks Learn. Syst., November, 2024

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning.

CoRR, 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

CoRR, 2024

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation.

CoRR, 2024

From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs.

CoRR, 2024

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models.

CoRR, 2024

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

UNIT: Unifying Image and Text Recognition in One Vision Encoder.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving.

Proceedings of the Computer Vision - ECCV 2024, 2024

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion.

Proceedings of the Computer Vision - ECCV 2024, 2024

Implicit Concept Removal of Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

LayerDiff: Exploring Text-Guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-Fine Pose-Reversible Guidance.

Proceedings of the Computer Vision - ECCV 2024, 2024

DetCLIPv3: Towards Versatile Generative Open-Vocabulary Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models.

CoRR, 2023

HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving.

CoRR, 2023

MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation.

CoRR, 2023

Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards.

CoRR, 2023

Towards Universal Vision-language Omni-supervised Segmentation.

CoRR, 2023

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DetGPT: Detect What You Need via Reasoning.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NLIP: Noise-Robust Language-Image Pre-training.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

P³OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection.

CoRR, 2022

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Generative Negative Text Replay for Continual Vision-Language Pretraining.

Proceedings of the Computer Vision - ECCV 2022, 2022

Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding.

Proceedings of the Computer Vision - ECCV 2022, 2022

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving.

Proceedings of the Computer Vision - ECCV 2022, 2022

ONCE-3DLanes: Building Monocular 3D Lane Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Laneformer: Object-Aware Row-Column Transformers for Lane Detection.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving.

CoRR, 2021

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

2019

Order-aware Embedding Neural Network for CTR Prediction.

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Optimizing Ranking Algorithm in Recommender System via Deep Reinforcement Learning.

Proceedings of the International Conference on Artificial Intelligence and Advanced Manufacturing, 2019

2017

Aggregating Crowd Wisdoms with Label-aware Autoencoders.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

2016

Label Aggregation with Instance Grouping Model.

Li'ang Yin, Jianhua Han, Yong Yu

Proceedings of the 25th International Conference on World Wide Web, 2016

Aggregating Crowd Wisdom with Instance Grouping Methods.

Proceedings of the Web Technologies and Applications - 18th Asia-Pacific Web Conference, 2016

2014

Fault-Tolerant Control, Fault Diagnosis and Recovery in Runtime of Business Docking Service Composition Flow in the Cloud Environment.

Proceedings of the Intelligent Computing Methodologies - 10th International Conference, 2014

2013

A study on the scalable flow model of web services choreography and orchestration based on dynamic workflow.

Jianhua Han, Yuan Luo, Jianping Huang

Int. J. Inf. Commun. Technol., 2013

2011

Construction and Application of the Merging Network Teaching Platform.

Jianhua Han, Shanshan Du, Jikui Wen

Proceedings of the Frontiers in Computer Education [International Conference on Frontiers in Computer Education], 2011

2010

A study on academic performance and interpersonal interactions based on network.

Jianhua Han, Jikui Wen, Xiaojun Zhao

Proceedings of the 2010 14th International Conference on Computer Supported Cooperative Work in Design, 2010

A New Network Collaborative Manufacturing Based on the STEP-NC.

Jianhua Han, Juan Du, Xianguo Yan

Proceedings of the International Conference on Computational Aspects of Social Networks, 2010

2008

Cooperative Petition Processing Model Based on Dynamic Workflow Management.

Jianhua Han, Weihua Li

Proceedings of the Fifth International Conference on Fuzzy Systems and Knowledge Discovery, 2008
