Jun-Yan He

ORCID: 0000-0002-6628-6924

According to our database, Jun-Yan He authored at least 53 papers between 2016 and 2026.

Bibliography

2026
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
CoRR, March, 2026

Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding.
CoRR, January, 2026

ViType: High-Fidelity Visual Text Rendering via Glyph-Aware Multimodal Diffusion.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
LongCat-Image Technical Report.
CoRR, December, 2025

Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation.
CoRR, October, 2025

Exploring Dynamic Transformer for Efficient Object Tracking.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

Person in Uniforms Re-Identification.
ACM Trans. Multim. Comput. Commun. Appl., February, 2025

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

DualEnhance: External Multimodal Foundation Models Guidance and Internal Fast-Slow Teacher Regulation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search.
CoRR, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.
CoRR, 2024

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis.
CoRR, 2024

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.
CoRR, 2024

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope.
CoRR, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

AnyText: Multilingual Visual Text Generation and Editing.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Multi-Modal Instruction Tuned LLMs with Fine-Grained Visual Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Tracking with Human-Intent Reasoning.
CoRR, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.
CoRR, 2023

Tracking Anything in High Quality.
CoRR, 2023

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness.
CoRR, 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Longshortnet: Exploring Temporal and Semantic Features Fusion In Streaming Perception.
Proceedings of the IEEE International Conference on Acoustics, 2023

Procontext: Exploring Progressive Context Transformer for Tracking.
Proceedings of the IEEE International Conference on Acoustics, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023


2022
SWNet: A Deep Learning Based Approach for Splashed Water Detection on Road.
IEEE Trans. Intell. Transp. Syst., 2022

Domain-Specific Conditional Jigsaw Adaptation for Enhancing transferability and Discriminability.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

2021
MGSeg: Multiple Granularity-Based Real-Time Semantic Segmentation Network.
IEEE Trans. Image Process., 2021

DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition.
Neurocomputing, 2021

A novel class restriction loss for unsupervised domain adaptation.
Neurocomputing, 2021

2020
Learning fashion compatibility across categories with deep multimodal neural networks.
Neurocomputing, 2020

2019
BranchGAN: Unsupervised Mutual Image-to-Image Transfer With A Single Encoder and Dual Decoders.
IEEE Trans. Multim., 2019

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

2018
Hookworm Detection in Wireless Capsule Endoscopy Images With Deep Learning.
IEEE Trans. Image Process., 2018

2017
Sketch Recognition with Deep Visual-Sequential Fusion Model.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016
Detection of bird nests in overhead catenary system images for high-speed rail.
Pattern Recognit., 2016

