Jun-Yan He

Orcid: 0000-0002-6628-6924

According to our database, Jun-Yan He authored at least 53 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

CoRR, March, 2026

Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding.

CoRR, January, 2026

ViType: High-Fidelity Visual Text Rendering via Glyph-Aware Multimodal Diffusion.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

LongCat-Image Technical Report.

CoRR, December, 2025

Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation.

Alexander G. Hauptmann

CoRR, October, 2025

Exploring Dynamic Transformer for Efficient Object Tracking.

IEEE Trans. Neural Networks Learn. Syst., August, 2025

Person in Uniforms Re-Identification.

ACM Trans. Multim. Comput. Commun. Appl., February, 2025

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

DualEnhance: External Multimodal Foundation Models Guidance and Internal Fast-Slow Teacher Regulation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis.

Alexander G. Hauptmann

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts.

Alexander G. Hauptmann

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search.

CoRR, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.

CoRR, 2024

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis.

Alexander G. Hauptmann

CoRR, 2024

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.

CoRR, 2024

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope.

CoRR, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions.

Alexander G. Hauptmann

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.

Alexander G. Hauptmann

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

AnyText: Multilingual Visual Text Generation and Editing.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Multi-Modal Instruction Tuned LLMs with Fine-Grained Visual Perception.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Tracking with Human-Intent Reasoning.

CoRR, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.

CoRR, 2023

Tracking Anything in High Quality.

CoRR, 2023

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness.

CoRR, 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.

Kannappan Palaniappan

Norbert Scherer-Negenborn

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Longshortnet: Exploring Temporal and Semantic Features Fusion In Streaming Perception.

Proceedings of the IEEE International Conference on Acoustics, 2023

Procontext: Exploring Progressive Context Transformer for Tracking.

Proceedings of the IEEE International Conference on Acoustics, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NTIRE 2023 Video Colorization Challenge.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

SWNet: A Deep Learning Based Approach for Splashed Water Detection on Road.

IEEE Trans. Intell. Transp. Syst., 2022

Domain-Specific Conditional Jigsaw Adaptation for Enhancing transferability and Discriminability.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

2021

MGSeg: Multiple Granularity-Based Real-Time Semantic Segmentation Network.

IEEE Trans. Image Process., 2021

DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition.

Neurocomputing, 2021

A novel class restriction loss for unsupervised domain adaptation.

Neurocomputing, 2021

2020

Learning fashion compatibility across categories with deep multimodal neural networks.

Neurocomputing, 2020

2019

BranchGAN: Unsupervised Mutual Image-to-Image Transfer With A Single Encoder and Dual Decoders.

IEEE Trans. Multim., 2019

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting.

Alexander G. Hauptmann

Proceedings of the 27th ACM International Conference on Multimedia, 2019

2018

Hookworm Detection in Wireless Capsule Endoscopy Images With Deep Learning.

IEEE Trans. Image Process., 2018

2017

Sketch Recognition with Deep Visual-Sequential Fusion Model.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016

Detection of bird nests in overhead catenary system images for high-speed rail.

Pattern Recognit., 2016
