2025

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard.

Yifei Dong

Fengyi Wu

Alexander G. Hauptmann

CoRR, March, 2025

MaxSup: Overcoming Representation Collapse in Label Smoothing.

CoRR, February, 2025

IVAC-$\mathbf{P^{2}L}$: Leveraging Irregular Repetition Priors for Improving Video Action Counting.

IEEE Trans. Multim., 2025

LEAF: Unveiling two sides of the same coin in semi-supervised facial expression recognition.

Comput. Vis. Image Underst., 2025

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

MaxSup: Overcoming Representation Collapse in Label Smoothing.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding.

Kimihiro Hasegawa

Wiradee Imrattanatrai

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

A Novel Human Abnormal Posture Detection Method Based on Spatial-Topological Feature Fusion of Skeleton.

Proceedings of the MultiMedia Modeling, 2025

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

SituLM: Leveraging Visual Instruction Tuning and an Augmented SWiG Dataset for Enhanced Grounded Situation Recognition.

Yuran Wang

Zhi-Qi Cheng

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis.

Alexander G. Hauptmann

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MotionFollower: Editing Video Motion via Score-Guided Diffusion.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DeformAvatar: Point-Based Human Avatar Re-targeting and Rendering.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts.

Alexander G. Hauptmann

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Large Language Model Agents in Finance: A Survey Bridging Research, Practice, and Real-World Deployment.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios.

Konstantinos N. Plataniotis

Alexander Hauptmann

Yang You

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StableAnimator: High-Quality Identity-Preserving Human Image Animation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

A Video-grounded Dialogue Dataset and Metric for Event-driven Activities.

Wiradee Imrattanatrai

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search.

CoRR, 2024

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing.

CoRR, 2024

FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing.

CoRR, 2024

Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony.

CoRR, 2024

Prioritize Alignment in Dataset Distillation.

Konstantinos N. Plataniotis

Kai Wang

Yang You

CoRR, 2024

Robust Adaptation of Foundation Models with Black-Box Visual Prompting.

CoRR, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.

CoRR, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.

CoRR, 2024

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis.

Alexander G. Hauptmann

CoRR, 2024

LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition.

CoRR, 2024

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting.

CoRR, 2024

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.

CoRR, 2024

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope.

CoRR, 2024

MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models.

Proceedings of the 18th International Workshop on Semantic Evaluation, 2024

Towards Calibrated Robust Fine-Tuning of Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions.

Alexander G. Hauptmann

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.

Alexander G. Hauptmann

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition.

Alexander G. Hauptmann

Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

A Novel Multi-Pose Person Re-Identification Method Based on Semantic- and Pose-Guided Feature Fusion.

Proceedings of the 36th IEEE International Conference on Tools with Artificial Intelligence, 2024

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply Chain Disruptions.

Alexander G. Hauptmann

Kate Whitefoot

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ProS: Prompting-to-Simulate Generalized Knowledge for Universal Cross-Domain Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Music2P: A Multi-Modal AI-Driven Tool for Simplifying Album Cover Design.

Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

2023

Tracking with Human-Intent Reasoning.

CoRR, 2023

Towards Calibrated Robust Fine-Tuning of Vision-Language Models.

CoRR, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.

CoRR, 2023

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness.

CoRR, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.

Alexander G. Hauptmann

CoRR, 2023

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.

Zhi-Qi Cheng

Qi Dai

Alexander G. Hauptmann

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Longshortnet: Exploring Temporal and Semantic Features Fusion In Streaming Perception.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Procontext: Exploring Progressive Context Transformer for Tracking.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

2022

Hypergraph Transformer for Skeleton-based Action Recognition.

CoRR, 2022

CrossNet: Boosting Crowd Counting with Localization.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Real-time Semantic Segmentation with Parallel Multiple Views Feature Augmentation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Rethinking Spatial Invariance of Convolutional Networks for Object Counting.

Alexander G. Hauptmann

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition.

Neurocomputing, 2021

Subspace Representation Learning for Few-shot Image Classification.

Ting-Yao Hu

Zhi-Qi Cheng

Alexander G. Hauptmann

CoRR, 2021

2020

Generating Person Images with Appearance-aware Pose Stylizer.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Stacked Pooling for Boosting Scale Invariance of Crowd Counting.

Alexander G. Hauptmann

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting.

Alexander G. Hauptmann

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning Spatial Awareness to Improve Crowd Counting.

Alexander G. Hauptmann

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018

Personalized clothing recommendation combining user social circle and fashion style consistency.

Multim. Tools Appl., 2018

Perceiving Physical Equation by Observing Visual Scenarios.

Alexander G. Hauptmann

CoRR, 2018

Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance.

Alexander G. Hauptmann

CoRR, 2018

Video2Shop: Exactly Matching Clothes in Videos to Online Shopping Images.

CoRR, 2018

Multi-View Image Generation from a Single-View.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning.

Alexander G. Hauptmann

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Learning to Transfer: Generalizable Attribute Learning with Multitask Neural Model Search.

Alexander G. Hauptmann

Qiang Peng

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017

Video eCommerce++: Toward Large Scale Online Video Advertising.

IEEE Trans. Multim., 2017

Multi-View Image Generation from a Single-View.

CoRR, 2017

VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search, and Video hyperlinking.

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

On the Selection of Anchors and Targets for Video Hyperlinking.

Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Video eCommerce: Towards Online Video Advertising.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Zhi-Qi Cheng
