Yongkang Wong

ACM Trans. Multim. Comput. Commun. Appl., July, 2025

Learning to Predict Gradients for Semi-Supervised Continual Learning.

IEEE Trans. Neural Networks Learn. Syst., February, 2025

2024

Recurrent Appearance Flow for Occlusion-Free Virtual Try-On.

ACM Trans. Multim. Comput. Commun. Appl., August, 2024

PAINT: Photo-realistic Fashion Design Synthesis.

ACM Trans. Multim. Comput. Commun. Appl., February, 2024

Unsupervised Domain Adaptation by Causal Learning for Biometric Signal-based HCI.

ACM Trans. Multim. Comput. Commun. Appl., February, 2024

Multi2Human: Controllable human image generation with multimodal controls.

Neurocomputing, 2024

Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM.

CoRR, 2024

STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting.

CoRR, 2024

TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment.

CoRR, 2024

Bridging the Intent Gap: Knowledge-Enhanced Visual Generation.

CoRR, 2024

Privacy-Enhancing Person Re-identification Framework - A Dual-Stage Approach.

Kajal Kansal

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MCM: Multi-condition Motion Synthesis Framework.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Finetuning Text-to-Image Diffusion Models for Fairness.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Semantic-Aware Triplet Loss for Image Classification.

IEEE Trans. Multim., 2023

Learning to Minimize the Remainder in Supervised Learning.

IEEE Trans. Multim., 2023

Fair Representation: Guaranteeing Approximate Multiple Group Fairness for Unknown Tasks.

Xudong Shen

IEEE Trans. Pattern Anal. Mach. Intell., 2023

ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens.

CoRR, 2023

MCM: Multi-condition Motion Synthesis Framework for Multi-scenario.

CoRR, 2023

A Study on Differentiable Logic and LLMs for EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2023.

CoRR, 2023

Narrative Graph for Narrative Generation from Long Videos.

Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos, 2023

NarSUM '23: The 2nd Workshop on User-Centric Narrative Summarization of Long Videos.

Ioannis (Yiannis) Patras

Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022

Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept.

ACM Trans. Multim. Comput. Commun. Appl., 2022

Relation-Aware Compositional Zero-Shot Learning for Attribute-Object Pair Recognition.

IEEE Trans. Multim., 2022

Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Panel Discussion: Emerging Topics on Video Summarization.

Proceedings of NarSUM '22: the 1st Workshop on User-centric Narrative Summarization of Long Videos, 2022

Compute to Tell the Tale: Goal-Driven Narrative Generation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Distance Matters in Human-Object Interaction Detection.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

NarSUM '22: 1st Workshop on User-centric Narrative Summarization of Long Videos.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Chairs Can Be Stood On: Overcoming Object Bias in Human-Object Interaction Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

DeepDance: Music-to-Dance Motion Choreography With Adversarial Learning.

IEEE Trans. Multim., 2021

Toward Multi-Modal Conditioned Fashion Image Translation.

IEEE Trans. Multim., 2021

Scene Graph Inference via Multi-Scale Context Modeling.

IEEE Trans. Circuits Syst. Video Technol., 2021

Direction Concentration Learning: Enhancing Congruency in Machine Learning.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Unsupervised Motion Representation Learning with Capsule Autoencoders.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning to Predict Trustworthiness with Steep Slope Loss.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning Causal Representation for Training Cross-Domain Pose Estimator via Generative Interventions.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

G-Softmax: Improving Intraclass Compactness and Interclass Separability of Features.

IEEE Trans. Neural Networks Learn. Syst., 2020

Interact as You Intend: Intention-Driven Human-Object Interaction Detection.

IEEE Trans. Multim., 2020

Video Storytelling: Textual Summaries for Events.

IEEE Trans. Multim., 2020

Unsupervised Online Video Object Segmentation With Motion Property Understanding.

IEEE Trans. Image Process., 2020

Visual Social Relationship Recognition.

Int. J. Comput. Vis., 2020

GradMix: Multi-source Transfer across Domains and Tasks.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Weakly-Supervised Multi-Person Action Recognition in 360° Videos.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

n-Reference Transfer Learning for Saliency Prediction.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

A Multi-sensor Framework for Personal Presentation Analytics.

ACM Trans. Multim. Comput. Commun. Appl., 2019

Multi-Modal and Multi-Domain Embedding Learning for Fashion Retrieval and Analysis.

IEEE Trans. Multim., 2019

Dual-Stream Recurrent Neural Network for Video Captioning.

IEEE Trans. Circuits Syst. Video Technol., 2019

Surface-Electromyography-Based Gesture Recognition by Multi-View Deep Learning.

IEEE Trans. Biomed. Eng., 2019

A multi-stream convolutional neural network for sEMG-based gesture recognition in muscle-computer interface.

Pattern Recognit. Lett., 2019

LSTM-based multi-label video event detection.

Multim. Tools Appl., 2019

G-softmax: Improving Intra-class Compactness and Inter-class Separability of Features.

CoRR, 2019

sEMG-Based Gesture Recognition With Embedded Virtual Hand Poses and Adversarial Learning.

IEEE Access, 2019

Explainable Video Action Reasoning via Prior Knowledge and State Transitions.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Unsupervised Domain Adaptation for 3D Human Pose Estimation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Human-imperceptible Privacy Protection Against Machines.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Self-supervised Representation Learning Using 360° Data.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning to Detect Human-Object Interactions With Knowledge.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning to Learn From Noisy Labeled Data.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning Controllable Face Generator from Disjoint Datasets.

Jing Li

Terence Sim

Proceedings of the Computer Analysis of Images and Patterns, 2019

2018

Video Storytelling.

CoRR, 2018

A Fine-Grained Spatial-Temporal Attention Model for Video Captioning.

IEEE Access, 2018

Unsupervised Learning of View-invariant Action Representations.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Benchmarking a Multimodal and Multiview and Interactive Dataset for Human Action Recognition.

IEEE Trans. Cybern., 2017

Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language.

Comput. Vis. Image Underst., 2017

Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking.

Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, 2017

Tianjin University and National University of Singapore at TRECVID 2017: Video to Text Description.

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Attention Transfer from Web Images for Video Recognition.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Understanding Fashion Trends from Street Photos via Neighbor-Constrained Embedding Learning.

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Semi-Supervised Learning for Surface EMG-based Gesture Recognition.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Dual-Glance Model for Deciphering Social Relationships.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Multi-Camera Action Dataset (MCAD): A Dataset for Studying Non-overlapped Cross-Camera Action Recognition.

CoRR, 2016

Demo Paper: PreSense - An Assistive Presentation Self-Quantification System.

Junnan Li

Proceedings of the IEEE International Symposium on Multimedia, 2016

Multi-stream Deep Learning Framework for Automated Presentation Assessment.

Junnan Li

Proceedings of the IEEE International Symposium on Multimedia, 2016

Towards protecting biometric templates without sacrificing performance.

Jing Li

Terence Sim

Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps.

Proceedings of the Computer Vision - ECCV 2016, 2016

2015

Multi-Camera Saliency.

IEEE Trans. Pattern Anal. Mach. Intell., 2015

Multi-modal & Multi-view & Interactive Benchmark Dataset for Human Action Recognition.

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Multi-sensor Self-Quantification of Presentations.

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Label Consistent Quadratic Surrogate model for visual saliency prediction.

Yan Luo

Qi Zhao

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Automatic classification of Human Epithelial type 2 cell Indirect Immunofluorescence images using Cell Pyramid Matching.

Pattern Recognit., 2014

On robust face recognition via sparse coding: the good, the bad and the ugly.

IET Biom., 2014

Multi-view action recognition by cross-domain learning.

Proceedings of the IEEE 16th International Workshop on Multimedia Signal Processing, 2014

View-invariant feature discovering for multi-camera human action recognition.

Proceedings of the IEEE 16th International Workshop on Multimedia Signal Processing, 2014

Recovering Social Interaction Spatial Structure from Multiple First-Person Views.

Proceedings of the 3rd International Workshop on Socially-Aware Multimedia, 2014

Scalable Decision-Theoretic Coordination and Control for Real-time Active Multi-Camera Surveillance.

Proceedings of the International Conference on Distributed Smart Cameras, 2014

Discovering Person Identity via Large-Scale Observations.

Lekha Chaisorn

Proceedings of the Computer Vision - ACCV 2014 Workshops, 2014

2013

On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly.

CoRR, 2013

Classification of Human Epithelial type 2 cell indirect immunofluoresence images via codebook based descriptors.

Proceedings of the 2013 IEEE Workshop on Applications of Computer Vision, 2013

Temporal encoded F-formation system for social interaction detection.

Proceedings of the ACM Multimedia Conference, 2013

Video analytics for surveillance camera networks.

Lekha Chaisorn

Proceedings of the 19th IEEE International Conference on Networks, 2013

2012

Towards robust identity inference under surveillance environments: from still images to video sequences.

PhD thesis, 2012

On robust biometric identity verification via sparse encoding of faces: Holistic vs local approaches.

Brian C. Lovell

Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), 2012

Combined Learning of Salient Local Descriptors and Distance Metrics for Image Set Face Verification.

Brian C. Lovell

Proceedings of the Ninth IEEE International Conference on Advanced Video and Signal-Based Surveillance, 2012

2011

Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011

2010

Dynamic Amelioration of Resolution Mismatches for Local Feature Based Identity Inference.

Proceedings of the 20th International Conference on Pattern Recognition, 2010

2009

Regression Based Non-frontal Face Synthesis for Improved Identity Verification.
