Yuxin Peng

CoRR, September, 2025

UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis.

CoRR, July, 2025

SphereDrag: Spherical Geometry-Aware Panoramic Image Editing.

CoRR, June, 2025

Evaluating for Evidence of Sociodemographic Bias in Conversational AI for Mental Health Support.

Cyberpsychology Behav. Soc. Netw., 2025

Scan-and-Print: Patch-level Data Summarization and Augmentation for Content-aware Layout Generation in Poster Design.

HsiaoYuan Hsu

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation.

HsiaoYuan Hsu

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification.

Zhenyu Cui

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Selective Visual Prompting in Vision Mamba.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

DASK: Distribution Rehearsing via Adaptive Style Kernel Learning for Exemplar-Free Lifelong Person Re-Identification.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Compositional Prompting for Anti-Forgetting in Domain Incremental Learning.

Zichen Liu

Int. J. Comput. Vis., December, 2024

Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation.

Int. J. Comput. Vis., November, 2024

SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection.

CoRR, 2024

An Interference Mitigation Method via Variable Separation Angle between LEO Constellations.

Jiansong Miao

Fuhao Liu

Proceedings of the 16th International Conference on Wireless Communications and Signal Processing, 2024

CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting.

Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization.

Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

FE-VAD: High-Low Frequency Enhanced Weakly Supervised Video Anomaly Detection.

Ruoyan Pi

Jinglin Xu

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

FineSports: A Multi-Person Hierarchical Sports Video Dataset for Fine-Grained Action Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FineParser: A Fine-Grained Spatio-Temporal Action Parser for Human-Centric Action Quality Assessment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models.

Jinglin Xu

Yijie Guo

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning.

Qiwei Li

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Continual Vision-Language Retrieval via Dynamic Knowledge Rectification.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Smart Public Transportation Sensing: Enhancing Perception and Data Management for Efficient and Safety Operations.

Sensors, November, 2023

Multi Hybrid Extractor Network for 3D Human Pose Estimation.

Proceedings of the IEEE International Conference on Image Processing, 2023

Uncover the Body: Occluded Person Re-identification via Masked Image Modeling.

Kunlun Xu

Proceedings of the Image and Graphics - 12th International Conference, 2023

2022

Learning conditional photometric stereo with high-resolution features.

Comput. Vis. Media, 2022

Global Contextual Complementary Network for Multi-View Stereo.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Hierarchical Visual-Textual Knowledge Distillation for Life-Long Correlation Learning.

Int. J. Comput. Vis., 2021

2020

Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining.

Zhaoda Ye

ACM Trans. Multim. Comput. Commun. Appl., 2020

Guest Editorial Introduction to the Special Section on Representation Learning for Visual Content Understanding.

IEEE Trans. Circuits Syst. Video Technol., 2020

A Smart User Authentication Approach using Sensing Seat.

Proceedings of the 16th IEEE International Conference on Automation Science and Engineering, 2020

2019

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning.

ACM Trans. Multim. Comput. Commun. Appl., 2019

2018

Modality-Specific Cross-Modal Similarity Measurement With Recurrent Attention Network.

IEEE Trans. Image Process., 2018

Cost-Sensitive Deep Metric Learning for Fine-Grained Image Classification.

Junjie Zhao

Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Recursive Pyramid Network with Joint Attention for Cross-Media Retrieval.

Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Text-to-image Synthesis via Symmetrical Distillation Networks.

Mingkuan Yuan

Proceedings of the 2018 ACM Multimedia Conference, 2018

Multi-Scale Correlation for Sequential Cross-modal Hashing Learning.

Zhaoda Ye

Proceedings of the 2018 ACM Multimedia Conference, 2018

Life-long Cross-media Correlation Learning.

Yunkan Zhuo

Proceedings of the 2018 ACM Multimedia Conference, 2018

Coarse Label Refined Knowledge Reasoning for Fine-Grained Visual Categorization.

Xiangyu Zhao

Proceedings of the Intelligence Science and Big Data Engineering, 2018

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification.

Chenrui Zhang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Visual Data Synthesis via GAN for Zero-Shot Video Classification.

Chenrui Zhang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Cross-media Multi-level Alignment with Relation Attention Network.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Cross-modal Bidirectional Translation via Reinforcement Learning.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Dual Adversarial Networks for Zero-shot Cross-media Retrieval.

Jingze Chi

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Stacking VAE and GAN for Context-aware Text-to-Image Generation.

Chenrui Zhang

Proceedings of the Fourth IEEE International Conference on Multimedia Big Data, 2018

2017

Cross-media analysis and reasoning: advances and directions.

Frontiers Inf. Technol. Electron. Eng., 2017

Discriminative latent semantic feature learning for pedestrian detection.

Chao Zhu

Neurocomputing, 2017

Exploiting distinctive topological constraint of local feature matching for logo image recognition.

Panpan Tang

Neurocomputing, 2017

Cross-media retrieval by exploiting fine-grained correlation at entity level.

Lei Huang

Neurocomputing, 2017

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning.

CoRR, 2017

Saliency-guided video classification via adaptively weighted learning.

Yunzhen Zhao

Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

2016

Logo Recognition via Improved Topological Constraint.

Panpan Tang

Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Cross-Media Retrieval via Semantic Entity Projection.

Lei Huang

Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

Group Cost-Sensitive Boosting for Multi-Resolution Pedestrian Detection.

Chao Zhu

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

The application of two-level attention models in deep convolutional neural network for fine-grained image classification.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling.

Chao Zhu

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

Adaptive Sampling with Optimal Cost for Class-Imbalance Learning.

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization.

IEEE Trans. Circuits Syst. Video Technol., 2014

Graph-based multimodal semi-supervised image classification.

Neurocomputing, 2014

Weakly-Supervised Image Parsing via Constructing Semantic Graphs and Hypergraphs.

Wenxuan Xie

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Semantic Graph Construction for Weakly-Supervised Image Parsing.

Wenxuan Xie

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

Cross-View Feature Learning for Scalable Social Image Analysis.

Wenxuan Xie

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013

Exploiting Semantic and Visual Context for Effective Video Annotation.

IEEE Trans. Multim., 2013

Latent semantic learning with structured sparse representation for human action recognition.

Pattern Recognit., 2013

Cross-media retrieval by intra-media and inter-media correlation mining.

Multim. Syst., 2013

L1-graph construction using structured sparsity.

Guangyao Zhou

Neurocomputing, 2013

Vocabulary hierarchy optimisation based on spatial context and category information.

Int. J. Multim. Intell. Secur., 2013

Exhaustive and Efficient Constraint Propagation: A Graph-Based Learning Approach and Its Applications.

Int. J. Comput. Vis., 2013

A temporal context model for boosting video annotation.

Sci. China Inf. Sci., 2013

Learning Descriptive Visual Representation by Semantic Regularized Matrix Factorization.

Proceedings of the IJCAI 2013, 2013

Multimodal semi-supervised image classification by combining tag refinement, graph-based learning and support vector regression.

Proceedings of the IEEE International Conference on Image Processing, 2013

Cross-media retrieval by cluster-based correlation analysis.

Ding Ma

Proceedings of the IEEE International Conference on Image Processing, 2013

Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval.

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

Unified Constraint Propagation on Multi-View Data.

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012

Effective Heterogeneous Similarity Measure with Nearest Neighbors for Cross-Media Retrieval.

Proceedings of the Advances in Multimedia Modeling - 18th International Conference, 2012

Visual Vocabulary Optimization with Spatial Context for Image Annotation and Classification.

Proceedings of the Advances in Multimedia Modeling - 18th International Conference, 2012

PDSS: patch-descriptor-similarity space for effective face verification.

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Image annotation by semantic sparse recoding of visual content.

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Tri-space and ranking based heterogeneous similarity measure for cross-media retrieval.

Li Ling

Proceedings of the 21st International Conference on Pattern Recognition, 2012

Heterogeneous Constraint Propagation with Constrained Sparse Representation.

Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Cross-modality correlation propagation for cross-media retrieval.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Contextual Kernel and Spectral Methods for Learning the Semantics of Images.

IEEE Trans. Image Process., 2011

Combining multiple clusterings using fast simulated annealing.

Pattern Recognit. Lett., 2011

Robust Image Analysis by L1-Norm Semi-supervised Learning.

CoRR, 2011

Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications.

CoRR, 2011

Mining concept relationship in temporal context for effective video annotation.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Combining latent semantic learning and reduced hypergraph learning for semi-supervised image categorization.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Web video search by mutual boosting between the inside and outside text of video.

Proceedings of the 2011 Joint International Conference on Digital Libraries, 2011

Spectral learning of latent semantics for action recognition.

Proceedings of the IEEE International Conference on Computer Vision, 2011

Latent Semantic Learning by Efficient Sparse Coding with Hypergraph Regularization.

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

Symmetric Graph Regularized Constraint Propagation.

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010

Gaussian mixture learning via robust competitive agglomeration.

Pattern Recognit. Lett., 2010

Image categorization via robust pLSA.

Pattern Recognit. Lett., 2010

Story-Based Retrieval by Learning and Measuring the Concept-Based and Content-Based Similarity.

Proceedings of the Advances in Multimedia Modeling, 2010

Refining video annotation by exploiting inter-shot context.

Proceedings of the 18th International Conference on Multimedia 2010, 2010

AdaOUBoost: adaptive over-sampling and under-sampling to boost the concept learning in large scale imbalanced data sets.

Jia Yao

Proceedings of the 11th ACM SIGMM International Conference on Multimedia Information Retrieval, 2010

Effective Multi-level Image Representation for Image Categorization.

Hao Li

Proceedings of the 20th International Conference on Pattern Recognition, 2010

2009

PKU-ICST at TRECVID2009: High Level Feature Extraction and Search.

Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

Semantic concept annotation based on audio PLSA model.

Proceedings of the 17th International Conference on Multimedia 2009, 2009

Audio retrieval by segment-based manifold-ranking.

Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Using Multiple Frame Integration for the Text Recognition of Video.

Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009

2008

A Semi-supervised Learning Algorithm on Gaussian Mixture with Automatic Model Selection.

Neural Process. Lett., 2008

Peking University at TRECVID 2008: High Level Feature Extraction.

Proceedings of the TRECVID 2008 workshop participants notebook papers, 2008

Unsupervised learning of finite mixtures using entropy regularization and its application to image segmentation.

Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

From Comparing Clusterings to Combining Clusterings.

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008

2007

OM-based video shot retrieval by one-to-one matching.

Multim. Tools Appl., 2007

Color-Based Text Extraction for the Image.

Proceedings of the Advances in Multimedia Information Processing, 2007

Color-based clustering for text detection and extraction in image.

Proceedings of the 15th International Conference on Multimedia 2007, 2007

A Theoretical Approach to Construct Highly Discriminative Features with Application in AdaBoost.

Proceedings of the Computer Vision, 2007

2006

Clip-based similarity measure for query-dependent clip retrieval and video summarization.

IEEE Trans. Circuits Syst. Video Technol., 2006

Using Earth Mover's Distance for Audio Clip Retrieval.

Cuihua Fang

Xiaoou Chen

Proceedings of the Advances in Multimedia Information Processing, 2006

Audio similarity measure by graph modeling and matching.

Proceedings of the 14th ACM International Conference on Multimedia, 2006

2005

A New Retrieval Model Based on TextTiling for Document Similarity Search.

J. Comput. Sci. Technol., 2005

A New Re-ranking Method for Generic Chinese Text Summarization and Its Evaluation.

Proceedings of the Digital Libraries: Implementing Strategies and Sharing Experiences, 2005

Hot Event Detection and Summarization by Graph Modeling and Matching.

Proceedings of the Image and Video Retrieval, 4th International Conference, 2005

EMD-Based Video Clip Retrieval by Many-to-Many Matching.

Proceedings of the Image and Video Retrieval, 4th International Conference, 2005

The earth mover's distance as a semantic measure for document similarity.

Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31, 2005

2004

Clip-based similarity measure for hierarchical video retrieval.

Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2004

A Measure Based on Optimal Matching in Graph Theory for Document Similarity.
