Yuxin Peng

Orcid: 0000-0001-7658-3845

Affiliations:
  • Peking University, Beijing, China


According to our database, Yuxin Peng authored at least 135 papers between 2012 and 2025.

Bibliography

2025
Long Short-Term Knowledge Decomposition and Consolidation for Lifelong Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Human-Centric Fine-Grained Action Quality Assessment.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2025

Interact-Custom: Customized Human Object Interaction Image Generation.
CoRR, August, 2025

Investigating Domain Gaps for Indoor 3D Object Detection.
CoRR, August, 2025

Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset.
CoRR, August, 2025

RaT2IGen: Relation-aware Text-to-image Generation via Learnable Prompt.
ACM Trans. Multim. Comput. Commun. Appl., May, 2025

Learning Comprehensive Visual Grounding for Video Captioning.
IEEE Trans. Circuits Syst. Video Technol., April, 2025

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation.
CoRR, April, 2025

Learning Guided Implicit Depth Function With Scale-Aware Feature Fusion.
IEEE Trans. Image Process., 2025

STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
EOGT: Video Anomaly Detection with Enhanced Object Information and Global Temporal Dependency.
ACM Trans. Multim. Comput. Commun. Appl., October, 2024

SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback.
ACM Trans. Multim. Comput. Commun. Appl., June, 2024

I2C: Invertible Continuous Codec for High-Fidelity Variable-Rate Image Compression.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

Negatives Make a Positive: An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Decoupled domain-specific and domain-conditional representation learning for cross-domain recommendation.
Inf. Process. Manag., March, 2024

MAAN: Memory-Augmented Auto-Regressive Network for Text-Driven 3D Indoor Scene Generation.
IEEE Trans. Multim., 2024

HCL: Hierarchical Consistency Learning for Webly Supervised Fine-Grained Recognition.
IEEE Trans. Multim., 2024

Image Super-Resolution via Efficient Transformer Embedding Frequency Decomposition With Restart.
IEEE Trans. Image Process., 2024

Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model.
IEEE Trans. Image Process., 2024

MECOM: A Meta-Completion Network for Fine-Grained Recognition With Incomplete Multi-Modalities.
IEEE Trans. Image Process., 2024

SIM-OFE: Structure Information Mining and Object-Aware Feature Enhancement for Fine-Grained Visual Categorization.
IEEE Trans. Image Process., 2024

DMA: Dual Modality-Aware Alignment for Visible-Infrared Person Re-Identification.
IEEE Trans. Inf. Forensics Secur., 2024

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

RelScene: A Benchmark and baseline for Spatial Relations in text-driven 3D Scene Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Mitigate Catastrophic Remembering via Continual Knowledge Purification for Noisy Lifelong Person Re-Identification.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

InsVP: Efficient Instance Visual Prompting from Image Itself.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

FineFMPL: Fine-grained Feature Mining Prompt Learning for Few-Shot Class Incremental Learning.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Semantic-Aware Human Object Interaction Image Generation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Firzen: Firing Strict Cold-Start Items with Frozen Heterogeneous and Homogeneous Graphs for Recommendation.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Training-Free Video Temporal Grounding Using Large-Scale Pre-trained Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Exploring Conditional Multi-modal Prompts for Zero-Shot HOI Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

Distribution-Aware Knowledge Prototyping for Non-Exemplar Lifelong Person Re-Identification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DART: Dual-Modal Adaptive Online Prompting and Knowledge Retention for Test-Time Adaptation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Comprehensive Visual Grounding for Video Description.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation.
ACM Trans. Multim. Comput. Commun. Appl., November, 2023

Attribute-Aware Deep Hashing With Self-Consistency for Large-Scale Fine-Grained Image Retrieval.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Disentangled Graph Neural Networks for Session-Based Recommendation.
IEEE Trans. Knowl. Data Eng., August, 2023

DCR-ReID: Deep Component Reconstruction for Cloth-Changing Person Re-Identification.
IEEE Trans. Circuits Syst. Video Technol., August, 2023

CAT: a coarse-to-fine attention tree for semantic change detection.
Vis. Intell., 2023

MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model.
CoRR, 2023

MB-HGCN: A Hierarchical Graph Convolutional Network for Multi-behavior Recommendation.
CoRR, 2023

Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation.
CoRR, 2023

Multi-Behavior Recommendation with Cascading Graph Convolution Networks.
Proceedings of the ACM Web Conference 2023, 2023

MV-Diffusion: Motion-aware Video Diffusion Model.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Efficiency-optimized Video Diffusion Models.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DensityLayout: Density-Conditioned Layout GAN for Visual-Textual Presentation Designs.
Proceedings of the Image and Graphics - 12th International Conference, 2023

Masked Retraining Teacher-Student Framework for Domain Adaptive Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PosterLayout: A New Benchmark and Approach for Content-Aware Visual-Textual Presentation Layout.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Phrase-Level Temporal Relationship Mining for Temporal Sentence Localization.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Dual-View 3D Reconstruction via Learning Correspondence and Dependency of Point Cloud Regions.
IEEE Trans. Image Process., 2022

Unsupervised Visual-Textual Correlation Learning With Fine-Grained Semantic Alignment.
IEEE Trans. Cybern., 2022

MARS: Learning Modality-Agnostic Representation for Scalable Cross-Media Retrieval.
IEEE Trans. Circuits Syst. Video Technol., 2022

Fine-Grained Image Analysis With Deep Learning: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Semantic association enhancement transformer with relative position for image captioning.
Multim. Tools Appl., 2022

Team PKU-WICT-MIPL PIC Makeup Temporal Video Grounding Challenge 2022 Technical Report.
CoRR, 2022

Prototype-based classifier learning for long-tailed visual recognition.
Sci. China Inf. Sci., 2022

Learn from Unlabeled Videos for Near-duplicate Video Retrieval.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Weakly Supervised Video Anomaly Detection with Temporal and Abnormal Information.
Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Visual-Textual Hybrid Sequence Matching for Joint Reasoning.
IEEE Trans. Cybern., 2021

Multi-Level Knowledge Injecting for Visual Commonsense Reasoning.
IEEE Trans. Circuits Syst. Video Technol., 2021

2020
RCE-HIL: Recognizing Cross-media Entailment with Heterogeneous Interactive Learning.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval.
IEEE Trans. Multim., 2020

CKD: Cross-Task Knowledge Distillation for Text-to-Image Synthesis.
IEEE Trans. Multim., 2020

Deep Reinforcement Learning for Image Hashing.
IEEE Trans. Multim., 2020

Video Captioning With Object-Aware Spatio-Temporal Correlation and Aggregation.
IEEE Trans. Image Process., 2020

MAVA: Multi-Level Adaptive Visual-Textual Alignment by Cross-Media Bi-Attention Mechanism.
IEEE Trans. Image Process., 2020

SCH-GAN: Semi-Supervised Cross-Modal Hashing by Generative Adversarial Network.
IEEE Trans. Cybern., 2020

MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval.
IEEE Trans. Cybern., 2020

Bridge-GAN: Interpretable Representation Learning for Text-to-Image Synthesis.
IEEE Trans. Circuits Syst. Video Technol., 2020

Quintuple-Media Joint Correlation Learning With Deep Compression and Regularization.
IEEE Trans. Circuits Syst. Video Technol., 2020

Reinforced Cross-Media Correlation Learning by Context-Aware Bidirectional Translation.
IEEE Trans. Circuits Syst. Video Technol., 2020

Unsupervised Cross-Media Retrieval Using Domain Adaptation With Scene Graph.
IEEE Trans. Circuits Syst. Video Technol., 2020

Fine-Grained Visual-Textual Representation Learning.
IEEE Trans. Circuits Syst. Video Technol., 2020

Zero-Shot Cross-Media Embedding Learning With Dual Adversarial Distribution Network.
IEEE Trans. Circuits Syst. Video Technol., 2020

DV-Net: Dual-view network for 3D reconstruction by fusing multiple sets of gated control point clouds.
Pattern Recognit. Lett., 2020

Attribute hierarchy based multi-task learning for fine-grained image classification.
Neurocomputing, 2020

PKU_WICT at TRECVID 2020: Instance Search Task.
Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

2019
Show and Tell in the Loop: Cross-Modal Circular Correlation Learning.
IEEE Trans. Multim., 2019

TPCKT: Two-Level Progressive Cross-Media Knowledge Transfer.
IEEE Trans. Multim., 2019

SSDH: Semi-Supervised Deep Hashing for Large Scale Image Retrieval.
IEEE Trans. Circuits Syst. Video Technol., 2019

Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification.
IEEE Trans. Circuits Syst. Video Technol., 2019

Fast Fine-Grained Image Classification via Weakly Supervised Discriminative Localization.
IEEE Trans. Circuits Syst. Video Technol., 2019

Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization.
Int. J. Comput. Vis., 2019

PKU_ICST at TRECVID 2019: Instance Search Task.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

Hierarchical Vision-Language Alignment for Video Captioning.
Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

A New Benchmark and Approach for Fine-grained Cross-media Retrieval.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-video Generation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Object-Aware Aggregation With Bidirectional Temporal Graph for Video Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Query-Adaptive Image Retrieval by Deep-Weighted Hashing.
IEEE Trans. Multim., 2018

CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network.
IEEE Trans. Multim., 2018

Object-Part Attention Model for Fine-Grained Image Classification.
IEEE Trans. Image Process., 2018

An Overview of Cross-Media Retrieval: Concepts, Methodologies, Benchmarks, and Challenges.
IEEE Trans. Circuits Syst. Video Technol., 2018

PKU_ICST at TRECVID 2018: Instance Search Task.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Multi-attention Guided Activation Propagation in CNNs.
Proceedings of the Pattern Recognition and Computer Vision - First Chinese Conference, 2018

Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Progressive Cross-Media Correlation Learning.
Proceedings of the Image and Graphics Technologies and Applications, 2018

Deep Cross-Media Knowledge Transfer.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Unsupervised Generative Adversarial Cross-Modal Hashing.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Cross-media similarity metric learning with unified deep networks.
Multim. Tools Appl., 2017

Visual-textual Attention Driven Fine-grained Representation Learning.
CoRR, 2017

CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network.
CoRR, 2017

Object-Part Attention Driven Discriminative Localization for Fine-grained Image Classification.
CoRR, 2017

Fine-grained Image Classification via Combining Vision and Language.
CoRR, 2017

PKU_ICST at TRECVID 2017: Instance Search Task.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Cross-modal Common Representation Learning by Hybrid Transfer Network.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Cross-modal deep metric learning with multi-task regularization.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Zero-Shot Cross-Media Retrieval with External Knowledge.
Proceedings of the Internet Multimedia Computing and Service, 2017

Attention-Sharing Correlation Learning for Cross-Media Retrieval.
Proceedings of the Image and Graphics - 9th International Conference, 2017

Fine-Grained Image Classification via Combining Vision and Language.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Weakly Supervised Learning of Part Selection Model with Spatial Constraints for Fine-Grained Image Classification.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Semi-Supervised Cross-Media Feature Learning With Unified Patch Graph Regularization.
IEEE Trans. Circuits Syst. Video Technol., 2016

Query-adaptive Image Retrieval by Deep Weighted Hashing.
CoRR, 2016

SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval.
CoRR, 2016

PKU-ICST at TRECVID 2016: Instance Search Task.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Cross-Media Retrieval by Multimodal Representation Fusion with Deep Networks.
Proceedings of the Digital TV and Wireless Multimedia Communication, 2016

2015
PKU-ICST at TRECVID 2015: Instance Search Task.
Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

2014
PKU-ICST at TRECVID 2014: Instance Search Task.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

2013
PKU_ICST at TRECVID2013 : Instance Search Task.
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

2012
PKU-ICST @TRECVID2012: Known-item Search Task.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

