Jun Xiao

Orcid: 0000-0002-6142-9914

Affiliations:
  • Zhejiang University, College of Computer Science and Technology, Hangzhou, China (PhD 2007)


According to our database, Jun Xiao authored at least 170 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation.
IEEE Trans. Circuits Syst. Video Technol., January, 2024

Taking a Closer Look At Visual Relation: Unbiased Video Scene Graph Generation With Decoupled Label Learning.
IEEE Trans. Multim., 2024

Explore Synergistic Interaction Across Frames for Interactive Video Object Segmentation.
CoRR, 2024

Existence Is Chaos: Enhancing 3D Human Motion Prediction with Uncertainty Consideration.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Dual-Path Rare Content Enhancement Network for Image and Text Matching.
IEEE Trans. Circuits Syst. Video Technol., October, 2023

Federated unsupervised representation learning.
Frontiers Inf. Technol. Electron. Eng., August, 2023

VL-NMS: Breaking Proposal Bottlenecks in Two-stage Visual-language Matching.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Unsupervised self-training correction learning for 2D image-based 3D model retrieval.
Inf. Process. Manag., 2023

Differentiated matching for individual and average treatment effect estimation.
Data Min. Knowl. Discov., 2023

Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation.
CoRR, 2023

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards.
CoRR, 2023

Mitigating Biased Activation in Weakly-supervised Object Localization via Counterfactual Learning.
CoRR, 2023

TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding.
CoRR, 2023

Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning.
CoRR, 2023

Decomposed Prototype Learning for Few-Shot Scene Graph Generation.
CoRR, 2023

Learning Combinatorial Prompts for Universal Controllable Image Captioning.
CoRR, 2023

Further Improving Weakly-supervised Object Localization via Causal Knowledge Distillation.
CoRR, 2023

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Generalized Universal Domain Adaptation with Generative Flow Networks.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

FedAA: Using Non-sensitive Modalities to Improve Federated Learning while Preserving Image Privacy.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Dark Knowledge Balance Learning for Unbiased Scene Graph Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Video Scene Graph Generation from Single-Frame Weak Supervision.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compositional Feature Augmentation for Unbiased Scene Graph Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
TICS: text-image-based semantic CAPTCHA synthesis via multi-condition adversarial learning.
Vis. Comput., 2022

Shuhai: A Tool for Benchmarking High Bandwidth Memory on FPGAs.
IEEE Trans. Computers, 2022

ROBY: Evaluating the adversarial robustness of a deep model by its decision boundaries.
Inf. Sci., 2022

Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey.
Neurocomputing, 2022

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation.
CoRR, 2022

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation.
CoRR, 2022

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives.
CoRR, 2022

SAViT: Structure-Aware Vision Transformer Pruning via Collaborative Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unified Normalization for Accelerating and Stabilizing Transformers.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Learning Hybrid Behavior Patterns for Multimedia Recommendation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Rethinking the Reference-based Distinctive Image Captioning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

Dynamic Feature Pyramid Networks for Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Explicit Image Caption Editing.
Proceedings of the Computer Vision - ECCV 2022, 2022

Domain Generalization with Global Sample Mixup.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Rethinking Data Augmentation for Robust Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2022, 2022

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DUDA: Online-Offline Dual Domain Adaption for Semantic Segmentation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Rethinking the Evaluation of Unbiased Scene Graph Generation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Explore Video Clip Order With Self-Supervised and Curriculum Learning for Video Applications.
IEEE Trans. Multim., 2021

Tell and guess: cooperative learning for natural image caption generation with hierarchical refined attention.
Multim. Tools Appl., 2021

Unified Group Fairness on Federated Learning.
CoRR, 2021

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey.
CoRR, 2021

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching.
CoRR, 2021

Efficient Ring-topology Decentralized Federated Learning with Deep Generative Models for Industrial Artificial Intelligent.
CoRR, 2021

Design of Deep Learning Model for Task-Evoked fMRI Data Classification.
Comput. Intell. Neurosci., 2021

Improving Weakly Supervised Object Localization via Causal Intervention.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Video Relation Detection via Tracklet based Visual Transformer.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Natural Language Video Localization with Learnable Moment Proposals.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value.
Proceedings of the Artificial Intelligence - First CAAI International Conference, 2021

Consensus Graph Representation Learning for Better Grounded Image Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Boundary Proposal Network for Two-stage Natural Language Video Localization.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Multichannel Attention Refinement for Video Question Answering.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks.
IEEE Trans. Image Process., 2020

Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering.
Neural Process. Lett., 2020

Multi-platform data collection for public service with Pay-by-Data.
Multim. Tools Appl., 2020

Video question answering via grounded cross-attention network learning.
Inf. Process. Manag., 2020

Abstractive meeting summarization by hierarchical adaptive segmental network learning with multiple revising steps.
Neurocomputing, 2020

ROBY: Evaluating the Robustness of a Deep Model by its Decision Boundaries.
CoRR, 2020

GFL: A Decentralized Federated Learning Framework Based On Blockchain.
CoRR, 2020

Federated Unsupervised Representation Learning.
CoRR, 2020

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding.
CoRR, 2020

Evaluation Framework For Large-scale Federated Learning.
CoRR, 2020

Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States.
CoRR, 2020

Hierarchical Fashion Graph Network for Personalized Outfit Recommendation.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Relational Graph Learning for Grounded Video Description Generation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Photo Stream Question Answer.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

De-Biased Court's View Generation with Causality.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Rethinking the Bottom-Up Framework for Query-Based Video Localization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Video Question Answering via Knowledge-based Progressive Spatial-Temporal Attention Network.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Explorations of skeleton features for LSTM-based action recognition.
Multim. Tools Appl., 2019

Adversarial learning for viewpoints invariant 3D human pose estimation.
J. Vis. Commun. Image Represent., 2019

An artificial intelligence based data-driven approach for design ideation.
J. Vis. Commun. Image Represent., 2019

Galaxy Learning - A Position Paper.
CoRR, 2019

Exploratory Analysis for Big Social Data Using Deep Network.
IEEE Access, 2019

Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Video Relation Detection with Spatio-Temporal Graph.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Multi-interaction Network with Object Relation for Video Question Answering.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Weak Supervision Enhanced Generative Network for Question Generation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Training Encrypted Models with Privacy-preserved Data on Blockchain.
Proceedings of the ICVISP 2019: 3rd International Conference on Vision, 2019

Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Video Dialog via Progressive Inference and Cross-Transformer.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Zhejiang University at ImageCLEF 2019 Visual Question Answering in the Medical Domain.
Proceedings of the Working Notes of CLEF 2019, 2019

2018
Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks.
IEEE Trans. Multim., 2018

Scene Dynamics: Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
CoRR, 2018

Textually Guided Ranking Network for Attentional Image Retweet Modeling.
CoRR, 2018

Attentional Image Retweet Modeling via Multi-Faceted Ranking Network Learning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Video question answering via multi-granularity temporal attention network learning.
Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, 2018

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Matryoshka Peek: Toward Learning Fine-Grained, Robust, Discriminative Features for Product Search.
IEEE Trans. Multim., 2017

Bag-of-Discriminative-Words (BoDW) Representation via Topic Modeling.
IEEE Trans. Knowl. Data Eng., 2017

Temporal Interaction and Causal Influence in Community-Based Question Answering.
IEEE Trans. Knowl. Data Eng., 2017

Hierarchical Contextual Attention Recurrent Neural Network for Map Query Suggestion.
IEEE Trans. Knowl. Data Eng., 2017

A human motion feature based on semi-supervised learning of GMM.
Multim. Syst., 2017

Disambiguating named entities with deep supervised learning via crowd labels.
Frontiers Inf. Technol. Electron. Eng., 2017

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Network.
CoRR, 2017

On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks.
Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, 2017

Learning Max-Margin GeoSocial Multimedia Network Representations for Point-of-Interest Suggestion.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Video Question Answering via Attribute-Augmented Attention Network Learning.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

ENCORE: External Neural Constraints Regularized Distant Supervision for Relation Extraction.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Video Question Answering via Gradually Refined Attention over Appearance and Motion.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Graph-theoretic spatiotemporal context modeling for video saliency detection.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Integrating Side Information for Boosting Machine Comprehension.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

2016
Structure-Aware Slow Feature Analysis for Age Estimation.
IEEE Signal Process. Lett., 2016

Fast view-based 3D model retrieval via unsupervised multiple feature fusion and online projection learning.
Signal Process., 2016

LSTM-in-LSTM for generating long descriptions of images.
Comput. Vis. Media, 2016

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning.
CoRR, 2016

Diverse Image Captioning via GroupTalk.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Self-Paced Boost Learning for Classification.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

A 3D human motion refinement method based on sparse motion bases selection.
Proceedings of the 29th International Conference on Computer Animation and Social Agents, 2016

2015
Mining Spatial-Temporal Patterns and Structural Sparsity for Human Motion Data Denoising.
IEEE Trans. Cybern., 2015

Sketch-based human motion retrieval via selected 2D geometric posture descriptor.
Signal Process., 2015

Sparse motion bases selection for human motion denoising.
Signal Process., 2015

View-invariant human action recognition via robust locally adaptive multi-view learning.
Frontiers Inf. Technol. Electron. Eng., 2015

Efficient semi-supervised multiple feature fusion with out-of-sample extension for 3D model retrieval.
Neurocomputing, 2015

A locally weighted sparse graph regularized Non-Negative Matrix Factorization method.
Neurocomputing, 2015

Continuous Angle-based Outlier Detection on High-dimensional Data Streams.
Proceedings of the 19th International Database Engineering & Applications Symposium, 2015

Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Real-time motion data annotation via action string.
Comput. Animat. Virtual Worlds, 2014

Human motion retrieval based on freehand sketch.
Comput. Animat. Virtual Worlds, 2014

Exploiting temporal stability and low-rank structure for motion capture data refinement.
Inf. Sci., 2014

2013
Retrieval-based cartoon gesture recognition and applications via semi-supervised heterogeneous classifiers learning.
Pattern Recognit., 2013

A semantic feature for human motion retrieval.
Comput. Animat. Virtual Worlds, 2013

Hypergraph Spectral Hashing for image retrieval with heterogeneous social contexts.
Neurocomputing, 2013

2012
Synthesizing style-preserving cartoons via non-negative style factorization.
J. Zhejiang Univ. Sci. C, 2012

Active learning for social image retrieval using Locally Regressive Optimal Design.
Neurocomputing, 2012

Adaptive Unsupervised Multi-view Feature Selection for Visual Concept Recognition.
Proceedings of the Computer Vision - ACCV 2012, 2012

2011
Learning a 3D Human Pose Distance Metric from Geometric Pose Descriptor.
IEEE Trans. Vis. Comput. Graph., 2011

Predicting missing markers in human motion capture using ℓ1-sparse representation.
Comput. Animat. Virtual Worlds, 2011

Videoader: a video advertising system based on intelligent analysis of visual content.
Proceedings of the ICIMCS 2011, 2011

2010
A group of novel approaches and a toolkit for motion capture data reusing.
Multim. Tools Appl., 2010

Silhouette representation and matching for 3D pose discrimination - A comparative study.
Image Vis. Comput., 2010

Real-time digitised shadow play performance method based on multi-point interactive controlling method.
Int. J. Comput. Appl. Technol., 2010

A script engine for realistic human motion generation.
Int. J. Comput. Appl. Technol., 2010

2009
Competitive motion synthesis based on hybrid control.
Comput. Animat. Virtual Worlds, 2009

Perceptual 3D pose distance estimation by boosting relational geometric features.
Comput. Animat. Virtual Worlds, 2009

2008
Perspective-aware cartoon clips synthesis.
Comput. Animat. Virtual Worlds, 2008

Active post-refined multimodality video semantic concept detection with tensor representation.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Adaptive and compact shape descriptor by progressive feature combination and selection with boosting.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Adaptive control in cartoon data reusing.
Comput. Animat. Virtual Worlds, 2007

A Piece-Wise Learning Approach to 3D Facial Animation.
Proceedings of the Advances in Web Based Learning, 2007

2006
Towards Robust 3D Reconstruction of Human Motion from Monocular Video.
Proceedings of the Advances in Artificial Reality and Tele-Existence, 2006

An Efficient Keyframe Extraction from Motion Capture Data.
Proceedings of the Advances in Computer Graphics, 2006

2005
Automatic generation of human animation based on motion programming.
Comput. Animat. Virtual Worlds, 2005
