Mengmeng Wang

Orcid: 0000-0003-4035-0630

Affiliations:

Zhejiang University, Laboratory of Advanced Perception on Robotics and Intelligent Learning, Hangzhou, China
Zhejiang University, College of Control Science and Engineering, Institute of Cyber-Systems and Control, Hangzhou, China (PhD 2024)

According to our database¹, Mengmeng Wang authored at least 90 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Corrigendum to "MA-FSAR: Multimodal Adaptation of CLIP for few-shot action recognition" [Pattern Recognition 169 (2026) 111902].

Pattern Recognit., 2026

MA-FSAR: Multimodal Adaptation of CLIP for few-shot action recognition.

Pattern Recognit., 2026

2025

Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision.

CoRR, November, 2025

Deforming Videos to Masks: Flow Matching for Referring Video Segmentation.

CoRR, October, 2025

TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking.

CoRR, July, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.

Victor Shea-Jay Huang

CoRR, July, 2025

TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP.

CoRR, July, 2025

Adding Before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention.

IEEE Trans. Neural Networks Learn. Syst., March, 2025

Model-Heterogeneous Federated Graph Learning With Prototype Propagation.

IEEE Trans. Artif. Intell., March, 2025

ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition.

IEEE Trans. Neural Networks Learn. Syst., January, 2025

LLM-TPF: Multiscale Temporal Periodicity-Semantic Fusion LLMs for Time Series Forecasting.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

VidEvo: Evolving Video Editing through Exhaustive Temporal Modeling.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Density-aware and Depth-aware Visual Representation for Zero-Shot Object Counting.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Action Detail Matters: Refining Video Recognition with Local Action Queries.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SpotActor: Training-Free Layout-Controlled Consistent Image Generation.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Cross-device Federated Recommendation - Privacy-Preserving Personalization.

Springer, ISBN: 978-981-96-3211-4, 2025

2024

AGDF-Net: Learning Domain Generalizable Depth Features With Adaptive Guidance Fusion.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Learning spatiotemporal relationships with a unified framework for video object segmentation.

Appl. Intell., April, 2024

Visual-Based Kinematics and Pose Estimation for Skid-Steering Robots.

IEEE Trans. Autom. Sci. Eng., January, 2024

Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network.

IEEE Trans. Image Process., 2024

LiDAR video object segmentation with dynamic kernel refinement.

Pattern Recognit. Lett., 2024

Visual Object Tracking across Diverse Data Modalities: A Review.

CoRR, 2024

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation.

CoRR, 2024

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition.

CoRR, 2024

OneActor: Consistent Subject Generation via Cluster-Conditioned Guidance.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Robotic-centric Paradigm for 3D Human Tracking Under Complex Environments Using Multi-modal Adaptation.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2024

A Multimodal, Multi-Task Adapting Framework for Video Action Recognition.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking via Memory Networks.

Proceedings of the International Conference on 3D Vision, 2024

2023

Data-free quantization via mixed-precision compensation without fine-tuning.

Pattern Recognit., November, 2023

Hierarchical supervisions with two-stream network for Deepfake detection.

Pattern Recognit. Lett., August, 2023

Exploiting semantic-level affinities with a mask-guided network for temporal action proposal in videos.

Appl. Intell., June, 2023

Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Correlation-based and content-enhanced network for video style transfer.

Pattern Anal. Appl., February, 2023

Fast Real-Time Video Object Segmentation with a Tangled Memory Network.

ACM Trans. Intell. Syst. Technol., 2023

Improving dynamic gesture recognition in untrimmed videos by an online lightweight framework and a new gesture dataset ZJUGesture.

Neurocomputing, 2023

Camera-based 3D Semantic Scene Completion with Sparse Guidance Network.

CoRR, 2023

Multimodal Adaptation of CLIP for Few-Shot Action Recognition.

CoRR, 2023

Continuous-Time Fixed-Lag Smoothing for LiDAR-Inertial-Camera SLAM.

CoRR, 2023

Learning Discretized Neural Networks under Ricci Flow.

CoRR, 2023

BSNet: Lane Detection via Draw B-spline Curves Nearby.

Haoxin Chen

Mengmeng Wang

Yong Liu

CoRR, 2023

CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion.

IROS, 2023

PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation.

IROS, 2023

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Correlation Pyramid Network for 3D Single Object Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Revisiting the Spatial and Temporal Modeling for Few-Shot Action Recognition.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Extended Feature Pyramid Network for Small Object Detection.

IEEE Trans. Multim., 2022

Delving Deeper Into Mask Utilization in Video Object Segmentation.

IEEE Trans. Image Process., 2022

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection.

IEEE Trans. Circuits Syst. Video Technol., 2022

Multiple Object Tracking of Drone Videos by a Temporal-Association Network with Separated-Tasks Structure.

Remote. Sens., 2022

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context.

Proceedings of the Computer Vision - ECCV 2022, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Unpaired salient object translation via spatial attention prior.

Neurocomputing, 2021

Cross-modality online distillation for multi-view action recognition.

Neurocomputing, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

CoRR, 2021

MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation.

CoRR, 2021

Explicitly Modeling the Discriminability for Instance-Aware Visual Object Tracking.

Mengmeng Wang

Xiaoqian Yang

Yong Liu

CoRR, 2021

ActionCLIP: A New Paradigm for Video Action Recognition.

Mengmeng Wang

Jiazheng Xing

Yong Liu

CoRR, 2021

TransVOS: Video Object Segmentation with Transformers.

CoRR, 2021

Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RFNet: Recurrent Forward Network for Dense Point Cloud Completion.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

One-shot Face Reenactment Using Appearance Adaptive Normalization.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Depth Completion.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Monocular Depth Completion.

CoRR, 2020

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition.

CoRR, 2020

Semantic Graph Based Place Recognition for 3D Point Clouds.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

DTVNet: Dynamic Time-Lapse Video Generation via Single Still Image.

Proceedings of the Computer Vision - ECCV 2020, 2020

FReeNet: Multi-Identity Face Reenactment.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

FDN: Feature Decoupling Network for Head Pose Estimation.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

STM: SpatioTemporal and Motion Encoding for Action Recognition.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2017

Real-time 3D human tracking for mobile robots with multisensors.

Proceedings of the 2017 IEEE International Conference on Robotics and Automation, 2017

Large Margin Object Tracking with Circulant Feature Maps.

Mengmeng Wang

Yong Liu

Zeyi Huang

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Robust object tracking with a hierarchical ensemble framework.

Mengmeng Wang

Yong Liu

Rong Xiong

Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016

2015

Robust Object Tracking with a Hierarchical Ensemble Framework.

Mengmeng Wang

Yong Liu

CoRR, 2015
