Mengmeng Wang

Orcid: 0000-0003-4035-0630

Affiliations:
  • Zhejiang University, Laboratory of Advanced Perception on Robotics and Intelligent Learning, Hangzhou, China
  • Zhejiang University, College of Control Science and Engineering, Institute of Cyber-Systems and Control, Hangzhou, China (PhD 2024)


According to our database, Mengmeng Wang authored at least 85 papers between 2015 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Corrigendum to "MA-FSAR: Multimodal Adaptation of CLIP for few-shot action recognition" [Pattern Recognition 169 (2026) 111902].
Pattern Recognit., 2026

MA-FSAR: Multimodal Adaptation of CLIP for few-shot action recognition.
Pattern Recognit., 2026

2025
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking.
CoRR, July, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.
CoRR, July, 2025

TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP.
CoRR, July, 2025

Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization.
CoRR, May, 2025

Adding Before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention.
IEEE Trans. Neural Networks Learn. Syst., March, 2025

Model-Heterogeneous Federated Graph Learning With Prototype Propagation.
IEEE Trans. Artif. Intell., March, 2025

ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition.
IEEE Trans. Neural Networks Learn. Syst., January, 2025

Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Density-aware and Depth-aware Visual Representation for Zero-Shot Object Counting.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Action Detail Matters: Refining Video Recognition with Local Action Queries.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SpotActor: Training-Free Layout-Controlled Consistent Image Generation.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Cross-device Federated Recommendation - Privacy-Preserving Personalization
Springer, ISBN: 978-981-96-3211-4, 2025

2024
AGDF-Net: Learning Domain Generalizable Depth Features With Adaptive Guidance Fusion.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Learning spatiotemporal relationships with a unified framework for video object segmentation.
Appl. Intell., April, 2024

Visual-Based Kinematics and Pose Estimation for Skid-Steering Robots.
IEEE Trans. Autom. Sci. Eng., January, 2024

Camera-Based 3D Semantic Scene Completion With Sparse Guidance Network.
IEEE Trans. Image Process., 2024

LiDAR video object segmentation with dynamic kernel refinement.
Pattern Recognit. Lett., 2024

Visual Object Tracking across Diverse Data Modalities: A Review.
CoRR, 2024

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation.
CoRR, 2024

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition.
CoRR, 2024

OneActor: Consistent Subject Generation via Cluster-Conditioned Guidance.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Robotic-centric Paradigm for 3D Human Tracking Under Complex Environments Using Multi-modal Adaptation.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2024

A Multimodal, Multi-Task Adapting Framework for Video Action Recognition.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking via Memory Networks.
Proceedings of the International Conference on 3D Vision, 2024

2023
Data-free quantization via mixed-precision compensation without fine-tuning.
Pattern Recognit., November, 2023

Hierarchical supervisions with two-stream network for Deepfake detection.
Pattern Recognit. Lett., August, 2023

Exploiting semantic-level affinities with a mask-guided network for temporal action proposal in videos.
Appl. Intell., June, 2023

Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Correlation-based and content-enhanced network for video style transfer.
Pattern Anal. Appl., February, 2023

Fast Real-Time Video Object Segmentation with a Tangled Memory Network.
ACM Trans. Intell. Syst. Technol., 2023

Improving dynamic gesture recognition in untrimmed videos by an online lightweight framework and a new gesture dataset ZJUGesture.
Neurocomputing, 2023

Camera-based 3D Semantic Scene Completion with Sparse Guidance Network.
CoRR, 2023

Multimodal Adaptation of CLIP for Few-Shot Action Recognition.
CoRR, 2023

Continuous-Time Fixed-Lag Smoothing for LiDAR-Inertial-Camera SLAM.
CoRR, 2023

Learning Discretized Neural Networks under Ricci Flow.
CoRR, 2023

BSNet: Lane Detection via Draw B-spline Curves Nearby.
CoRR, 2023

CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion.
IROS, 2023

PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation.
IROS, 2023

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Correlation Pyramid Network for 3D Single Object Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Revisiting the Spatial and Temporal Modeling for Few-Shot Action Recognition.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Extended Feature Pyramid Network for Small Object Detection.
IEEE Trans. Multim., 2022

Delving Deeper Into Mask Utilization in Video Object Segmentation.
IEEE Trans. Image Process., 2022

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection.
IEEE Trans. Circuits Syst. Video Technol., 2022

Multiple Object Tracking of Drone Videos by a Temporal-Association Network with Separated-Tasks Structure.
Remote. Sens., 2022

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context.
Proceedings of the Computer Vision - ECCV 2022, 2022

Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Unpaired salient object translation via spatial attention prior.
Neurocomputing, 2021

Cross-modality online distillation for multi-view action recognition.
Neurocomputing, 2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.
CoRR, 2021

MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation.
CoRR, 2021

Explicitly Modeling the Discriminability for Instance-Aware Visual Object Tracking.
CoRR, 2021

ActionCLIP: A New Paradigm for Video Action Recognition.
CoRR, 2021

TransVOS: Video Object Segmentation with Transformers.
CoRR, 2021

Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RFNet: Recurrent Forward Network for Dense Point Cloud Completion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

One-shot Face Reenactment Using Appearance Adaptive Normalization.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Depth Completion.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Monocular Depth Completion.
CoRR, 2020

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition.
CoRR, 2020

Semantic Graph Based Place Recognition for 3D Point Clouds.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

DTVNet: Dynamic Time-Lapse Video Generation via Single Still Image.
Proceedings of the Computer Vision - ECCV 2020, 2020

FReeNet: Multi-Identity Face Reenactment.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

FDN: Feature Decoupling Network for Head Pose Estimation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
STM: SpatioTemporal and Motion Encoding for Action Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2017
Real-time 3D human tracking for mobile robots with multisensors.
Proceedings of the 2017 IEEE International Conference on Robotics and Automation, 2017

Large Margin Object Tracking with Circulant Feature Maps.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Robust object tracking with a hierarchical ensemble framework.
Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016

2015
Robust Object Tracking with a Hierarchical Ensemble Framework.
CoRR, 2015

