Pose-Motion Video Anomaly Detection via Memory-Augmented Reconstruction and Conditional Variational Prediction

Published in ICME, 2023

Video Anomaly Detection (VAD) is critical for intelligent surveillance, yet traditional pixel-based methods suffer from high computational costs and privacy concerns. Skeleton-based approaches offer a lightweight alternative, but they often struggle to distinguish normal from abnormal behaviors because spatial and temporal features are modeled in isolation. This paper addresses these limitations by proposing PoMo, a novel framework that integrates pose (spatial) and motion (temporal) analysis.

The proposed method uses a dual-branch architecture to capture the complexity of human behavior. The first branch employs a Memory-augmented AutoEncoder (MemAE) to reconstruct human poses; its memory module strengthens the representation of normal patterns, so anomalous poses are reconstructed poorly. The second branch uses a Conditional Variational AutoEncoder (CVAE) to predict future motion from historical sequences, modeling the probabilistic nature of dynamic movements. By jointly evaluating reconstruction and prediction errors, PoMo achieves robust performance on benchmarks such as ShanghaiTech and UBnormal.
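The memory read and the joint scoring described above can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the thresholded shrinkage, the latent-space error, and the weighted-sum fusion with `alpha` are all assumptions made for illustration; the abstract does not state how the two errors are combined.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def memory_read(z, memory, shrink=0.0):
    """Rebuild latent z as a convex combination of memory slots.

    Assumed scheme: softmax over cosine similarities, then a simple
    threshold that zeroes weights <= shrink and renormalises.
    """
    sims = [cosine(z, m) for m in memory]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    kept = [w if w > shrink else 0.0 for w in weights]
    norm = sum(kept)
    if norm == 0.0:  # threshold removed everything: fall back to softmax
        kept, norm = weights, 1.0
    weights = [k / norm for k in kept]
    return [sum(w * slot[d] for w, slot in zip(weights, memory))
            for d in range(len(z))]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def anomaly_score(recon_err, pred_err, alpha=0.5):
    """Hypothetical fusion: weighted sum of the two branch errors."""
    return alpha * recon_err + (1.0 - alpha) * pred_err

# Toy memory holding two "normal" prototypes (illustrative values only).
memory = [[1.0, 0.0], [0.0, 1.0]]
normal_z = [0.9, 0.1]
abnormal_z = [-1.0, -1.0]

normal_err = mse(normal_z, memory_read(normal_z, memory, shrink=0.4))
abnormal_err = mse(abnormal_z, memory_read(abnormal_z, memory, shrink=0.4))
print(normal_err < abnormal_err)
```

Because the read is restricted to combinations of stored normal prototypes, a latent far from every prototype cannot be rebuilt closely, which is the intuition behind using the reconstruction error as an anomaly cue. In a full model the prediction error from the CVAE branch would be passed to `anomaly_score` alongside it.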