Abstract:
This work aims to engineer a robust system capable of real-time detection, accurately discerning whether individuals are
adhering to or neglecting face mask mandates across a diverse range of scenarios encompassing images, videos, and live camera
streams. This study improves the YOLOv8n architecture for face mask detection by building two modified versions of the
YOLOv8n model that enhance its feature extraction and prediction networks. In the proposed YOLOv8n-v1, we integrate
a residual network (ResNet) backbone into the YOLOv8n architecture by replacing its first two layers with ResNet_Stem and
ResNet_Block modules to improve the model's feature extraction ability, and we replace the Spatial Pyramid Pooling Fast (SPPF) module
with a Spatial Pyramid Pooling-Cross Stage Partial (SPPCSP) module, which combines SPP and CSP to create a network that is both
effective and efficient. The proposed YOLOv8n-v2 is built by integrating GhostConv and ResNet_Downsampling modules into the
backbone of the proposed YOLOv8n-v1.
All models were tested and evaluated on two datasets. The first is the MJFR dataset, which
contains 23,621 images collected by the authors of this paper from four distinct datasets, all of which were created for face mask
detection. The second is the MSFM object detection dataset, collected from groups of real-life videos and images
based on the curriculum learning technique. Each model's performance is assessed using the following metrics: mean average
precision (mAP50), mAP50-95, recall (R), and precision (P). We conclude that both proposed versions of YOLOv8n outperform the
original model in terms of accuracy on both datasets. Finally, the system was successfully deployed in one of the medical clinics
affiliated with a medical complex, where its application demonstrated high efficiency across various aspects of the clinic's work
and effectively contributed to improving public health and safety.