Abstract:
Facial expression recognition can improve human-computer interaction in areas such as
education, entertainment, and health. In this study, convolutional neural networks (CNNs) were designed and implemented to
recognize facial expressions. The models were trained on the FER2013 dataset, which has seven emotion classes: anger,
disgust, fear, happiness, sadness, surprise and neutral. The purpose of this study is to compare the computational load of 23 different
CNN models for the facial expression recognition task on a mobile device. We compare ResNet101V2, MobileNet, and
EfficientNetV2B3, the three candidates that achieved the highest overall accuracy on the testing set among the 23 models we tried.
Of these three, EfficientNetV2B3 achieves the highest overall accuracy at 61.9%, while MobileNet
has the lowest at 58.8%. We then compare computational load based on average inference time, peak CPU
usage, and peak memory usage on a mobile device. The results show that MobileNet has the lowest computational load but also the
lowest overall accuracy of the three. EfficientNetV2B3, on the other hand, has the highest overall accuracy with a lower computational load than
ResNet101V2. Therefore, we recommend EfficientNetV2B3 for real-time facial expression recognition using CNNs on mobile devices.
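
To make the load metrics concrete, the following is a minimal sketch of how average inference time and peak memory usage could be measured for a single model. It is not the measurement procedure used in this study, which the abstract does not describe. The names `benchmark` and `dummy_predict` are hypothetical, `dummy_predict` is a stand-in for a real CNN forward pass, and `tracemalloc` only tracks allocations made by the Python interpreter, so on-device figures would need a platform profiler instead.

```python
import time
import tracemalloc
from statistics import mean


def benchmark(predict, sample, warmup=3, runs=20):
    """Return (average seconds per call, peak traced bytes) for predict(sample)."""
    # Warm-up calls are excluded so one-time setup cost does not skew the average.
    for _ in range(warmup):
        predict(sample)

    # Time each call separately, with memory tracing off so it does not slow the runs.
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        durations.append(time.perf_counter() - start)

    # Measure peak Python-level allocation in a separate, untimed pass.
    tracemalloc.start()
    predict(sample)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return mean(durations), peak_bytes


def dummy_predict(pixels):
    # Hypothetical stand-in for a CNN forward pass: returns seven class scores.
    total = sum(pixels)
    return [total * (i + 1) for i in range(7)]


if __name__ == "__main__":
    image = [0.5] * (48 * 48)  # FER2013 images are 48x48 grayscale
    avg_s, peak = benchmark(dummy_predict, image)
    print(f"average inference time: {avg_s * 1e3:.3f} ms, peak memory: {peak} bytes")
```

Timing and memory tracing are kept in separate passes because tracing adds overhead that would inflate the measured inference time.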