A Comprehensive Comparative Study of Machine Learning Algorithms for Water Potability Classification

Ahmad Musleh, Fuad

doi:10.12785/ijcds/150184

International Journal of Computing and Digital Systems, Volume 15, Issue 01 (UOB Journals)

ISSN: 2210-142X

Date: 2024-03-01

Abstract:

Water quality (WQ) prediction is of utmost importance due to the scarcity of uncontaminated water resources. In this study, six machine learning (ML) algorithms, namely the Bagging classifier, Logistic Regression (LR), J48, Random Forest (RF), IBk, and AdaBoostM1, were employed to assess water potability. The models were compared using accuracy, recall, precision, F-measure, false positive (FP) rate, receiver operating characteristic (ROC) area, and precision-recall curve (PRC) area. RF and J48 achieved the highest accuracy, 0.993, followed closely by the Bagging classifier at 0.992. AdaBoostM1 achieved an accuracy of 0.971 and LR 0.958, while IBk was markedly lower at 0.714. For the FP rate, RF achieved the lowest value, 0.006, followed closely by the Bagging classifier and J48, both at 0.007. AdaBoostM1, LR, and IBk had higher FP rates of 0.026, 0.041, and 0.289, respectively. Regarding precision, RF and J48 again scored highest at 0.993, followed by the Bagging classifier at 0.992, AdaBoostM1 at 0.972, and LR at 0.958; IBk had the lowest precision, 0.714. Recall followed the same pattern: RF and J48 achieved 0.993, the Bagging classifier 0.992, AdaBoostM1 0.971, LR 0.958, and IBk 0.714. The study highlights the effectiveness of RF, J48, and the Bagging classifier in predicting water potability. These findings can inform the implementation of accurate prediction models and support the sustainable management of water resources.
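For readers less familiar with the metrics compared in the abstract, the following minimal sketch shows how accuracy, precision, recall, F-measure, and FP rate are conventionally derived from the confusion-matrix counts of a binary classifier. The class and function names (BinaryCounts, classification_metrics) and the counts in the usage line are hypothetical and chosen only for illustration; they are not the study's data. The abstract also does not state how the reported values were averaged across the two classes, so this sketch covers only the standard single-class definitions.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for one class treated as 'positive' (hypothetical helper)."""
    tp: int  # positive samples predicted positive
    fp: int  # negative samples predicted positive
    tn: int  # negative samples predicted negative
    fn: int  # positive samples predicted negative


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: BinaryCounts) -> dict:
    """Compute the standard single-class metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)  # also called the true positive rate
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f_measure": _safe_div(2 * precision * recall, precision + recall),
        "fp_rate": _safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT taken from the study.
    example = BinaryCounts(tp=90, fp=10, tn=80, fn=20)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

A low FP rate together with high recall, as reported for RF, J48, and the Bagging classifier, means that few non-potable samples are labelled potable while most potable samples are still recognised. ROC area and PRC area are not covered here because they require ranked prediction scores rather than a single set of counts.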