Abstract:
Globally, breast cancer is the leading cause of cancer death among women. The disease is most common in high-income
countries, but its incidence has recently been rising rapidly in middle- and low-income countries in Asia, Latin America and Africa. This rise is
attributed to increased life expectancy, growing urbanization and the adoption of western culture. Although strategies to reduce the risk of
breast cancer are being implemented in high-income countries, in middle- and low-income countries the majority of cases are only diagnosed at
a late stage of the disease. Early detection of breast cancer is therefore needed to overcome this problem. In this paper, a holistic diagnosis
tool for the early detection of breast cancer is proposed. The tool is software-based and uses a novel prediction model for breast cancer
survivability developed with available data mining (DM) techniques. Specifically, five popular data mining algorithms (logistic regression,
decision tree, support vector machine, K nearest neighbors and random forest) were used to develop the prediction tool on the Wisconsin breast
cancer data set. Results are reported for both the training and the test sets.
On the training set, the classification accuracies achieved were 100% (Decision Tree), 99.8046% (Random Forest), 97.46% (Logistic
Regression and Support Vector Machine) and 97.07% (K Nearest Neighbors); on the test set they were 93.5672% (Decision Tree), 92.9%
(Random Forest) and 92.39% (Logistic Regression, Support Vector Machine and K Nearest Neighbors). These results are much better than
those reported in the literature. They show that the proposed DM disease prediction tool has the potential to greatly improve current patient
management, care and future interventions against breast cancer and, through customization, against other deadly diseases.
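As a rough illustration of the workflow described in this abstract (five classifiers trained on a Wisconsin breast cancer data set and scored on held-out data), the following is a minimal sketch assuming Python with scikit-learn. It uses `load_breast_cancer`, which ships the Wisconsin Diagnostic Breast Cancer (WDBC) variant (569 samples, 30 features); this may not be the same version of the data used in this work. The 75/25 stratified split, the feature scaling and all hyperparameters are assumptions, not the authors' settings, and the sketch is not expected to reproduce the accuracies reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return the five classifier families named in the abstract.

    Hyperparameters are library defaults or simple choices (an assumption,
    not the paper's configuration).
    """
    return {
        # Scale-sensitive models are wrapped with a StandardScaler.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
        "K Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def evaluate(seed=0, test_size=0.25):
    """Fit each model on a stratified training split; return (train, test) accuracy."""
    X, y = load_breast_cancer(return_X_y=True)  # WDBC variant (assumption)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        # For classifiers, .score() returns mean accuracy on the given data.
        results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    return results


if __name__ == "__main__":
    for name, (train_acc, test_acc) in evaluate().items():
        print(f"{name:24s} train={train_acc:.4%} test={test_acc:.4%}")
```

Reporting training and test accuracy side by side, as the abstract does, makes overfitting visible: an unpruned decision tree typically reaches 100% on the data it was trained on, so the test-set figures are the more meaningful comparison.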