Abstract:
Over the last two decades, participation in education has been increasing steadily all over the globe.
Predicting student performance has therefore become an emerging research area within educational data mining. Previous studies have noted that most available educational datasets have small sample sizes. Such datasets offer limited opportunities for generalization, which makes them difficult to analyze. Earlier approaches apply noise filtering, data balancing, or GAN-based oversampling, or rely mainly on classifier performance. This paper proposes an improved model that optimizes classifier performance, removes the adverse effects of noisy instances, and balances the data more effectively. The proposed model combines CTGAN (Conditional Tabular Generative Adversarial Network) and NCC (Nearest Centroid Classifier) with the data balancing algorithm SMOTE-IPF (SMOTE with Iterative-Partitioning Filter) to increase the dataset size while keeping it balanced and to minimize the negative effect of noisy data points. Finally, six classifiers, Random Forest (RF), Gradient Boosting (GB), CatBoost (CT), Extra Trees (ET), k-Nearest Neighbors (KNN), and AdaBoost (AB), are hyperparameter-tuned for prediction, and a stacked ensemble of the best-performing ones is built. A detailed analysis of the results shows that the proposed model outperforms previous approaches by 2-2.5% in terms of accuracy and ROC.
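To make the "balance, then remove noise" idea concrete, here is a minimal standard-library Python sketch. It is not the authors' implementation: plain SMOTE interpolation stands in for both CTGAN and SMOTE-IPF, and the nearest-centroid step assumes one plausible role for NCC, namely discarding any sample (real or synthetic) whose nearest class centroid disagrees with its label. The function names `smote_oversample`, `class_centroids`, and `nearest_centroid_filter` are hypothetical, and the stacked-ensemble stage is not shown.

```python
import math
import random
from collections import Counter

# Minimal sketch, NOT the paper's pipeline: plain SMOTE stands in for
# CTGAN + SMOTE-IPF, and the nearest-centroid filter is an assumed
# (hypothetical) way of using NCC to drop noisy samples.


def smote_oversample(X, y, k=3, seed=0):
    """Add SMOTE-style synthetic samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(x) for x in X]
    y_out = list(y)
    for label, count in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(members)
            # k nearest same-class neighbours of sample i (excluding i itself)
            neighbours = sorted(
                (j for j in members if j != i),
                key=lambda j: math.dist(X[i], X[j]),
            )[:k]
            if neighbours:
                j = rng.choice(neighbours)
                gap = rng.random()
                # interpolate on the segment between sample i and neighbour j
                new = [a + gap * (b - a) for a, b in zip(X[i], X[j])]
            else:
                new = list(X[i])  # single-sample class: duplicate it
            X_out.append(new)
            y_out.append(label)
    return X_out, y_out


def class_centroids(X, y):
    """Mean feature vector of each class."""
    counts = Counter(y)
    sums = {}
    for x, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_centroid_filter(X, y):
    """Keep only samples whose nearest class centroid agrees with their label."""
    centroids = class_centroids(X, y)
    kept_X, kept_y = [], []
    for x, lab in zip(X, y):
        predicted = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        if predicted == lab:
            kept_X.append(x)
            kept_y.append(lab)
    return kept_X, kept_y


# Toy data: class 1 is the minority and contains one noisy point near class 0.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
     (5.0, 5.0), (5.2, 4.9), (0.15, 0.2)]
y = [0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = smote_oversample(X, y)
X_clean, y_clean = nearest_centroid_filter(X_bal, y_bal)
print(Counter(y_bal), Counter(y_clean))
```

In this sketch the filter runs after oversampling, so it can also discard synthetic points that were interpolated toward a noisy sample; that ordering is a design choice of the example, not a detail taken from the paper.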