Abstract:
In this work, we apply a two-stage, anomaly-based network intrusion detection process to the UNSW-NB15 dataset. We first use Recursive Feature Elimination and Random Forests, among other techniques, to select the dataset features best suited to machine learning. We then perform binary classification to distinguish intrusive traffic from normal traffic, using a number of data mining techniques, including Logistic Regression, Gradient Boosting Machine, and Support Vector Machine. In this first-stage classification, Support Vector Machine achieves the highest accuracy (82.11%). We then feed the output of the Support Vector Machine to a range of multinomial classifiers in order to improve the accuracy of predicting the type of attack. Specifically, we evaluate the performance of Decision Trees (C5.0), Naïve Bayes, and multinomial Support Vector Machine. C5.0 yielded the highest accuracy (74%) and F1 score (86%), and the two-stage hybrid classification improved accuracy by up to 12% (achieving a multi-class classification accuracy of 86.04%). Finally, supported by our results, we present constructive criticism of the UNSW-NB15 dataset.
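
The two-stage structure summarized above can be illustrated with a minimal sketch. It does not reproduce the classifiers evaluated in this work: a nearest-centroid rule stands in for both the first-stage Support Vector Machine and the second-stage C5.0 tree, the data are invented toy records rather than UNSW-NB15, and all function and variable names are hypothetical. It also assumes that the first stage acts as a filter, so that only records flagged as attacks reach the second stage; other ways of passing the first-stage output forward are possible.

```python
from math import dist
from statistics import fmean


def fit_centroids(rows, labels):
    """Return {label: centroid} for a nearest-centroid classifier (a stand-in model)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: tuple(fmean(column) for column in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, row):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Invented toy records (two features each) with made-up traffic categories.
train_rows = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.1), (1.0, 0.2), (0.1, 0.9), (0.2, 1.0)]
train_cats = ["normal", "normal", "dos", "dos", "exploit", "exploit"]

# Stage 1: binary model separating normal traffic from any kind of attack.
binary_labels = ["normal" if cat == "normal" else "attack" for cat in train_cats]
stage1 = fit_centroids(train_rows, binary_labels)

# Stage 2: multi-class model trained on attack records only.
attack_rows = [row for row, cat in zip(train_rows, train_cats) if cat != "normal"]
attack_cats = [cat for cat in train_cats if cat != "normal"]
stage2 = fit_centroids(attack_rows, attack_cats)


def two_stage(row):
    """Label a record as normal, or pass it to stage 2 to name the attack type."""
    if predict(stage1, row) == "normal":
        return "normal"
    return predict(stage2, row)


for record in [(0.12, 0.18), (0.95, 0.15), (0.15, 0.95)]:
    print(record, two_stage(record))
```

In this arrangement the second-stage model never has to separate normal traffic from attacks, which is one plausible reason a hybrid scheme could improve attack-type accuracy over a single multi-class classifier.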