Abstract:
Network traffic identification and classification are required today not only for traffic management but also for designing future protocols for user-specific services and improving user experience. Researchers recognized this fundamental step of network management long ago and began developing techniques for it. Traditional techniques for traffic identification and classification are port-based and payload-based. In today's large and complex networks, applications that use dynamic ports, encryption, and masquerading pose many challenges to researchers designing traffic classification approaches. The complexity is further increased by growing dependence on the Internet and by the diversity of applications, while network administrators, including ISPs, need accurate classification to manage their networks intelligently and efficiently. Because traditional techniques cannot effectively address these challenges, hybrid solutions are explored. Hybrid approaches combine statistical or behavioral, heuristic, and machine learning (ML) techniques with feature selection. In this paper, in addition to developing enhanced hybrid approaches for identifying peer-to-peer (P2P) traffic, we construct an extensive real-world dataset of 924 GB to analyze the effectiveness of the proposed approaches. A number of hybrid approaches are designed by combining feature selection techniques with ML algorithms. Extensive analysis of the proposed hybrid approaches, together with a comparative study, reveals that the combination of Chi-Square feature selection and a Random Forest classifier outperforms other state-of-the-art approaches, achieving an accuracy of 99.46%.
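To make the best-performing combination more concrete, here is a minimal Python sketch of Chi-Square feature ranking, the selection stage that would precede a Random Forest classifier. It assumes non-negative per-flow features and uses the common observed-versus-expected formulation (the same one used by scikit-learn's `chi2`). The helper names `chi2_scores` and `select_k_best`, the feature names, and the toy data are hypothetical illustrations, not the paper's actual feature set or implementation.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score per feature for non-negative features X and labels y.

    observed[c][j] = sum of feature j over samples of class c
    expected[c][j] = (fraction of samples in class c) * (total of feature j)
    score[j]       = sum over classes of (observed - expected)^2 / expected
    """
    n_samples = len(X)
    n_features = len(X[0])
    observed = defaultdict(lambda: [0.0] * n_features)
    class_counts = defaultdict(int)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            if value < 0:
                raise ValueError("chi-square scoring requires non-negative features")
            observed[label][j] += value
    feature_totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for label, count in class_counts.items():
            expected = count / n_samples * feature_totals[j]
            if expected > 0:  # skip empty cells to avoid division by zero
                score += (observed[label][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Keep the k highest-scoring features (column indices kept in original order)."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical per-flow features: [distinct peers contacted, protocol flag, packet-size bucket]
X = [
    [9, 1, 5],
    [8, 1, 4],
    [1, 1, 5],
    [0, 1, 4],
]
y = ["p2p", "p2p", "other", "other"]

print([round(s, 2) for s in chi2_scores(X, y)])  # [14.22, 0.0, 0.0]
keep, X_reduced = select_k_best(X, y, k=1)
print(keep, X_reduced)  # [0] [[9], [8], [1], [0]]
```

In a full pipeline, the reduced feature matrix would then be used to train a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`; scikit-learn also provides `SelectKBest` with the `chi2` score function for the selection step itself.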