Abstract:
The demand for IT infrastructures has grown due to their importance in business and everyday life. Downtime due to the
unavailability of any IT infrastructure component is undesirable. Ensuring the continuous availability and stability of IT infrastructure is
crucial for organizations to prevent downtime and its associated consequences. Thus, prompt failure detection, analysis of underlying
causes, and corrective measures are vital. IT infrastructure logs record the details of every executed operation and provide rich,
multidimensional information about it. Therefore, research on IT infrastructure failure detection and prediction using log analysis
techniques is gaining prominence. The proposed method combines a semantic analysis framework based on a pre-trained BERT model with an
attention-based OLSTM classification model. Furthermore, the remediation model sends failure notifications to the system
administrator via the dashboard and the registered email address, along with potential solutions to address the issue and mitigate the failure of
IT infrastructure components. The effectiveness of the developed prediction and remediation system was evaluated on a real-time
Windows infrastructure through a proof-of-concept implementation. In this process, the trained model was used to analyse newly generated
log entries and forecast potential failure situations. A remediation strategy was then applied to address the problem
and prevent downtime effectively.
The integration of automatic failure detection and prediction using IT infrastructure logs has the potential to become a routine practice
in IT infrastructure monitoring. The suggested remediation approach shows promise for wide adoption in timely failure
mitigation, resulting in reduced downtime.