Abstract:
The demand for IT infrastructure has grown due to its importance in business and everyday life. Downtime caused by the
unavailability of any IT infrastructure component is undesirable. Ensuring the continuous availability and stability of IT infrastructure is
crucial for organisations to prevent downtime and its associated consequences. Thus, prompt failure detection, analysis of underlying
causes, and corrective measures are vital. IT infrastructure logs record the details of executed operations and provide
multi-dimensional information about them. Therefore, research on IT infrastructure failure detection and prediction using log analysis
techniques is gaining prominence. The proposed method combines a semantic analysis framework based on a pre-trained BERT model with an
attention-based OLSTM classification model. Furthermore, a remediation model sends failure notifications to the system
administrator through the dashboard and the registered email address, along with potential solutions to address the issue and mitigate the failure of IT
infrastructure components. The effectiveness of the developed prediction and remediation system was evaluated on a real-time Windows
infrastructure by implementing a proof of concept. In this process, the trained model was used to analyse newly generated log entries
and forecast potential failures. A remediation strategy was then applied to address the problem and prevent
downtime. The integration of automatic failure detection and prediction using IT infrastructure logs has the potential to
become a routine practice in IT infrastructure monitoring. The suggested remediation approach shows promise for wide adoption
in timely failure mitigation, resulting in reduced downtime.
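
The flow described above (classify a new log entry, then notify the administrator with candidate fixes) can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-based classify function is only a stand-in for the trained BERT and OLSTM model, and the failure categories, remedy texts, and all names (REMEDIES, Notification, classify, remediate) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical failure categories and suggested fixes (assumed, not from the paper).
REMEDIES = {
    "disk_failure": ["Check disk health", "Free up or extend storage"],
    "service_crash": ["Restart the affected service", "Review recent event log entries"],
}


@dataclass
class Notification:
    log_entry: str
    failure_type: str
    suggestions: list


def classify(log_entry):
    """Stand-in for the trained model: naive keyword matching, NOT BERT/OLSTM."""
    text = log_entry.lower()
    if "disk" in text:
        return "disk_failure"
    if "terminated unexpectedly" in text:
        return "service_crash"
    return None  # no failure predicted


def remediate(log_entry):
    """Return a notification with candidate fixes, or None if no failure is predicted."""
    failure = classify(log_entry)
    if failure is None:
        return None
    return Notification(log_entry, failure, REMEDIES[failure])


if __name__ == "__main__":
    note = remediate("The Print Spooler service terminated unexpectedly.")
    if note is not None:
        print(note.failure_type, note.suggestions)
```

In a real deployment the returned notification would be pushed to the dashboard and emailed to the administrator; those delivery steps are omitted here because the abstract does not specify them.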