Abstract:
To address the growing problem of misinformation on social media, particularly in the context of the COVID-19 pandemic, we developed a dataset of Bangla misinformation. The dataset was scraped from FactWatch, a leading fact-checking organization in Bangladesh, and annotated with its fact ratings. It contains 1,014 fact-checked reports published between October 4, 2021, and May 25, 2023. Each report includes a summary, a category, and a correctness label, together with a sample of the original fake news content and a description of how it was fact-checked. The dataset contributes to Bangladesh's participation in the global effort to combat fake news and provides a resource for research in misinformation studies, natural language processing, and automated fact-checking of Bangla-language content. Because misinformation in Bangla remains under-researched, we also used the dataset to train two deep learning classifiers: a Long Short-Term Memory (LSTM) network and a Bidirectional Encoder Representations from Transformers (BERT) model built on a Bangla BERT base model. The BERT model, whose Transformer architecture captures linguistic context, achieved an accuracy of 98.77%, while the LSTM model, which is suited to sequential data, achieved 88.92%. The Bangla BERT base model also achieved strong precision, recall, and F1-score, a substantial advance in misinformation detection for the Bangla language.