A Comprehensive Dataset and Deep Learning Approach for Misinformation Detection on Social Media in Bangladesh

Rashid, Mohammad Rifat Ahmmad; Roy, Rahul; Rahman, Din M Sumon; Saleh, Musa Akram; Khan, Abdul Ali Hayder; Abu Rayhan, Md.; Ahmed, Khandaker Foysal; Monsoor, Nafees; Hasan, Mahamudul

doi:xxxxxx

UOB Journals: International Journal of Computing and Digital Systems (preprint)

Date: 2024-08-24

Abstract:

To address the growing problem of misinformation on social media, particularly in the context of the Covid-19 pandemic, we developed a dataset of Bangla misinformation. The dataset was scraped from FactWatch, a leading fact-checking organization in Bangladesh, and is annotated with fact ratings. It contains 1014 fact-checked reports published between October 4, 2021, and May 25, 2023. The reports include summaries, categories, and correctness labels, along with samples of the original fake news content and investigative descriptions of the fact-checking processes employed. The dataset contributes to Bangladesh's participation in the global effort to combat fake news and serves as a resource for research in misinformation studies, natural language processing, and automated fact-checking, particularly for content in the Bangla (Bengali) language.

Because misinformation in Bangla remains under-researched, we also used the dataset for deep learning analysis with two models: a Long Short-Term Memory (LSTM) network and Bidirectional Encoder Representations from Transformers (BERT) with a Bangla base model. The Transformer-based BERT model achieved an accuracy of 98.77%, while the LSTM model, which processes text as sequential data, achieved an accuracy of 88.92%. The Bangla BERT base model also performed strongly in precision, recall, and F1-score, marking a substantial advance in misinformation detection for the Bangla language.
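For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are commonly computed for a two-class classifier. It is an illustration only, not the authors' evaluation code: the abstract does not state the label scheme, so the binary encoding (1 = misinformation, 0 = not), the function name `binary_metrics`, and the toy labels are all assumptions.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Hypothetical helper for illustration; the label encoding is assumed.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, invented for this example (not taken from the dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class dominates a dataset, which is why precision, recall, and F1-score are usually reported alongside it.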