Deep Learning-based Analysis of Algerian Dialect Dataset Targeted Hate Speech, Offensive Language and Cyberbullying

Mazari, Ahmed Cherif; Kheddar, Hamza

doi:http://dx.doi.org/10.12785/ijcds/130177

International Journal of Computing and Digital Systems, Volume 13, Issue 01

dc.contributor.author	Mazari, Ahmed Cherif
dc.contributor.author	Kheddar, Hamza
dc.date.accessioned	2023-03-02T10:46:27Z
dc.date.available	2023-03-02T10:46:27Z
dc.date.issued	2023-03-02
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/4783
dc.description.abstract	Toxicity and hate speech on social media platforms can lead to cybercrime and harm social life at both the personal and community level. Automatic detection of toxic and hateful content is therefore necessary to improve the quality of web content and to curb the spread of inappropriate speech through social media. The task is especially challenging when comments are written in complex languages such as Arabic, which is known for its linguistic difficulty and lack of resources. This paper introduces a new dataset for toxic text detection in the Algerian dialect: an annotated multi-label dataset of 14150 comments extracted from Facebook, YouTube and Twitter and labelled as hate speech, offensive language and cyberbullying. To assess the practical utility of the annotated dataset, several tests were conducted with traditional machine learning (ML) classifiers, namely Random Forest, Naïve Bayes, Linear Support Vector Classification (SVC), Stochastic Gradient Descent (SGD) and Logistic Regression. Further assessments were conducted with Deep Learning (DL) models such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional LSTM (Bi-LSTM) and Bidirectional GRU (Bi-GRU). In these experiments the Bi-GRU model achieved the best DL classification results, with 73.6% accuracy and a 75.8% F1-score.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.subject	Machine Learning, Deep Learning, Algerian Dialect, Cyberbullying Detection, Offensive Language Detection, Hate Speech Detection	en_US
dc.title	Deep Learning-based Analysis of Algerian Dialect Dataset Targeted Hate Speech, Offensive Language and Cyberbullying	en_US
dc.type	Article	en_US
dc.identifier.doi	http://dx.doi.org/10.12785/ijcds/130177	en
dc.contributor.authoraffiliation	Mathematics and Computer Science Department, LSEA Laboratory, University of Médéa, Algeria	en_US
dc.contributor.authoraffiliation	Electrical Engineering Department, LSEA Laboratory, University of Médéa, Algeria	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
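
The abstract reports accuracy and F1-score on a multi-label task (hate speech, offensive language, cyberbullying). As a rough illustration only, the sketch below shows one common way such scores can be computed from binary label vectors. The abstract does not state how the scores were averaged, so the exact-match (subset) accuracy and the micro-averaged F1 used here are assumptions, and the label names, function names and toy data are hypothetical.

```python
from typing import Sequence

# Hypothetical label order; the record only names the three categories.
LABELS = ("hate_speech", "offensive_language", "cyberbullying")

Row = Sequence[int]  # one 0/1 flag per label, in LABELS order


def subset_accuracy(y_true: Sequence[Row], y_pred: Sequence[Row]) -> float:
    """Fraction of comments whose whole label vector is predicted exactly.

    Assumption: exact-match accuracy; the paper may use another definition.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    matches = sum(1 for t, p in zip(y_true, y_pred) if tuple(t) == tuple(p))
    return matches / len(y_true)


def micro_f1(y_true: Sequence[Row], y_pred: Sequence[Row]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all comments and labels, then apply F1 = 2TP / (2TP + FP + FN).

    Assumption: micro averaging; the abstract does not specify the scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the dataset.
    y_true = [(1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 0)]
    y_pred = [(1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 0, 1)]
    print(f"subset accuracy: {subset_accuracy(y_true, y_pred):.3f}")
    print(f"micro F1:        {micro_f1(y_true, y_pred):.3f}")
```

Under these definitions F1 can exceed accuracy, because exact-match accuracy gives no credit for a comment with only some labels correct, while micro F1 counts each correct label.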