Abstract:
This paper introduces ClassifyWiki, a framework that automatically generates Wikipedia-based text classifiers from a small set of positive training articles. ClassifyWiki simplifies the task of collecting hundreds or thousands of Wikipedia pages belonging to the same entity class, starting from a positive set as small as 10 pages. Users can define the initial set at any level of granularity (e.g., people, sports people, or even footballers). ClassifyWiki builds on many previous efforts in Wikipedia entity classification to provide a generic framework that works for any entity class at any level of granularity with few examples. The framework offers not only a set of pre-built models for many entity classes but also a tool, tuned through hundreds of experiments, that generates models for any given set of articles. To evaluate the framework, we manually tagged a dataset of 2,500 Wikipedia pages covering 808 unique entity classes at different levels of granularity. ClassifyWiki was tested on 103 entity classes, with positive sets as small as 5 articles. On our blind set, ClassifyWiki achieved a macro-averaged F1-score of 83%, with 96% precision and 74% recall, when using 50 or more positive articles.