University of Bahrain
Scientific Journals

ClassifyWiki: An Experimental Study on Building Generic-Type Wikipedia Classifiers


dc.contributor.author Gerguis, Michel Naim
dc.contributor.author El-Kharashi, M. Watheq
dc.contributor.author Salama, Cherif
dc.date.accessioned 2022-02-05T14:39:33Z
dc.date.available 2022-02-05T14:39:33Z
dc.date.issued 2022-02-15
dc.identifier.issn 2210-142X
dc.identifier.uri https://journal.uob.edu.bh:443/handle/123456789/4568
dc.description.abstract This paper introduces ClassifyWiki, a framework that automatically generates Wikipedia-based text classifiers from a small set of positive training articles. ClassifyWiki aims to simplify the process of collecting hundreds or thousands of Wikipedia pages of the same entity class, starting from a set of positive articles possibly as small as 10 pages. The user can define the small initial set at any level of granularity (e.g., people, sports people, or even footballers). ClassifyWiki leverages many previous efforts in Wikipedia entity classification to build a generic framework that works for any entity at any level of granularity with few examples. The framework offers not only a set of pre-built models for many entity classes but also a tool, tuned through hundreds of experiments, to generate models for any given set of articles. To test the framework, we manually tagged a data set of 2500 Wikipedia pages. This data set covers 808 unique entity classes at different levels of granularity. ClassifyWiki was tested over 103 different entity classes varying in size down to only 5 positive articles. On our blind set, ClassifyWiki achieved a macro-averaged F1-score of 83% with 96% precision and 74% recall using 50 or more positive articles. en_US
dc.language.iso en en_US
dc.publisher University of Bahrain en_US
dc.subject ClassifyWiki, Entity Classification, Fine-Grained Entity Classification, Text Classification, Wikipedia Classification en_US
dc.title ClassifyWiki: An Experimental Study on Building Generic-Type Wikipedia Classifiers en_US
dc.identifier.doi https://dx.doi.org/10.12785/ijcds/110161
dc.volume 11 en_US
dc.issue 1 en_US
dc.pagestart 753 en_US
dc.pageend 762 en_US
dc.contributor.authoraffiliation Microsoft, Cairo, Egypt en_US
dc.contributor.authoraffiliation Department of Computer and Systems Engineering, Faculty of Engineering, Ain Shams University, Cairo, Egypt en_US
dc.contributor.authoraffiliation The American University in Cairo, Cairo, Egypt. Ain Shams University, Cairo, Egypt en_US
dc.source.title International Journal of Computing and Digital Systems en_US
dc.abbreviatedsourcetitle IJCDS en_US

