dc.contributor.author |
Gerguis, Michel Naim |
|
dc.contributor.author |
El-Kharashi, M. Watheq |
|
dc.contributor.author |
Salama, Cherif |
|
dc.date.accessioned |
2022-02-05T14:39:33Z |
|
dc.date.available |
2022-02-05T14:39:33Z |
|
dc.date.issued |
2022-02-15 |
|
dc.identifier.issn |
2210-142X |
|
dc.identifier.uri |
https://journal.uob.edu.bh:443/handle/123456789/4568 |
|
dc.description.abstract |
This paper introduces ClassifyWiki, a framework that automatically generates Wikipedia-based text classifiers from a small set of positive training articles. ClassifyWiki aims to simplify the process of collecting hundreds or thousands of Wikipedia pages that share the same entity class, starting from a set of positive articles that may be as small as 10 pages. The user can define the small initial set at any level of granularity (e.g., people, sports people, or even footballers). ClassifyWiki leverages many previous efforts in Wikipedia entity classification to build a generic framework that works for any entity class at any level of granularity with few examples. The framework offers not only a set of pre-built models for many entity classes but also a tool, tuned through hundreds of experiments, that generates models for any given set of articles. To test the framework, we manually tagged a data set of 2500 Wikipedia pages. This data set covers 808 unique entity classes at different levels of granularity. ClassifyWiki was tested on 103 different entity classes varying in size down to only 5 positive articles. On our blind set, ClassifyWiki achieved a macro-averaged F1-score of 83%, with 96% precision and 74% recall, using 50 or more positive articles. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
University of Bahrain |
en_US |
dc.subject |
ClassifyWiki, Entity Classification, Fine-Grained Entity Classification, Text Classification, Wikipedia Classification |
en_US |
dc.title |
ClassifyWiki: An Experimental Study on Building Generic-Type Wikipedia Classifiers |
en_US |
dc.identifier.doi |
https://dx.doi.org/10.12785/ijcds/110161 |
|
dc.volume |
11 |
en_US |
dc.issue |
1 |
en_US |
dc.pagestart |
753 |
en_US |
dc.pageend |
762 |
en_US |
dc.contributor.authoraffiliation |
Microsoft, Cairo, Egypt |
en_US |
dc.contributor.authoraffiliation |
Department of Computer and Systems Engineering, Faculty of Engineering, Ain Shams University, Cairo, Egypt |
en_US |
dc.contributor.authoraffiliation |
The American University in Cairo, Cairo, Egypt; Ain Shams University, Cairo, Egypt |
en_US |
dc.source.title |
International Journal of Computing and Digital Systems |
en_US |
dc.abbreviatedsourcetitle |
IJCDS |
en_US |