University of Bahrain
Scientific Journals

Categorisation of Computer Science Research Papers using Supervised Machine Learning Techniques


dc.contributor.author Gheeseewan, Hemrajsingh
dc.contributor.author Pudaruth, Sameerchand
dc.date.accessioned 2020-07-14T20:52:19Z
dc.date.available 2020-07-14T20:52:19Z
dc.date.issued 2020-11-01
dc.identifier.issn 2210-142X
dc.identifier.uri https://journal.uob.edu.bh:443/handle/123456789/3920
dc.description.abstract In this modern era of bleeding-edge technologies, information creation, sharing and consumption are rising at an exponential rate. In the same vein, there has been a continued increase in the amount of research being published worldwide, a large proportion of which is in the computer science field. There is an urgent need to bring some level of order to this huge jungle of data. Thus, in this article, we have used eight supervised machine learning techniques to classify computer science research papers. Logistic regression, multinomial naive Bayes, Gaussian naive Bayes, support vector machines, k-nearest neighbours, decision trees, random forests and deep learning neural networks were trained to classify research papers into appropriate categories. For this purpose, a labelled dataset of 69,776 papers was downloaded from arXiv and these were classified into 35 categories. The best F1-score of 0.60 was obtained by the logistic regression classifier, which was also the fastest machine learning classifier. The best F1-score from the deep learning network was 0.59. Using only the list of references for classification produced an F1-score of 0.57, with significantly shorter training and testing times. This shows that it is possible to use only references to classify computer science research papers. The F1-score for abstracts only was 0.52. Computer science papers often do not fall into neat categories; they are often multi-topical. Thus, in the future, we intend to perform multi-label classification on the same dataset. en_US
dc.language.iso en en_US
dc.publisher University of Bahrain en_US
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/ *
dc.subject Document Classification en_US
dc.subject Computer Science en_US
dc.subject Machine Learning en_US
dc.subject Logistic Regression en_US
dc.subject Deep Learning en_US
dc.title Categorisation of Computer Science Research Papers using Supervised Machine Learning Techniques en_US
dc.type Article en_US
dc.identifier.doi https://dx.doi.org/10.12785/ijcds/0906014
dc.volume 9 en_US
dc.issue 6
dc.pagestart 1165 en_US
dc.pageend 1174 en_US
dc.source.title International Journal of Computing and Digital Systems en_US
dc.abbreviatedsourcetitle IJCDS en_US
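
The abstract reports its results as F1-scores over 35 categories. As a minimal sketch of how such a score can be computed for a multi-class classifier, the Python below derives a per-class F1 and averages it across classes. The abstract does not state which averaging scheme the authors used, so macro averaging is an assumption here, and the function names and toy labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels in the style of arXiv category codes (illustrative only).
y_true = ["cs.AI", "cs.AI", "cs.CL", "cs.CL", "cs.CV", "cs.CV"]
y_pred = ["cs.AI", "cs.CL", "cs.CL", "cs.CL", "cs.CV", "cs.AI"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Macro averaging treats every category equally regardless of how many papers it contains; a weighted average would instead favour the larger categories, so the two can differ noticeably on an imbalanced dataset.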


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
