Categorisation of Computer Science Research Papers using Supervised Machine Learning Techniques

Gheeseewan, Hemrajsingh; Pudaruth, Sameerchand

doi:https://dx.doi.org/10.12785/ijcds/0906014

dc.contributor.author	Gheeseewan, Hemrajsingh
dc.contributor.author	Pudaruth, Sameerchand
dc.date.accessioned	2020-07-14T20:52:19Z
dc.date.available	2020-07-14T20:52:19Z
dc.date.issued	2020-11-01
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/3920
dc.description.abstract	In this modern era of bleeding-edge technologies, information creation, sharing and consumption are rising at an exponential rate. In the same vein, the amount of research being published worldwide continues to increase, and a large proportion of it is in the field of computer science. There is an urgent need to bring some order to this huge jungle of data. Thus, in this article, we have used eight supervised machine learning techniques to classify computer science research papers. Logistic regression, multinomial naive Bayes, Gaussian naive Bayes, support vector machine, k-nearest neighbours, decision tree, random forest and deep learning neural network classifiers were trained to assign research papers to appropriate categories. For this purpose, a labelled dataset of 69,776 papers spanning 35 categories was downloaded from arXiv. The best f1-score, 0.60, was obtained by the logistic regression classifier, which was also the fastest of the classifiers. The best f1-score from the deep learning network was 0.59. Using only the list of references for classification produced an f1-score of 0.57, with significantly shorter training and testing times. This shows that it is possible to classify computer science research papers using only their references. Using only the abstracts gave an f1-score of 0.52. Computer science papers often do not fall into neat categories; they are frequently multi-topical. Thus, in the future, we intend to perform multi-label classification on the same dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Document Classification	en_US
dc.subject	Computer Science	en_US
dc.subject	Machine Learning	en_US
dc.subject	Logistic Regression	en_US
dc.subject	Deep Learning	en_US
dc.title	Categorisation of Computer Science Research Papers using Supervised Machine Learning Techniques	en_US
dc.type	Article	en_US
dc.identifier.doi	https://dx.doi.org/10.12785/ijcds/0906014
dc.volume	9	en_US
dc.issue	6
dc.pagestart	1165	en_US
dc.pageend	1174	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
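The abstract reports single-label classification into 35 arXiv categories and f1-scores, including a references-only variant, but gives no implementation details. As a hedged illustration only, the following pure-Python sketch trains multinomial naive Bayes (one of the eight techniques the abstract names; it is not the paper's best model, which was logistic regression) on bag-of-words features drawn from reference lists, then scores predictions with a macro-averaged F1. Several things here are assumptions, not the authors' method: the whitespace tokenizer, Laplace smoothing, the choice of macro averaging (the abstract does not say whether its f1-scores are macro- or weighted-averaged), the toy reference strings and labels, and the helper names `train_multinomial_nb`, `predict` and `macro_f1`, all of which are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes with additive (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest posterior log-probability."""
    log_prior, log_likelihood = model
    tokens = tokenize(doc)
    scores = {}
    for c, prior in log_prior.items():
        # Tokens never seen during training are ignored.
        scores[c] = prior + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy data: each "document" is a paper's reference list reduced
# to author surnames and title words, labelled with an arXiv-style category.
train_docs = [
    "lecun deep learning nature hinton dropout",
    "krizhevsky imagenet convolutional networks hinton",
    "rivest shamir adleman public key cryptosystem",
    "diffie hellman new directions cryptography",
]
train_labels = ["cs.LG", "cs.LG", "cs.CR", "cs.CR"]

if __name__ == "__main__":
    model = train_multinomial_nb(train_docs, train_labels)
    print(predict(model, "hinton deep convolutional"))    # → cs.LG
    print(predict(model, "diffie hellman key exchange"))  # → cs.CR
```

Macro averaging gives each of the categories equal weight regardless of size, which is one common choice for multi-class text classification; a weighted average would instead favour the larger categories. Reproducing the paper's setup at scale would more likely rely on an established library than on a hand-written model like this one.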