Identifying Duplicate Bug Records Using Word2Vec Prediction with Software Risk Analysis

Mahfoodh, Hussain; Hammad, Mustafa

doi:http://dx.doi.org/10.12785/ijcds/110162

dc.contributor.author	Mahfoodh, Hussain
dc.contributor.author	Hammad, Mustafa
dc.date.accessioned	2022-02-09T20:12:50Z
dc.date.available	2022-02-09T20:12:50Z
dc.date.issued	2022-02-15
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/4580
dc.description.abstract	Reporting duplicated bugs in bug reports has serious productivity consequences for software projects. The fewer duplicated bugs are reported, the better the software maturity processes established among the internal software stakeholders. Automated identification of the duplicated category in bug reports could enhance risk identification approaches during the software life cycle. In this paper, we propose two different similarity measures to identify duplicated bugs using the word-embedding (Word2Vec) natural language processing technique with the TensorFlow tool. We conduct a comparison experiment on two related bug record descriptions from eight different software components of the Mozilla Core dataset. We choose different sentence types from the duplicated bug category records to compare and discuss each component’s accuracy results and to identify whether the proposed module is able to detect the related records. Building on an earlier work, this paper calculates software risk values from duplication records and from bug-fix time prediction for the components that were not identified as duplicated by the Word2Vec approach. The study results show a maximum precision accuracy of 99.89% for the components that were correctly identified as duplicated by the approach used. Additionally, we found that 66% of the software components that were excluded from the proposed bug duplication module showed an increase in software risk values.	en_US
dc.language.iso	en_US	en_US
dc.publisher	University of Bahrain	en_US
dc.subject	Bug reports	en_US
dc.subject	duplicated bugs	en_US
dc.subject	bug-fix time	en_US
dc.subject	software risk estimation	en_US
dc.subject	bug-fix time prediction	en_US
dc.subject	software risk management	en_US
dc.subject	word embedding	en_US
dc.subject	natural language processing	en_US
dc.subject	machine learning	en_US
dc.title	Identifying Duplicate Bug Records Using Word2Vec Prediction with Software Risk Analysis	en_US
dc.identifier.doi	http://dx.doi.org/10.12785/ijcds/110162
dc.volume	11	en_US
dc.issue	1	en_US
dc.pagestart	763	en_US
dc.pageend	773	en_US
dc.contributor.authorcountry	Bahrain	en_US
dc.contributor.authoraffiliation	Department of Computer Science, University of Bahrain	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
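
The abstract describes comparing two bug record descriptions through word embeddings but does not name the two similarity measures. The following is a minimal sketch of one common choice, cosine similarity between averaged word vectors. It is an assumption for illustration, not the authors' method: the toy vectors, the function names, and the 0.9 threshold are all hypothetical, and a trained Word2Vec model would supply the real vectors.

```python
import math

# Hypothetical 3-dimensional word vectors for illustration only.
# A trained Word2Vec model would supply learned vectors instead.
TOY_VECTORS = {
    "crash": [0.9, 0.1, 0.0],
    "freeze": [0.8, 0.2, 0.1],
    "browser": [0.1, 0.9, 0.2],
    "startup": [0.2, 0.3, 0.9],
    "font": [0.0, 0.1, 0.2],
}


def sentence_vector(text, vectors):
    """Average the vectors of the words in `text` that the vocabulary knows.

    Returns None when no word of the text is in the vocabulary.
    """
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_likely_duplicate(desc_a, desc_b, vectors=TOY_VECTORS, threshold=0.9):
    """Flag two descriptions as likely duplicates above an assumed threshold."""
    vec_a = sentence_vector(desc_a, vectors)
    vec_b = sentence_vector(desc_b, vectors)
    if vec_a is None or vec_b is None:
        return False  # nothing to compare
    return cosine_similarity(vec_a, vec_b) >= threshold


if __name__ == "__main__":
    print(is_likely_duplicate("browser crash startup", "browser freeze startup"))
    print(is_likely_duplicate("browser crash", "font"))
```

In practice the threshold would need tuning per software component, since the abstract reports that accuracy differs across the eight components.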