Outlier Handling in Clustering: A Comparative Experiment of K-Means, Robust Trimmed K-Means, and K-Means Least Trimmed Squared

Estella, Tricia; Andrita Intan Ghayatrie, Nadzla; Wibowo, Antoni

doi:http://dx.doi.org/10.12785/ijcds/XXXXXX

dc.contributor.author	Estella, Tricia
dc.contributor.author	Andrita Intan Ghayatrie, Nadzla
dc.contributor.author	Wibowo, Antoni
dc.date.accessioned	2024-03-16T13:46:27Z
dc.date.available	2024-03-16T13:46:27Z
dc.date.issued	2024-03-14
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/5522
dc.description.abstract	Outliers in data often lead to unsatisfactory modeling outcomes, especially when clustering algorithms are used for population segmentation and behavioral analysis. Although various outlier-resilient algorithms such as DBSCAN, LDOF, and t-SNE exist, k-Means, one of the best-known clustering algorithms, still struggles to handle outliers effectively. This paper proposes an outlier-resilient variant of k-Means that applies the Least Trimmed Squares (LTS) technique as a post-processing step, referred to as k-Means LTS. Outlier trimming takes place after the grouping step, so trimming is performed within each cluster. The algorithm is compared with ordinary k-Means and with Robust Trimmed k-Means (RTKM), both employing outlier trimming. The comparison covers performance metrics, clustering results, and running time. The contribution of this research is the k-Means LTS algorithm, which outperforms the other two algorithms across all comparison parameters. With this algorithm, the outliers within each cluster can be explained more easily, and the running time is notably shorter than that of RTKM. Across ten datasets of varying types, the proposed k-Means LTS consistently performs better than ordinary k-Means and RTKM.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.subject	Clustering; Least Trimmed Squares; K-Means; Robust clustering; Noisy data; Outliers	en_US
dc.title	Outlier Handling in Clustering: A Comparative Experiment of K-Means, Robust Trimmed K-Means, and K-Means Least Trimmed Squared	en_US
dc.identifier.doi	http://dx.doi.org/10.12785/ijcds/XXXXXX
dc.volume	16	en_US
dc.issue	1	en_US
dc.pagestart	1	en_US
dc.pageend	10	en_US
dc.contributor.authorcountry	Indonesia	en_US
dc.contributor.authorcountry	Indonesia	en_US
dc.contributor.authorcountry	Indonesia	en_US
dc.contributor.authoraffiliation	Master of Information Technology, BINUS Graduate Program, BINUS University	en_US
dc.contributor.authoraffiliation	Master of Information Technology, BINUS Graduate Program, BINUS University	en_US
dc.contributor.authoraffiliation	Master of Information Technology, BINUS Graduate Program, BINUS University	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
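
The abstract describes trimming as a post-processing step applied within each cluster after k-Means has grouped the data. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the `trim_fraction` parameter, and the farthest-first initialization are all assumptions made for illustration, and the paper's actual trimming rule may differ.

```python
import math


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Assumption: deterministic farthest-first initialization, chosen only
    so that the example is reproducible.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def trim_per_cluster(points, labels, centers, trim_fraction=0.2):
    """Flag, in every cluster, the trim_fraction of points with the largest
    squared distance to that cluster's center (hypothetical trimming rule)."""
    is_outlier = [False] * len(points)
    for j, center in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        idx.sort(key=lambda i: math.dist(points[i], center) ** 2)
        n_trim = int(trim_fraction * len(idx))
        for i in idx[len(idx) - n_trim:]:
            is_outlier[i] = True
    return is_outlier


# Two compact groups, each followed by one far-away point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.5, 5.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5), (10.5, 15.5),
]
labels, centers = kmeans(points, k=2)
flags = trim_per_cluster(points, labels, centers, trim_fraction=0.2)
outlier_indices = [i for i, flagged in enumerate(flags) if flagged]
```

Because the trimming runs per cluster, each flagged point is tied to one specific cluster, which is the property the abstract credits with making outliers easier to explain. In this toy data, the far-away point in each group is the one flagged.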