Abstract:
The presence of outliers in data often leads to unsatisfactory modeling outcomes, especially when employing clustering
algorithms for population segmentation and behavioral analysis. Although outlier-resilient methods such as DBSCAN,
LDOF, and t-SNE exist, k-Means, one of the most widely used clustering algorithms, still struggles to handle outliers effectively.
This paper proposes an outlier-resilient variant of the k-Means algorithm that applies the Least Trimmed Squares (LTS)
technique as a post-processing step, referred to as k-Means LTS. Outlier trimming takes place after the grouping step, so that
trimming is performed within each cluster. The algorithm is compared with ordinary k-Means and Robust Trimmed k-Means, also known as
RTKM, both employing outlier trimming. The three algorithms are compared in terms of performance metrics, clustering
results, and running time. The main contribution of this research is the k-Means LTS algorithm, which outperforms
the other two algorithms on all comparison criteria. With this algorithm, the presence of outliers within each cluster
can be explained more easily, and the running time is notably shorter than that of RTKM. Across ten datasets of varying types,
the proposed k-Means LTS consistently performs better than ordinary k-Means and RTKM.
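
As a rough illustration of the post-processing idea described above, the following Python sketch runs a plain k-Means and then trims, within each cluster, the points with the largest squared distance to the cluster centroid. The function names `kmeans` and `trim_per_cluster`, the `trim_fraction` parameter, and the distance-based scoring rule are illustrative assumptions, not necessarily the exact procedure evaluated in this work.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-Means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def trim_per_cluster(X, labels, centroids, trim_fraction=0.1):
    """Flag the largest-residual points in each cluster as outliers.

    Assumption: an LTS-style rule that keeps the h smallest squared
    distances per cluster and trims the remaining
    floor(trim_fraction * cluster_size) points.
    """
    is_outlier = np.zeros(len(X), dtype=bool)
    for j in range(len(centroids)):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        sq_res = ((X[idx] - centroids[j]) ** 2).sum(axis=1)
        n_trim = int(trim_fraction * idx.size)
        h = idx.size - n_trim
        # Points beyond the h smallest squared distances are trimmed.
        is_outlier[idx[np.argsort(sq_res)[h:]]] = True
    return is_outlier
```

Because trimming is applied per cluster after grouping, each flagged point can be reported together with the cluster it was removed from, which is what makes the outliers easier to interpret in this kind of scheme.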