Abstract:
Several clustering techniques have been developed to help researchers analyze the large amount of information derived
from genomic data. These techniques have led to the discovery of new expression patterns under different experimental conditions.
One of the objectives of these methods is to cluster the profiles of co-expressed genes. However, the resulting gene groups must be
both optimized and consistent with the underlying biology. This paper addresses these two aspects using the Bisecting
KMeans (BKM) algorithm optimized with the WB validity index. For each cluster obtained at the end of the BKM run,
the Leader Clustering algorithm determines a representative profile, called the leader. Then, the
semantic similarity of Gene Ontology terms, computed with the GOGO measure, is combined with the results of the optimized
clustering. The proposed approach, called OBKML-GO (Optimized Bisecting KMeans Leader with Gene Ontology), is evaluated
on three benchmark datasets from model organisms: yeast, human, and the plant Arabidopsis thaliana. The results show that this approach
produces more relevant and coherent groups of co-expressed genes that also reflect the underlying biology.
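
The clustering stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual form of the WB index, WB(M) = M * SSW / SSB (lower is better), picks the cluster with the largest within-cluster error for each bisection, and uses the profile closest to the centroid as a simple stand-in for the leader, which is not the Leader Clustering algorithm itself. The GO semantic-similarity step is omitted. All function names and the toy profiles are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sse(points):
    # Within-cluster sum of squared distances to the centroid.
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, iters=20):
    # Plain 2-means, used here as the bisection step.
    centers = rng.sample(points, 2)
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            nearest = int(sq_dist(p, centers[0]) > sq_dist(p, centers[1]))
            halves[nearest].append(p)
        if not halves[0] or not halves[1]:
            break
        centers = [centroid(h) for h in halves]
    return halves

def bisecting_kmeans(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        # Assumption: split the cluster with the largest SSE.
        target = max(splittable, key=sse)
        left, right = two_means(target, rng)
        if not left or not right:
            break
        clusters.remove(target)
        clusters += [left, right]
    return clusters

def wb_index(clusters):
    # Assumed form: WB = M * SSW / SSB; needs at least two distinct clusters.
    all_points = [p for c in clusters for p in c]
    g = centroid(all_points)
    ssw = sum(sse(c) for c in clusters)
    ssb = sum(len(c) * sq_dist(centroid(c), g) for c in clusters)
    return len(clusters) * ssw / ssb

def best_clustering(points, k_max):
    # Try k = 2..k_max and keep the partition with the lowest WB index.
    candidates = [bisecting_kmeans(points, k) for k in range(2, k_max + 1)]
    return min(candidates, key=wb_index)

def leader(cluster):
    # Stand-in representative: the member profile closest to the centroid.
    c = centroid(cluster)
    return min(cluster, key=lambda p: sq_dist(p, c))

# Toy "expression profiles" forming three well-separated groups.
profiles = [[0, 0], [0, 1], [1, 0],
            [10, 10], [10, 11], [11, 10],
            [20, 0], [20, 1], [21, 0]]
best = best_clustering(profiles, k_max=4)
leaders = [leader(c) for c in best]
```

In this sketch each cluster is reduced to one representative profile, which is the point at which a GO-based similarity score could be combined with the expression-based grouping.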