Abstract With the discovery of new DNA sequences, a fundamental problem is how to categorize those sequences into the correct species. Unfortunately, clustering analysis has two major drawbacks: it is difficult to identify all data groups correctly, and assigning a set of DNA sequences to k clusters requires k to be predefined. Both are especially problematic when the data have many dimensions and the number of clusters is large and hard to guess. Furthermore, finding a similarity measure that preserves functionality and represents both the composition and the distribution of the bases in a DNA sequence is one of the main challenges in computational biology. In this thesis, a new soft computing metaheuristic framework is introduced for automatic clustering, which generates the optimal cluster formation and determines the best estimate for the number of clusters. A pulse coupled neural network (PCNN) is utilized to calculate DNA sequence similarity or dissimilarity. The bat algorithm is hybridized with the well-known genetic algorithm to solve the automatic data clustering problem. Extensive computational experiments are conducted on the expanded Human Oral Microbiome Database (eHOMD). The simulation results show that the hybrid GABAT outperformed two state-of-the-art clustering algorithms, the genetic algorithm and the bat algorithm, as well as other competing metaheuristic algorithms. GABAT achieved better mean and standard deviation values: 0.40954 and 0.0197, respectively, using Euclidean distance, and 0.012312 and 0.003918, respectively, using entropy as the distance measure. A Wilcoxon test was conducted to statistically validate the obtained clusters, and it showed significant p-values of less than 5%: the bat algorithm outperformed the genetic algorithm, and GABAT outperformed the bat algorithm. These results indicate that GABAT performed better than its competitors.
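To illustrate what automatic clustering means here, the following is a minimal sketch of how a metaheuristic can search over the number of clusters instead of fixing k in advance. The abstract does not describe the solution encoding or the fitness function, so everything below is an assumption made for illustration: each candidate holds a fixed pool of centroids with activation flags (a common encoding in automatic clustering), and the hypothetical `fitness` function is a simple compactness-to-separation ratio based on Euclidean distance. It is not the thesis's GABAT procedure, and the points stand in for whatever feature vectors the similarity step produces.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def decode(candidate, threshold=0.5):
    """Keep only the centroids whose activation flag exceeds the threshold.

    A candidate is a list of (flag, centroid) pairs; the number of active
    centroids is the candidate's estimate of k (assumed encoding).
    """
    return [centroid for flag, centroid in candidate if flag > threshold]

def assign(points, centroids):
    """Label each point with the index of its nearest active centroid."""
    return [
        min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
        for p in points
    ]

def fitness(points, candidate):
    """Hypothetical fitness: mean point-to-centroid distance divided by the
    smallest distance between two active centroids. Lower is better.
    Candidates with fewer than two active centroids are rejected."""
    centroids = decode(candidate)
    if len(centroids) < 2:
        return math.inf
    labels = assign(points, centroids)
    compactness = sum(
        euclidean(p, centroids[label]) for p, label in zip(points, labels)
    ) / len(points)
    separation = min(
        euclidean(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    )
    return math.inf if separation == 0 else compactness / separation

# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Candidate A activates two centroids; candidate B also activates a third.
candidate_a = [(0.9, (0.0, 0.5)), (0.8, (10.0, 10.5)), (0.1, (5.0, 5.0))]
candidate_b = [(0.9, (0.0, 0.5)), (0.8, (10.0, 10.5)), (0.9, (5.0, 5.0))]

# Candidate with a single active centroid, which the fitness rejects.
candidate_c = [(0.9, (0.0, 0.5)), (0.1, (10.0, 10.5)), (0.1, (5.0, 5.0))]
```

In this toy setting, the candidate with two active centroids scores lower (better) than the one with a superfluous third centroid, so a search procedure such as a genetic or bat algorithm that mutates flags and centroid coordinates would be steered toward k = 2 without k being supplied by the user.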