Intelligent Optimized Mining Pattern for Genomic Database Via Clustered Indexing Method
Author(s)
S.ThabasuKannan
Published Date
September 10, 2024
Volume / Issue
Vol. 1 / Issue 4
Abstract
Genome datasets have grown exponentially over the past few years; the GenBank database, for example, doubles every 15 months. With this rapid growth, genome datasets and their associated access structures have become too large to fit in a computer's main memory. This leads to a large number of disk accesses and, in turn, to slow response times for homology searches and other queries. Proper tools are therefore needed to access and mine this data efficiently; without them, mining effort is wasted on prolonged and inefficient searches. This paper proposes a new architecture for fast and effective searching of genomic databases. The first component of the proposed model is a projected clustering algorithm, chosen because projected clustering is well suited to high-dimensional data. Most existing projected clustering algorithms, however, depend on critical user-supplied parameters to determine the relevant attributes of each cluster. If wrong parameter values are used, clustering performance degrades seriously, and correct values are rarely known for real datasets. The algorithm used here does not depend on user inputs to determine relevant attributes; instead, it sets its thresholds dynamically. It is considerably more usable than other projected clustering algorithms and works well on a gene expression dataset. The second component is a new metric index, the M+-tree, which organizes large datasets in metric spaces. The M+-tree introduces a concept called the key dimension, which effectively reduces response time for similarity search. The main idea is to increase the fan-out of the tree by partitioning a subspace further into two subspaces, called twin nodes. By using the twin nodes, filtering effectiveness can be doubled. In addition, to ensure high space utilization, data is reallocated dynamically between the twin nodes.
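The key-dimension filtering idea can be illustrated with a minimal single-level sketch. It is not the paper's implementation. It assumes Euclidean distance, picks the key dimension as the coordinate with the largest variance, and splits the points at the median; the function names (`key_dimension`, `build_twins`, `range_query`) are hypothetical. Because the difference along any single coordinate never exceeds the Euclidean distance, a twin node whose key-dimension values all lie more than the query radius away from the query can be skipped without computing any distances.

```python
import math
from statistics import pvariance


def key_dimension(points):
    """Return the index of the coordinate with the largest variance (assumed rule)."""
    dims = range(len(points[0]))
    return max(dims, key=lambda d: pvariance([p[d] for p in points]))


def build_twins(points):
    """Split points (at least two) into twin nodes along the key dimension."""
    k = key_dimension(points)
    ordered = sorted(points, key=lambda p: p[k])
    mid = len(ordered) // 2
    left, right = ordered[:mid], ordered[mid:]
    return {
        "k": k,
        "left": left,
        "right": right,
        "left_max": left[-1][k],   # largest key value in the left twin
        "right_min": right[0][k],  # smallest key value in the right twin
    }


def range_query(twins, q, r):
    """Return (hits, scanned): points within r of q, and distances computed."""
    k = twins["k"]
    hits, scanned = [], 0
    # |q[k] - p[k]| <= dist(q, p), so a twin lying entirely outside
    # the interval [q[k] - r, q[k] + r] cannot contain a hit.
    candidates = []
    if q[k] - r <= twins["left_max"]:
        candidates.extend(twins["left"])
    if q[k] + r >= twins["right_min"]:
        candidates.extend(twins["right"])
    for p in candidates:
        scanned += 1
        if math.dist(q, p) <= r:
            hits.append(p)
    return hits, scanned
```

When the query ball falls entirely on one side of the split, only one twin is scanned, which is the source of the filtering gain described above. A full M+-tree would apply this test recursively alongside the usual metric-tree pruning.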