Data mining is the extraction of hidden predictive information from large databases. Classification is a data mining task that takes a large collection of examples from multiple groups as input and identifies the characteristic patterns or properties of each group. One common approach to classification is the decision tree. Decision tree classification has emerged as an essential knowledge acquisition procedure that follows the machine learning strategy of 'learning from examples'. In this paper we present a comparative study of the performance of the decision tree classification algorithms C4.5 and C5. The C4.5 algorithm constructs the decision tree with a 'divide and conquer' strategy. The C5 algorithm uses the concept of gain to produce a classifier in the form of a decision tree from examples whose classes are already known. Both algorithms are applied to the large 'adult' data set, obtained from the UCI Machine Learning Repository, which is used to predict an individual's income. The results indicate that the C5 algorithm performs better than the C4.5 algorithm.
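To make the two ideas named above concrete, the following is a minimal sketch of a gain-based, divide-and-conquer tree builder. It is not the C4.5 or C5 implementation and not the authors' code. It assumes that 'gain' means information gain (entropy reduction) and that all attributes are categorical. The function names `entropy`, `information_gain`, and `build_tree` are hypothetical, and the four toy records are invented for illustration; they only borrow attribute names in the style of the 'adult' data set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Divide and conquer: split on the best attribute, recurse on each subset."""
    # Leaf: every example has the same class, or no attributes are left.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return (best, branches)


# Invented toy records (assumption: not taken from the real 'adult' data).
rows = [
    {"education": "Bachelors", "sex": "Male"},
    {"education": "Bachelors", "sex": "Female"},
    {"education": "HS-grad", "sex": "Male"},
    {"education": "HS-grad", "sex": "Female"},
]
labels = [">50K", ">50K", "<=50K", "<=50K"]

tree = build_tree(rows, labels, ["education", "sex"])
```

On this toy input, `education` separates the two income classes perfectly while `sex` carries no information, so the sketch places `education` at the root. Real implementations add handling for continuous attributes, missing values, and pruning, none of which is attempted here.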