
Monk's Problems 1 and 2

In Monk 1, all three configurations of the Rulearner system not only performed with 100% accuracy but also learned the minimal concept description for the problem, whereas only the very best configurations of the other learning methods achieved similar results. Monk 2 proves challenging for symbolic learning systems since the concept is not easily representable in DNF. Here we find that all the symbolic learning methods hover close to 70% accuracy and produce lengthy rule sets, reflecting the checkerboard distribution of classes in the instance space. These experiments show that symbolic learning methods must incorporate a bias toward more than just functions easily representable in DNF in order to fare well on a wide range of problems.
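To see why Monk 2 resists a compact DNF description, consider its target concept, commonly stated as "exactly two of the six attributes take their first value" (the sketch below is illustrative, not from the paper): any conjunctive rule covering a positive example must pin down both which two attributes are at their first value and that the others are not, so no small disjunction of conjunctions covers the concept.

```python
from itertools import product

# Attribute domains for the Monk's problems: six attributes with
# 3, 3, 2, 3, 4, and 2 possible values (432 instances in total).
domains = [3, 3, 2, 3, 4, 2]

def monk2(x):
    """Monk 2 concept: exactly two attributes take their first value (0)."""
    return sum(v == 0 for v in x) == 2

instances = list(product(*[range(d) for d in domains]))
positives = [x for x in instances if monk2(x)]
print(len(instances), len(positives))  # prints: 432 142
```

The 142 positive instances are scattered across the space in the "checkerboard" pattern mentioned above, which is why the rule sets induced for this problem grow so long.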

We also compared Rulearner on noisy domains: Monk 3, which contains 5% noise in the training set, and the real-world Yugoslavian Breast Cancer data. Again we tried a range of configurations and parameters to optimize the performance of all the algorithms in our study.

Table 2. Accuracy on Monk 3.

    Algorithm                   Accuracy
    CN2 (DL, chi-sq=0.0)        93.3%
    CN2 (unord., chi-sq=0.0)    90.7%
    CN2 (DL, chi-sq=4.0)        94.4%
    CN2 (unord., chi-sq=4.0)    87.5%
    C4.5 (unpruned, CF=15%)     92.6%
    C4.5 (pruned, CF=15%)       97.2%
    C4.5 (unpruned, CF=25%)     92.6%
    C4.5 (pruned, CF=25%)       97.2%
    Rulearner (DL, N=5%)        94.4%
    Rulearner (unord, N=5%)     94.0%
    Rulearner (DL, N=10%)       94.4%
    Rulearner (unord, N=10%)    95.1%

Table 3. Accuracy on Breast Cancer.

    Algorithm                   Accuracy
    Default Rule (majority)     71.7±7.2%
    CN2 (DL, chi-sq=4.0)        73.3±6.0%
    CN2 (unord., chi-sq=4.0)    71.3±4.5%
    CN2 (DL, chi-sq=8.0)        72.1±4.0%
    CN2 (unord., chi-sq=8.0)    72.9±5.3%
    C4.5 (unpruned, CF=15%)     67.7±9.8%
    C4.5 (pruned, CF=15%)       75.2±7.6%
    C4.5 (unpruned, CF=25%)     67.7±9.8%
    C4.5 (pruned, CF=25%)       73.3±4.5%
    Rulearner (DL, N=30%)       74.9±5.8%
    Rulearner (unord, N=30%)    74.1±6.4%
    Rulearner (DL, N=40%)       74.8±6.6%
    Rulearner (unord, N=40%)    73.6±7.8%

In Table 2 we see that all three algorithms are clearly able to learn in the presence of noise. More importantly, however, we see the importance of pruning, as reflected in the results from C4.5. Since CN2 and Rulearner currently do no pruning of their induced rules, pruning could be a promising avenue for further increasing their accuracy on noisy data, especially since Rulearner outperforms unpruned C4.5. In the Breast Cancer domain we performed a three-fold cross-validation and report both the accuracy and standard deviation of the results. This is a very difficult problem, as none of the algorithms tested performs significantly better than the majority default rule. Here we again see the importance of pruning, as the results with C4.5 clearly indicate. In spite of performing no pruning, the Rulearner system still performs on par with the pruned trees produced by C4.5 and appears to outperform both CN2 and unpruned C4.5. Comparisons of the decision lists and unordered rule sets produced by CN2 and Rulearner point to no clear winner at this point.
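The evaluation procedure on the Breast Cancer domain can be sketched as follows. This is a minimal illustration of three-fold cross-validation with accuracy reported as mean ± standard deviation; the fold-splitting scheme and the majority-rule learner here are stand-ins, not the paper's implementation.

```python
import random
from collections import Counter
from statistics import mean, stdev

def three_fold_cv(examples, train_and_test, seed=0):
    """Shuffle, split into three folds, and return (mean, stdev) of accuracy.

    `train_and_test(train, test)` is any learner that fits on `train`
    and returns its accuracy on `test`.
    """
    rng = random.Random(seed)
    data = examples[:]
    rng.shuffle(data)
    folds = [data[i::3] for i in range(3)]
    accs = []
    for i in range(3):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        accs.append(train_and_test(train, test))
    return mean(accs), stdev(accs)

def majority_rule(train, test):
    # The "default rule" baseline: predict the most common training label.
    maj = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == maj for _, label in test) / len(test)

# Toy data, roughly 73% positive; examples are (feature, label) pairs.
data = [(i, i % 4 != 0) for i in range(30)]
m, s = three_fold_cv(data, majority_rule)
print(f"{100 * m:.1f}±{100 * s:.1f}%")
```

Reporting the standard deviation alongside the mean, as in Table 3, is what allows the observation that no learner significantly beats the default rule.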

5 Conclusions

It appears that Rulearner is a viable lattice-based induction algorithm. Future work will lead us to examine how rule pruning may be employed to increase accuracy and to conduct more comparative studies of the decision-list and unordered rule set paradigms. Methods for automatic noise parameter selection will also be pursued. The interested reader should also be aware of the systems CHARADE [Ganascia, 1987] and GRAND [Oosthuizen, 1988], which also make use of lattices to guide the formation of classification rules. These systems, however, differ from ours in their induction mechanisms, biases, and methods for dealing with noise.

Acknowledgments

The author thanks Nils Nilsson and Deon Oosthuizen for their thought-provoking discussions. Additional thanks go to Oosthuizen for providing both the GRAND lattice construction program and the breast cancer data set, and to Peter Clark for providing CN2. George John and Pat Langley also provided useful insights. The author is supported by a Fr........
