An inducer for building oblivious decision graphs bottom-up. Does not handle unknown values. HOODG suffers from irrelevant or weakly relevant features, so it should be combined with feature subset selection. HOODG also requires discretized data, so disc-filter must be used. The following example run produces the dotty output shown in Figure 2.
setenv DATAFILE monk1
setenv DRIBBLE false
setenv INDUCER disc-filter             # oodg can only work on discrete data
setenv DISCF_INDUCER fss               # feature subset selection
setenv DISCF_FSS_INDUCER hoodg
setenv DISCF_FSS_MAX_STALE 3           # how much to search before stopping
setenv DISCF_FSS_CV_TIMES 0            # heuristic for running cv multiple times
setenv DISCF_FSS_CV_FOLDS 5            # 5-fold CV
setenv DISCF_FSS_CMPLX_PENALTY 0.0001  # Small penalty for more features
setenv DISCF_FSS_SHOW_REAL_ACC never   # Otherwise it's a base inducer
setenv REMOVE_UNKNOWN_INST yes
setenv DISPLAY_STRUCT dotty            # Let's see final graph
setenv INDUCER_DOT oodg.dot            # Where to keep the dot output
Inducer

The output is:

Number of training instances: 124
Number of test instances: 432. Unseen: 308, seen 124.
Number correct: 432. Number incorrect: 0
Generalization accuracy: 100.00%. Memorization accuracy: 100.00%
Accuracy: 100.00% +- 0.00% [99.12% - 100.00%]
Note: this categorizer type does not support persistence

Figure 2: The Oblivious Decision Graph (OODG) for Monk1.
Invoking feature subset selection can be very slow. A much faster alternative is to use entropy to find a good set of attributes. The inducer ``list-hoodg'' encapsulates everything needed for the search. The option AO_GROW_CONF_RATIO, which is the proportion of misclassified instances at a given level, determines the stopping criterion. Thus a higher number means fewer attributes will be used. See kohavi-thesis for more information.
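
As a rough sketch of such a run, the settings below switch the inducer to list-hoodg on the same monk1 data. The value 0.1 for AO_GROW_CONF_RATIO is purely illustrative and not a documented default. It is also assumed here that the option is read from the environment in the same way as the options in the HOODG example, and that unknown values still need to be removed.

```csh
setenv DATAFILE monk1
setenv DRIBBLE false
setenv INDUCER list-hoodg        # entropy-based attribute search instead of fss
setenv AO_GROW_CONF_RATIO 0.1    # illustrative value only; higher means fewer attributes
setenv REMOVE_UNKNOWN_INST yes   # assumption: carried over from the HOODG example
Inducer
```

Raising AO_GROW_CONF_RATIO should stop the search earlier and yield a graph over fewer attributes, so it may be worth trying a few values and comparing the reported accuracy.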