
HOODG/List-HOODG: Oblivious Decision Graphs

An inducer that builds oblivious decision graphs bottom-up [,]. It does not handle unknown values. HOODG is sensitive to irrelevant or weakly relevant features, so it should be combined with feature subset selection. HOODG also requires discretized data, so disc-filter must be used. The following example shows a sample run; the dotty output is shown in Figure 2.

   setenv DATAFILE monk1
   setenv DRIBBLE false
   setenv INDUCER disc-filter        # oodg can only work on discrete data
   setenv DISCF_INDUCER fss          # feature subset selection
   setenv DISCF_FSS_INDUCER hoodg    
   setenv DISCF_FSS_MAX_STALE 3      # how much to search before stopping
   setenv DISCF_FSS_CV_TIMES 0       # heuristic for running cv multiple times
   setenv DISCF_FSS_CV_FOLDS 5       # 5-fold CV
   setenv DISCF_FSS_CMPLX_PENALTY 0.0001  # Small penalty for more features
   setenv DISCF_FSS_SHOW_REAL_ACC never   # Otherwise it's a base inducer
   setenv REMOVE_UNKNOWN_INST yes         # HOODG does not handle unknown values
   setenv DISPLAY_STRUCT dotty            # Let's see final graph
   setenv INDUCER_DOT oodg.dot            # Where to keep the dot output
   Inducer
The output is:
   Number of training instances: 124
   Number of test instances: 432.  Unseen: 308,  seen 124.
   Number correct: 432.  Number incorrect: 0
   Generalization accuracy: 100.00%.  Memorization accuracy: 100.00%
   Accuracy: 100.00% +- 0.00% [99.12% - 100.00%]
   
   Note: this categorizer type does not support persistence

  
Figure 2: The Oblivious Decision Graph (OODG) for Monk1.

Invoking feature subset selection can be very slow. A much faster alternative is to use entropy to find a good set of attributes. The inducer ``list-hoodg'' encapsulates everything needed for this search. The option AO_GROW_CONF_RATIO, the proportion of misclassified instances at a given level, determines the stopping criterion; a higher value means fewer attributes are used. See kohavi-thesis for more information.
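
As a rough sketch of such a run, the settings below select list-hoodg directly as the inducer on the same monk1 data. The value 0.1 for AO_GROW_CONF_RATIO is purely illustrative and is not a documented default. It is also an assumption here that no disc-filter wrapper is needed, since monk1 contains only nominal attributes.

   setenv DATAFILE monk1
   setenv DRIBBLE false
   setenv INDUCER list-hoodg              # entropy-based attribute search
   setenv AO_GROW_CONF_RATIO 0.1          # illustrative value; higher means fewer attributes
   setenv REMOVE_UNKNOWN_INST yes         # HOODG does not handle unknown values
   Inducer

Rerunning with a few different values of AO_GROW_CONF_RATIO and comparing the reported accuracy is one way to see how the stopping criterion affects the number of attributes used.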



Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996