
ID3, MC4

ID3 is a very basic decision tree algorithm with no pruning. MC4 includes pruning similar to that of C4.5 []. Except for the handling of unknown values, which differs, MC4 should give results similar to those of C4.5. Underneath, ID3 and MC4 are the same algorithm with different default parameter settings.

MIN_SPLIT_WEIGHT gives, as a percentage of the number of training instances divided by the number of classes, the minimum number of instances that must trickle down to at least two branches of a given node. LBOUND_MIN_SPLIT and UBOUND_MIN_SPLIT bound this number from below and above. This provides a mechanism similar to C4.5's handling of splits. The bound is determined as follows. First, the minimum number of instances is computed from MIN_SPLIT_WEIGHT (as a floating-point number). If this number is higher than UBOUND_MIN_SPLIT, it becomes UBOUND_MIN_SPLIT. Finally, if it is lower than LBOUND_MIN_SPLIT, it becomes LBOUND_MIN_SPLIT.
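The bound computation above can be sketched in Python as follows. This is a minimal illustration, not MC4's code: it assumes MIN_SPLIT_WEIGHT is read as a percentage and that a split qualifies when at least two branches each receive at least the bounded number of instances. The function names `min_split_instances` and `can_split` and all numeric values are hypothetical, not MC4 defaults.

```python
def min_split_instances(num_instances, num_classes,
                        min_split_weight, lbound, ubound):
    """Bounded minimum number of instances per branch (hypothetical helper)."""
    # Assumption: MIN_SPLIT_WEIGHT is a percentage of (instances / classes).
    threshold = (min_split_weight / 100.0) * num_instances / num_classes
    # Apply the upper bound first (UBOUND_MIN_SPLIT)...
    if threshold > ubound:
        threshold = ubound
    # ...then the lower bound (LBOUND_MIN_SPLIT), so it wins on conflict.
    if threshold < lbound:
        threshold = lbound
    return threshold


def can_split(branch_counts, threshold):
    """True if at least two branches receive >= threshold instances (assumed rule)."""
    return sum(1 for count in branch_counts if count >= threshold) >= 2


# Illustrative values only: 1000 instances, 2 classes, weight 10%.
t = min_split_instances(1000, 2, min_split_weight=10, lbound=2, ubound=25)
print(t, can_split([30, 26, 1], t), can_split([30, 10], t))
```

Because the lower bound is applied last, LBOUND_MIN_SPLIT takes precedence if it is ever set above UBOUND_MIN_SPLIT.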

ID3_DEBUG adds information to each node: the number of instances, the entropy, and the mutual information. To see it, set DISPLAY_STRUCT to "dot" and, after running the Inducer utility, view the resulting Inducer.dot file in dot/dotty; alternatively, simply use the ID3 utility.

ID3_UNKNOWN_EDGES determines whether an edge is generated to handle unknown values. If this option is FALSE, you will get a nicer-looking tree, but it will fail on instances with unknown values. Note that C4.5 has a better mechanism for handling unknowns.
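The effect of this option on classification can be pictured with a small Python sketch. The `Leaf`/`Node` classes and the `categorize` function are hypothetical; they only illustrate that a node either has an extra edge for unknown values (option TRUE) or has no route for them (option FALSE). What subtree MC4 actually builds under the unknown edge is not described here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str
    children: Dict[str, "Tree"]
    # Present only when unknown edges are generated (hypothetical field).
    unknown_child: Optional["Tree"] = None


Tree = Union[Leaf, Node]


def categorize(tree: Tree, instance: dict) -> str:
    """Walk from the root to a leaf; a missing key models an unknown value."""
    while isinstance(tree, Node):
        value = instance.get(tree.attribute)
        if value is None:
            if tree.unknown_child is None:
                # No unknown edge: this instance cannot be routed.
                raise ValueError(f"unknown value for {tree.attribute!r}")
            tree = tree.unknown_child
        else:
            tree = tree.children[value]
    return tree.label
```

With an unknown edge, `categorize` still returns a label for `{}`; without one, it raises, which mirrors the failure described above.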

ID3_SPLIT_BY determines the splitting criterion: either plain mutual information or mutual information normalized by the number of values.
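Both criteria can be sketched in Python. Plain mutual information is the class entropy minus the weighted entropy of the class within each attribute value. The exact normalization MC4 applies is not spelled out here, so this sketch takes the text literally and divides by the number of distinct attribute values; treat that choice, and the helper names, as assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(values, labels):
    """H(class) - H(class | attribute) for one attribute's values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def split_score(values, labels, normalized=False):
    """Hypothetical scoring function for the two ID3_SPLIT_BY choices."""
    mi = mutual_information(values, labels)
    if normalized:
        # Assumption: literal reading, divide by the number of values.
        return mi / len(set(values))
    return mi


labels = [0, 0, 1, 1]
two_valued = ["a", "a", "b", "b"]
id_like = ["w", "x", "y", "z"]  # one distinct value per instance
print(split_score(two_valued, labels), split_score(id_like, labels))
print(split_score(two_valued, labels, True), split_score(id_like, labels, True))
```

In this example, plain mutual information scores both attributes equally (1.0 bit), while the normalized score prefers the two-valued attribute (0.5 versus 0.25), illustrating how normalization penalizes attributes with many values.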



Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996