
Discretization filter


Disc-filter is a wrapper inducer; the wrapped inducer is named by the DISCF_INDUCER option. Options for the wrapped inducer are prefixed with "DISCF_".

The most important option is the discretization type, one of: entropy, 1r, bin, c4.5-disc, t2-disc. Of these, entropy discretization seems to be the best method for most practical datasets.

Methods that need the number of intervals read it from the DISC_NUM_INTR option. Its possible values are Algo-heuristic (an algorithm-dependent heuristic), Fixed-value (you specify the number), and MDL (based on fayyad-irani-disc).

The entropy method requires MIN_SPLIT, the minimum number of instances in an interval; its algorithm heuristic defaults to MDL.

The 1r method is Holte's method of discretization, used in the OneR rule. It requires MIN_INST, the minimum number of instances per bin (a value of 0 is changed to 6, the default suggested by Holte).

The bin method uses uniform binning (equal-width intervals). Its algorithm heuristic sets the number of bins to twice the log (base 2) of the number of distinct values, a heuristic used in Splus and compared in dougherty-kohavi-sahami-disc.

For the T2 algorithm, the heuristic is to form as many bins as there are classes, plus one.
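The bin-count heuristics described above can be illustrated with a short Python sketch. This is not MLC++ code: all function names are hypothetical. The text does not say how the bin heuristic rounds, so rounding to the nearest integer is an assumption. Sending values that fall exactly on a threshold to the upper interval is also an assumption.

```python
import bisect
import math


def heuristic_bin_count(values):
    """Bin heuristic: twice the base-2 log of the number of distinct values.

    Rounding to the nearest integer (and using at least one bin) is an
    assumption; the documentation does not specify it.
    """
    distinct = len(set(values))
    if distinct < 2:
        return 1
    return max(1, round(2 * math.log2(distinct)))


def uniform_thresholds(values, num_bins):
    """Cut points for uniform (equal-width) binning over the observed range."""
    lo, hi = min(values), max(values)
    if num_bins <= 1 or lo == hi:
        return []
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(1, num_bins)]


def discretize(value, thresholds):
    """Index of the interval containing value (assumed: ties go upward)."""
    return bisect.bisect_right(thresholds, value)


def t2_bin_count(labels):
    """T2 heuristic: number of classes plus one."""
    return len(set(labels)) + 1


def effective_min_inst(min_inst):
    """1r method: a MIN_INST of 0 is replaced by 6, Holte's suggested default."""
    return 6 if min_inst == 0 else min_inst


if __name__ == "__main__":
    data = list(range(16))                 # 16 distinct values
    bins = heuristic_bin_count(data)       # 2 * log2(16) = 8
    cuts = uniform_thresholds(data, bins)
    print(bins, cuts, [discretize(v, cuts) for v in (0, 7.5, 15)])
```

With Fixed-value, the number of bins would be supplied directly instead of being computed by `heuristic_bin_count`.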

Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996