
C4.5 Variants

A description of C4.5 is given in quinlan-c45. The C4.5, C4.5-no-pruning, and C4.5-rules inducers are interfaces to C4.5, which you must install yourself. They require the "c4.5" and "c4.5rules" programs to be in your path, and they use the file $MLCDIR/c45test.awk. C4.5 can be purchased with the C4.5 book by Ross Quinlan (ISBN: 1-55860-240). Patches can be retrieved by anonymous ftp to, directory pub/ml, file patch.tar.Z.

You can modify the default options passed to C4.5 by setting C45_FLAGS (the default is -u -f %s). The %s is replaced by the file name stem, as required by C4.5. C4.5-rules runs C4.5 and then C4.5rules; the options for each can be set using C45R_FLAGS1 for C4.5 and C45R_FLAGS2 for C4.5rules. Unless you use version 7 of C4.5rules, the return status is wrong, which is why we have added a dummy "echo" statement to C45R_FLAGS2.
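The %s substitution described above can be illustrated with a short Python sketch. This is not the interface's actual code: the function name build_c45_command and the example stem are hypothetical, and the sketch only assumes that the flags string is split into arguments and that %s is replaced by the file name stem.

```python
import shlex
from typing import List


def build_c45_command(flags_template: str, stem: str,
                      program: str = "c4.5") -> List[str]:
    """Expand a C45_FLAGS-style template into an argument list.

    Hypothetical helper: the template is split into arguments first,
    then every %s is replaced by the file name stem, so a stem that
    contains spaces stays a single argument.
    """
    args = [token.replace("%s", stem) for token in shlex.split(flags_template)]
    return [program] + args


if __name__ == "__main__":
    # The default flags from the text, with an illustrative stem.
    print(build_c45_command("-u -f %s", "/tmp/mlc123"))
    # prints ['c4.5', '-u', '-f', '/tmp/mlc123']
```

Returning a list instead of a single string means the result could be handed to a process launcher without further quoting; that is a design choice of this sketch, not a statement about how the interface invokes C4.5.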

C45_STATS allows you to generate statistics about the generated trees, including the number of nodes and the number of attributes. It is mostly useful if you want to see these statistics for 10-fold CV as opposed to a single run.

MAX_C45_TRIES determines how many times to try calling C4.5 if there is a problem running it or parsing its output. The default value of one suffices unless a cleaning job that removes files from /tmp might also remove the interface files. For example, in kohavi-accest some runs made hundreds of thousands of calls to C4.5 to compute a single number. It was crucial to ensure that even if the interface files were removed, the run would recover smoothly by regenerating them.
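The recovery behavior that MAX_C45_TRIES controls can be sketched as a retry loop. This is a minimal illustration under stated assumptions, not the interface's implementation: call_with_retries, write_files, and run_and_parse are hypothetical names, and the sketch assumes that a failed run or unparsable output surfaces as an OSError or ValueError.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(max_tries: int,
                      write_files: Callable[[], None],
                      run_and_parse: Callable[[], T]) -> T:
    """Try an external run up to max_tries times.

    The interface files are regenerated before every attempt, so a
    cleaning job that removed them between attempts does not abort
    the whole experiment.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(max_tries):
        write_files()  # regenerate the interface files
        try:
            return run_and_parse()  # run the program and parse its output
        except (OSError, ValueError) as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"giving up after {max_tries} tries") from last_error
```

With max_tries set to one, the loop reduces to a single attempt, which matches the default described above.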

Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996