The discretize utility discretizes continuous attributes. See Section 5.1 for a description of the available options.

This utility looks only at the training set to form the intervals, and these intervals are then used to discretize both the training set and the test set. It is a mistake to discretize all the data first and then run cross-validation, because the discretization intervals will then have been chosen using the instances in the folds that later serve as test sets. The MLC++ disc-filter will do the right thing if used within cross-validation; that is, for each of the cross-validation folds, different intervals will be formed, as if the fold's two parts were a separate training set and test set.
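The per-fold procedure can be illustrated with a minimal Python sketch. It is not MLC++ code and does not reproduce the discretize utility or the disc-filter: the function names are hypothetical, and equal-width binning is assumed here only because it is simple to show, not because it is the method the utility uses. The point it demonstrates is that cut points are computed from the training part of each fold and then applied unchanged to that fold's test part.

```python
from bisect import bisect_right


def fit_cut_points(train_values, n_bins=3):
    """Learn equal-width cut points from the training values only.

    Equal-width binning is an assumption made for illustration.
    """
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / n_bins
    # n_bins - 1 interior thresholds; values outside [lo, hi] fall in the end bins.
    return [lo + width * i for i in range(1, n_bins)]


def apply_cut_points(values, cut_points):
    """Map each value to the index of the interval it falls in."""
    return [bisect_right(cut_points, v) for v in values]


def cross_validation_folds(values, k):
    """Yield (train, test) splits; fold i tests on every k-th value starting at i."""
    for i in range(k):
        test = values[i::k]
        train = [v for j, v in enumerate(values) if j % k != i]
        yield train, test


def discretize_per_fold(values, k=3, n_bins=3):
    """For each fold, fit intervals on the training part and apply them to both parts."""
    results = []
    for train, test in cross_validation_folds(values, k):
        cuts = fit_cut_points(train, n_bins)  # the test values are never inspected here
        results.append(
            (cuts, apply_cut_points(train, cuts), apply_cut_points(test, cuts))
        )
    return results


if __name__ == "__main__":
    for cuts, train_bins, test_bins in discretize_per_fold([0, 3, 6, 9, 12, 15], k=2):
        print(cuts, train_bins, test_bins)
```

Each fold generally produces different cut points, because each fold has a different training part. Calling fit_cut_points once on all the values before splitting would be the mistake described above, since the test values would then influence the intervals.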

Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996