This utility does indeed look only at the training set to form the intervals, and it then uses those intervals to discretize both the training set and the test set. It is a mistake to discretize all the data first and then run cross-validation. In that case the discretization intervals are chosen using instances that later serve as test sets in the internal folds, so information about the test data leaks into training. The MLC++ disc-filter does the right thing when it is used within cross-validation: for each cross-validation fold, a separate set of intervals is formed from that fold's training portion and applied to its test portion, just as if the two were an ordinary training set and test set.
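The per-fold behaviour described above can be illustrated with a small sketch. This is not MLC++ code and does not reproduce the disc-filter's actual discretization methods. It is a hypothetical Python illustration that assumes simple equal-width binning and contiguous, unshuffled folds, and all function names in it are made up. The point it shows is that cut points are fitted on each fold's training portion only and then reused unchanged on that fold's test portion.

```python
from bisect import bisect_right
from typing import Iterator, List, Sequence, Tuple


def fit_equal_width_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    """Hypothetical helper: interior cut points for equal-width bins.

    Equal-width binning is an assumption for illustration only; it uses
    nothing but the values passed in (i.e. the training data).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def apply_cuts(values: Sequence[float], cuts: List[float]) -> List[int]:
    """Map each value to a bin index: the number of cut points <= value."""
    return [bisect_right(cuts, v) for v in values]


def contiguous_folds(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds (no shuffling)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def discretize_per_fold(values: Sequence[float], k: int, n_bins: int):
    """For each fold, fit cuts on the training part only, then apply them
    to both the training part and the held-out test part.

    Returns a list of (cuts, train_bins, test_bins) tuples, one per fold.
    """
    results = []
    for train_idx, test_idx in contiguous_folds(len(values), k):
        train_vals = [values[i] for i in train_idx]
        test_vals = [values[i] for i in test_idx]
        cuts = fit_equal_width_cuts(train_vals, n_bins)  # test fold never seen here
        results.append((cuts, apply_cuts(train_vals, cuts), apply_cuts(test_vals, cuts)))
    return results


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for fold, (cuts, train_bins, test_bins) in enumerate(discretize_per_fold(data, k=2, n_bins=2)):
        print(fold, cuts, train_bins, test_bins)
```

With the toy data, the two folds end up with different cut points (8.0 and 3.0), because each is fitted on a different training portion. Fitting once on all ten values would instead give a single cut at 5.5 that already reflects the test instances, which is exactly the leakage the paragraph warns against.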