Accuracy estimation refers to the process of approximating the future performance of a classifier induced by an inducer on a given dataset. We refer the reader to kohavi-accest and weiss-kulikowski-ml for an overview.
In many cases where MLC++
presents an accuracy, it also presents the
confidence of the result. The number after the ± indicates the
standard deviation of the accuracy. If a single test set is
available, the standard deviation is a theoretical computation that
is reasonable for large test sets and for accuracies not too close
to zero or one. If resampling is used (cross-validation or
bootstrap), the standard deviation of the sample mean
is given.
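The two kinds of standard deviation described above can be illustrated with a minimal sketch. It assumes that the "theoretical computation" for a single test set is the usual binomial formula sqrt(acc * (1 - acc) / n), which the text does not state explicitly; the function names are hypothetical and are not part of MLC++.

```python
import math
import statistics


def single_test_set_std(accuracy, n_test):
    """Binomial standard deviation of an accuracy measured on n_test instances.

    Assumption: each test instance is an independent Bernoulli trial, so the
    approximation is only reasonable for large n_test and for accuracies
    that are not too close to zero or one.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def resampling_mean_std(estimates):
    """Standard deviation of the sample mean of repeated accuracy estimates,
    for example one estimate per cross-validation fold or bootstrap sample."""
    return statistics.stdev(estimates) / math.sqrt(len(estimates))


# Illustrative numbers only, not results from any real experiment.
print(f"{0.85:.2%} +/- {single_test_set_std(0.85, 1000):.2%}")
folds = [0.82, 0.86, 0.84, 0.88, 0.85]
print(f"{statistics.mean(folds):.2%} +/- {resampling_mean_std(folds):.2%}")
```

The second function divides by the square root of the number of estimates because it reports the uncertainty of the mean, not the spread of the individual estimates.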
An accuracy range in square brackets is a 95% confidence bound
computed by a more accurate formula. An accuracy range in
parentheses is a 95% percentile interval; the percentile bound is
pessimistic in the sense that it covers a wider range than
necessary, because only a whole number of samples can be excluded
from each tail. With fewer than 40 samples, it reports the lowest
and highest estimates, so that one can see the variability of the
estimates.
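Both kinds of interval can be sketched as follows. The source does not give the "more accurate formula"; the sketch assumes it is the Wilson score interval, a common choice that stays inside [0, 1]. The percentile rule is likewise an assumption: it drops floor(2.5% of k) estimates from each tail, which is zero whenever k is below 40 and therefore matches the behaviour described above. The function names are hypothetical.

```python
import math


def wilson_interval(accuracy, n_test, z=1.96):
    """Wilson score interval for an accuracy measured on n_test instances.

    Assumption: this stands in for the unnamed "more accurate formula";
    z = 1.96 corresponds to a 95% confidence level.
    """
    z2 = z * z
    denom = 1.0 + z2 / n_test
    center = (accuracy + z2 / (2.0 * n_test)) / denom
    half = z * math.sqrt(
        accuracy * (1.0 - accuracy) / n_test + z2 / (4.0 * n_test ** 2)
    ) / denom
    return center - half, center + half


def percentile_interval_95(estimates):
    """95% percentile interval over a non-empty list of accuracy estimates.

    Only whole estimates can be dropped, so the interval errs on the wide
    side; with fewer than 40 estimates nothing is dropped and the result
    is simply (lowest, highest).
    """
    ordered = sorted(estimates)
    drop = len(ordered) * 25 // 1000  # floor(2.5% of k), integer arithmetic
    return ordered[drop], ordered[-1 - drop]


print(wilson_interval(0.85, 1000))
print(percentile_interval_95([0.82, 0.86, 0.84, 0.88, 0.85]))
```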
MLC++ currently supports several methods of accuracy estimation:



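In the bootstrap method, a bootstrap sample is created by sampling
n instances uniformly, with replacement, from a dataset of size n.
The probability that a given instance is not chosen after n draws is
$(1 - 1/n)^n \approx e^{-1} \approx 0.368$; the expected fraction of
distinct instances from the original dataset appearing in the
bootstrap sample is thus about 0.632, which leaves about 0.368 of
the instances for the test set.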
The accuracy estimate is derived by using the
bootstrap sample for training and the rest of the instances for
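testing. Given a number b, the number of bootstrap samples, let
$\epsilon 0_i$ be the accuracy estimate for bootstrap sample
i. The .632 bootstrap estimate is defined as

\[ \mathrm{acc}_{boot} = \frac{1}{b} \sum_{i=1}^{b} \left( 0.632 \cdot \epsilon 0_i + 0.368 \cdot \mathrm{acc}_s \right) \]

where $\mathrm{acc}_s$ is the resubstitution accuracy estimate on the full
dataset (i.e., the accuracy on the training set).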
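The procedure can be sketched in a few lines. This is an illustration of the standard .632 bootstrap (each per-sample accuracy weighted by 0.632, the resubstitution accuracy by 0.368), not MLC++'s own code. The `train` and `accuracy` callables, the majority-class inducer, and the toy dataset are all hypothetical stand-ins.

```python
import random
from collections import Counter


def bootstrap_632(data, train, accuracy, b=50, seed=0):
    """Standard .632 bootstrap accuracy estimate.

    data     : list of (features, label) pairs
    train    : hypothetical inducer, train(instances) -> model
    accuracy : hypothetical scorer, accuracy(model, instances) -> float
    b        : number of bootstrap samples
    """
    rng = random.Random(seed)
    n = len(data)
    # Resubstitution accuracy: train and test on the full dataset.
    acc_s = accuracy(train(data), data)
    total, used = 0.0, 0
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        chosen = set(idx)
        sample = [data[i] for i in idx]
        held_out = [data[i] for i in range(n) if i not in chosen]
        if not held_out:  # possible only for very small n; skip such samples
            continue
        eps0_i = accuracy(train(sample), held_out)
        total += 0.632 * eps0_i + 0.368 * acc_s
        used += 1
    if used == 0:
        raise ValueError("no bootstrap sample left any instance for testing")
    return total / used


# Toy inducer for illustration only: always predict the majority label.
def train_majority(instances):
    return Counter(label for _, label in instances).most_common(1)[0][0]


def accuracy_majority(model, instances):
    return sum(1 for _, label in instances if label == model) / len(instances)


toy = [(i, "a" if i % 3 else "b") for i in range(30)]
print(bootstrap_632(toy, train_majority, accuracy_majority))
```

Skipping samples that leave no held-out instances is a design choice of this sketch; for realistic dataset sizes such samples essentially never occur.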