Accuracy estimation refers to the process of approximating the future performance of a classifier induced by an inducer on a given dataset. We refer the reader to kohavi-accest and weiss-kulikowski-ml for an overview.

In many cases where MLC++
presents an accuracy, it also presents the
confidence of the result. The number after the ± sign indicates the
standard deviation of the accuracy. If a single test set is
available, the standard deviation is a theoretical computation that
is reasonable for large test sets and for accuracies not too close
to zero or one. If resampling is used (cross-validation or
bootstrap), then the standard deviation of the sample **mean**
is given. An
accuracy range in square brackets is a 95% confidence bound that
is computed by a more accurate formula
[,]. An
accuracy range in parentheses is a 95% percentile interval
[]; the percentile bound is
pessimistic in the sense that it covers a wider range because only
an integral number of samples is available. Below 40 samples, it gives the
lowest and highest estimates, so that one can see the variability
of the estimates.
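The two standard-deviation computations described above can be sketched as follows. This is a minimal illustration, not MLC++ code; the function names are invented for this example:

```python
import math

def holdout_std(acc, n_test):
    """Theoretical standard deviation of an accuracy measured on a
    single test set of n_test instances, treating each prediction as
    a Bernoulli trial: sqrt(acc * (1 - acc) / n_test).
    Reasonable for large test sets and accuracies not too close to
    zero or one."""
    return math.sqrt(acc * (1 - acc) / n_test)

def mean_and_std_of_mean(fold_accs):
    """Mean accuracy over resampled runs (e.g. cross-validation
    folds) and the standard deviation of that sample mean: the
    sample standard deviation divided by sqrt(k)."""
    k = len(fold_accs)
    mean = sum(fold_accs) / k
    var = sum((a - mean) ** 2 for a in fold_accs) / (k - 1)
    return mean, math.sqrt(var / k)
```

For example, 90% accuracy on a 100-instance test set yields a standard deviation of 0.03, while the fold accuracies 0.8 and 0.9 yield a mean of 0.85 with a standard deviation of the mean of 0.05.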

MLC++ currently supports several methods of accuracy estimation:

- Holdout
- The dataset is split into two disjoint sets of
instances. The inducer is trained on one set, the training set,
and tested on the other, the test set. The accuracy on
the test set is the estimated accuracy.
- Cross-validation
- In
**k**-fold cross-validation, the dataset is randomly split into **k** mutually exclusive subsets (the folds) of approximately equal size. The inducer is trained and tested **k** times; each time it is tested on one fold and trained on the dataset minus that fold. The cross-validation estimate of accuracy is the average of the estimated accuracies from the **k** folds.
- Stratified cross-validation
- Same as cross-validation, except
that the folds are stratified so that they contain approximately the
same proportions of labels as the original dataset.
- Bootstrap
- The .632 bootstrap []
estimates the accuracy as follows. Given a dataset of size
**n**, a **bootstrap sample** is created by sampling **n** instances uniformly from the data (with replacement). Since the dataset is sampled with replacement, the probability of any given instance not being chosen after **n** samples is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368; the expected fraction of distinct instances from the original dataset appearing in the bootstrap sample is thus 0.632. The accuracy estimate is derived by using the bootstrap sample for training and the rest of the instances for testing. Given **b**, the number of bootstrap samples, let acc_i be the accuracy estimate for bootstrap sample **i**. The .632 bootstrap estimate is defined as acc_boot = (1/b) · Σ_{i=1..b} (0.632 · acc_i + 0.368 · acc_s), where acc_s is the resubstitution accuracy estimate on the full dataset (i.e., the accuracy on the training set).

Sun Oct 6 23:17:50 PDT 1996