
# Accuracy Estimation

Accuracy estimation refers to the process of approximating the future performance of a classifier induced by an inducer on a given dataset. We refer the reader to kohavi-accest and weiss-kulikowski-ml for an overview.

In many cases where MLC++ presents an accuracy, it also presents the confidence of the result. The number after the ± indicates the standard deviation of the accuracy. If a single test set is available, the standard deviation is a theoretical computation that is reasonable for large test sets and for accuracies not too close to zero or one. If resampling is used (cross-validation or bootstrap), the standard deviation of the sample mean is given. An accuracy range in square brackets is a 95% confidence bound computed by a more accurate formula. An accuracy range in parentheses is a 95% percentile interval; the percentile bound is pessimistic in the sense that it covers a wider range because the number of samples is integral. With fewer than 40 samples, it gives the lowest and highest estimates, so that one can see the variability of the estimates.
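As a rough illustration of the quantities described above, the sketch below computes a binomial standard deviation for a single test set, a 95% interval, and the standard deviation of a sample mean. The text does not name the "more accurate formula"; the Wilson score interval is used here purely as an assumption, and the function names are hypothetical, not part of MLC++.

```python
import math

def accuracy_std(accuracy, n_test):
    """Binomial standard deviation of an accuracy measured on n_test instances."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)

def wilson_interval(accuracy, n_test, z=1.96):
    """Wilson score interval (an assumed choice); z = 1.96 gives about 95%."""
    denom = 1.0 + z * z / n_test
    center = (accuracy + z * z / (2.0 * n_test)) / denom
    spread = accuracy * (1.0 - accuracy) / n_test + z * z / (4.0 * n_test ** 2)
    half_width = z * math.sqrt(spread) / denom
    return center - half_width, center + half_width

def std_of_mean(estimates):
    """Standard deviation of the sample mean over resampling estimates.

    Uses the unbiased (k - 1) sample variance, which is an assumption here.
    """
    k = len(estimates)
    mean = sum(estimates) / k
    variance = sum((e - mean) ** 2 for e in estimates) / (k - 1)
    return math.sqrt(variance / k)
```

Unlike the simple binomial standard deviation, an interval of this kind stays inside [0, 1] and remains usable when the accuracy is close to zero or one.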

MLC++ currently supports several methods of accuracy estimation:

Holdout
The dataset is split into two disjoint sets of instances. The inducer is trained on one set, the training set, and tested on the other, the test set. The accuracy on the test set is the estimated accuracy.

Cross-validation
In k-fold cross-validation, the dataset is randomly split into k mutually exclusive subsets (the folds) of approximately equal size. The inducer is trained and tested k times; each time tested on a fold and trained on the dataset minus the fold. The cross-validation estimate of accuracy is the average of the estimated accuracies from the k folds.
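The procedure can be sketched as follows. This is a minimal illustration rather than the MLC++ implementation: `train_fn` stands in for an inducer and is assumed to return a callable classifier, `majority_inducer` is a toy inducer included only to make the example runnable, and k is assumed to be no larger than the dataset size.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly split range(n) into k mutually exclusive, near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(instances, labels, k, train_fn, seed=0):
    """Average the per-fold accuracies of classifiers induced by train_fn."""
    folds = k_fold_indices(len(instances), k, seed)
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        # Train on the dataset minus the current fold.
        train_idx = [i for i in range(len(instances)) if i not in held_out]
        classifier = train_fn([instances[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        # Test on the held-out fold.
        correct = sum(classifier(instances[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k

def majority_inducer(train_x, train_y):
    """Toy inducer: always predicts the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority
```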

Stratified cross-validation
Same as cross-validation, except that the folds are stratified so that they contain approximately the same proportions of labels as the original dataset.
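One simple way to obtain stratified folds, shown here only as an assumed approach and not as the method MLC++ uses, is to shuffle the instances of each label separately and deal them to the folds in round-robin order. The function name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Return k folds of indices with roughly the original label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0  # continues across labels so overall fold sizes stay balanced
    for members in by_label.values():
        rng.shuffle(members)
        for index in members:
            folds[position % k].append(index)
            position += 1
    return folds
```

With this scheme, the number of instances of any one label differs by at most one between folds.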

Bootstrap
The .632 Bootstrap estimates the accuracy as follows. Given a dataset of size $n$, a bootstrap sample is created by sampling $n$ instances uniformly from the data (with replacement). Since the dataset is sampled with replacement, the probability of any given instance not being chosen after $n$ samples is $(1 - 1/n)^n \approx e^{-1} \approx 0.368$; the expected number of distinct instances from the original dataset appearing in the bootstrap sample is thus about $0.632n$, leaving about $0.368n$ for the test set. The accuracy estimate is derived by using the bootstrap sample for training and the rest of the instances for testing. Given a number $b$, the number of bootstrap samples, let $\epsilon_{0i}$ be the accuracy estimate for bootstrap sample $i$. The .632 bootstrap estimate is defined as

$$acc_{boot} = \frac{1}{b} \sum_{i=1}^{b} \left( 0.632 \cdot \epsilon_{0i} + 0.368 \cdot acc_s \right)$$

where $acc_s$ is the resubstitution accuracy estimate on the full dataset (i.e., the accuracy on the training set).
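A minimal sketch of the .632 bootstrap under stated assumptions: `train_fn` is a hypothetical inducer returning a callable classifier, the dataset has at least two instances, and a bootstrap sample that happens to leave no instances for testing is simply redrawn (a choice made here, not something the text specifies). All names are illustrative.

```python
import random

def prob_not_chosen(n):
    """Probability that a given instance is absent from a bootstrap sample."""
    return (1.0 - 1.0 / n) ** n

def bootstrap_632(instances, labels, b, train_fn, seed=0):
    """Average 0.632 * held-out accuracy + 0.368 * resubstitution accuracy."""
    rng = random.Random(seed)
    n = len(instances)
    # Resubstitution accuracy: train and test on the full dataset.
    full = train_fn(instances, labels)
    acc_s = sum(full(instances[i]) == labels[i] for i in range(n)) / n
    total = 0.0
    for _ in range(b):
        while True:
            # Draw n indices uniformly with replacement.
            sample = [rng.randrange(n) for _ in range(n)]
            chosen = set(sample)
            test_idx = [i for i in range(n) if i not in chosen]
            if test_idx:  # redraw if nothing is left for testing
                break
        classifier = train_fn([instances[i] for i in sample],
                              [labels[i] for i in sample])
        eps0 = sum(classifier(instances[i]) == labels[i]
                   for i in test_idx) / len(test_idx)
        total += 0.632 * eps0 + 0.368 * acc_s
    return total / b

def majority_inducer(train_x, train_y):
    """Toy inducer: always predicts the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority
```

The weights 0.632 and 0.368 mirror the expected fractions of instances that do and do not appear in a bootstrap sample, which `prob_not_chosen` approximates for large n.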

Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996