Figure 5: The learning curve for C4.5 on soybean-large
The LearnCurve utility generates a learning curve for a given induction algorithm and dataset. The x-axis shows the number of training instances, and the y-axis shows the accuracy obtained when the inducer is trained on that many instances and tested on the unseen instances.
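The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the LearnCurve implementation: the names `learning_curve` and `majority_inducer` are hypothetical, the toy inducer stands in for a real one such as C4.5, and the way the "std-dev of mean" is computed here (sample standard deviation divided by the square root of the number of repeats) is an assumption.

```python
import random
import statistics
from collections import Counter


def majority_inducer(train):
    """Toy stand-in for a real inducer: always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def learning_curve(data, sizes, repeats, inducer=majority_inducer, seed=0):
    """Return (size, mean accuracy, std-dev of mean) for each training-set size.

    data is a list of (features, label) pairs. For each size n, the data is
    shuffled `repeats` times; the first n instances train the inducer and the
    remaining (unseen) instances are used for testing.
    """
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        accs = []
        for _ in range(repeats):
            shuffled = data[:]
            rng.shuffle(shuffled)
            train, test = shuffled[:n], shuffled[n:]
            classify = inducer(train)
            correct = sum(classify(x) == y for x, y in test)
            accs.append(correct / len(test))
        mean = statistics.mean(accs)
        # Assumed definition: standard error of the mean over the repeats.
        sem = statistics.stdev(accs) / len(accs) ** 0.5 if len(accs) > 1 else 0.0
        curve.append((n, mean, sem))
    return curve


if __name__ == "__main__":
    # Synthetic data: three quarters of the instances have label "a".
    data = [((i,), "a" if i % 4 else "b") for i in range(100)]
    for n, mean, sem in learning_curve(data, sizes=[10, 20, 40], repeats=5):
        print(f"{n}, {mean:.2%} +- {sem:.2%}")
```

Each size must be smaller than the dataset so that some instances remain for testing, which is the role a minimum test size plays.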

[Learning Curve]
To generate a learning curve for C4.5 on the soybean-large dataset, set the following environment variables and run LearnCurve: {
setenv INDUCER C4.5
setenv DATAFILE soybean-large.all   # This contains the full dataset
setenv NUM_INTERVALS 20             # number of intervals on X-axis
setenv NUM_REPEATS 10               # number of runs at each point
setenv MIN_TEST_SIZE 300            # leave at least 300 for testing
setenv DUMPSTEM                     # no dump stem
setenv LC_OUTPUT_TYPE gnuplot
LearnCurve
gnuplot soybean-large.gnuplot

The output is:
Inducer: c4.5. Intervals: 20, Repeats: 10, Min test size: 300. Seed: 7258789
DATAFILE: soybean-large.all (size=683)
Size, Acc, std-dev of mean
20, 32.55% +- 2.79%
40, 50.58% +- 2.48%
60, 58.59% +- 1.58%
81, 69.97% +- 1.75%
101, 68.49% +- 1.26%
121, 75.09% +- 0.88%
141, 77.80% +- 1.58%
161, 80.38% +- 1.94%
181, 80.96% +- 0.91%
202, 81.73% +- 0.83%
222, 83.49% +- 0.87%
242, 83.63% +- 0.50%
262, 86.56% +- 0.76%
282, 86.26% +- 1.19%
302, 85.28% +- 1.37%
323, 87.69% +- 0.58%
343, 87.12% +- 0.82%
363, 88.63% +- 0.95%
383, 88.63% +- 1.00%
Gnuplot output in soybean-large.gnuplot
}
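For further processing outside gnuplot, the textual output can be read back into numbers. The following is a minimal sketch, assuming the row format shown above (`size, accuracy% +- std-dev%`); the function name `parse_curve` and the regular expression are illustrative and not part of MLC++.

```python
import re

# Matches rows such as "20, 32.55% +- 2.79%"; header and status lines do not match.
ROW = re.compile(r"^\s*(\d+),\s*([\d.]+)%\s*\+-\s*([\d.]+)%\s*$")


def parse_curve(text):
    """Return a list of (size, accuracy percent, std-dev percent) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


# A few rows copied from the output above.
sample = """Size, Acc, std-dev of mean
20, 32.55% +- 2.79%
40, 50.58% +- 2.48%
383, 88.63% +- 1.00%
"""
rows = parse_curve(sample)
best = max(rows, key=lambda r: r[1])  # row with the highest accuracy
print(best)
```

Lines that do not match the row pattern, such as the header and the final gnuplot message, are skipped.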
Figure 5 shows the gnuplot graph generated by LearnCurve.