
LearnCurve

  
Figure 5: The learning curve for C4.5 on soybean-large

The LearnCurve utility generates a learning curve for a given induction algorithm on a given dataset. The x-axis shows the number of training instances. The y-axis shows the accuracy of the inducer when it is trained on that many instances and tested on instances it has not seen during training.
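The procedure behind each point on the curve can be sketched as follows. This is a minimal illustration under stated assumptions, not LearnCurve's implementation. For each training-set size it draws a random training sample, tests on every instance not drawn, repeats this several times, and reports the mean accuracy together with the standard deviation of that mean. Testing on all remaining instances, and computing the standard deviation of the mean as the sample standard deviation divided by the square root of the number of runs, are assumptions. The names `learning_curve`, `learning_curve_point`, and the toy `majority_inducer` are hypothetical; in LearnCurve the inducer would be C4.5 or another MLC++ inducer.

```python
import random
import statistics
from collections import Counter


def majority_inducer(train):
    """Hypothetical toy inducer: always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def learning_curve_point(data, train_size, repeats, inducer, rng):
    """Mean accuracy and std-dev of the mean at one training-set size."""
    accs = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        # Assumption: train on a random sample, test on everything not drawn.
        train, test = shuffled[:train_size], shuffled[train_size:]
        classify = inducer(train)
        correct = sum(1 for x, y in test if classify(x) == y)
        accs.append(correct / len(test))
    mean = statistics.mean(accs)
    # Assumed definition: sample std-dev of the runs divided by sqrt(repeats).
    sem = statistics.stdev(accs) / len(accs) ** 0.5 if len(accs) > 1 else 0.0
    return mean, sem


def learning_curve(data, sizes, repeats, min_test_size, inducer, seed=0):
    """Return a list of (size, mean accuracy, std-dev of mean) tuples."""
    rng = random.Random(seed)
    points = []
    for n in sizes:
        # Mirrors the MIN_TEST_SIZE idea: keep at least this many test instances.
        if len(data) - n < min_test_size:
            raise ValueError(
                f"training size {n} leaves fewer than {min_test_size} test instances")
        points.append((n, *learning_curve_point(data, n, repeats, inducer, rng)))
    return points
```

Passing `repeats=10` and `min_test_size=300` would correspond to the NUM_REPEATS and MIN_TEST_SIZE settings used in the example below.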


To generate a learning curve for C4.5 on the soybean-large dataset, one can run the following (the setenv commands use csh/tcsh syntax):

   setenv INDUCER  C4.5
   setenv DATAFILE soybean-large.all    # This contains the full dataset
   setenv NUM_INTERVALS 20              # number of intervals on X-axis
   setenv NUM_REPEATS   10              # number of runs at each point
   setenv MIN_TEST_SIZE 300             # leave at least 300 for testing
   setenv DUMPSTEM                      # no dump stem
   setenv LC_OUTPUT_TYPE gnuplot
   LearnCurve
   gnuplot soybean-large.gnuplot

The output is:

   Inducer: c4.5.  Intervals: 20, Repeats: 10, Min test size: 300.  Seed: 7258789
   DATAFILE: soybean-large.all (size=683)
   Size,  Acc, std-dev of mean
       20, 32.55% +- 2.79%
       40, 50.58% +- 2.48%
       60, 58.59% +- 1.58%
       81, 69.97% +- 1.75%
      101, 68.49% +- 1.26%
      121, 75.09% +- 0.88%
      141, 77.80% +- 1.58%
      161, 80.38% +- 1.94%
      181, 80.96% +- 0.91%
      202, 81.73% +- 0.83%
      222, 83.49% +- 0.87%
      242, 83.63% +- 0.50%
      262, 86.56% +- 0.76%
      282, 86.26% +- 1.19%
      302, 85.28% +- 1.37%
      323, 87.69% +- 0.58%
      343, 87.12% +- 0.82%
      363, 88.63% +- 0.95%
      383, 88.63% +- 1.00%
   Gnuplot output in soybean-large.gnuplot

Each row gives the number of training instances, the accuracy averaged over the NUM_REPEATS runs, and the standard deviation of that mean.
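The Size column is consistent with a simple spacing rule. With 683 instances and MIN_TEST_SIZE 300, the largest training set is 683 - 300 = 383 instances, and the 19 printed sizes equal 383 * i / 19 rounded to the nearest integer for i = 1, ..., 19. The sketch below reproduces that column. The rule and the function name `training_sizes` are inferred from the table, not documented LearnCurve behavior, and how the 19 printed points relate to the NUM_INTERVALS setting of 20 is not stated in this section.

```python
def training_sizes(dataset_size, min_test_size, num_points):
    """Hypothetical helper: evenly spaced training sizes up to
    dataset_size - min_test_size (rule inferred from the printed table)."""
    max_train = dataset_size - min_test_size
    if max_train < 1:
        raise ValueError("min_test_size leaves no instances for training")
    # int(x + 0.5) rounds halves up, unlike Python's round() (banker's rounding).
    return [int(i * max_train / num_points + 0.5) for i in range(1, num_points + 1)]


print(training_sizes(683, 300, 19))
# → [20, 40, 60, 81, 101, 121, 141, 161, 181, 202, 222, 242, 262, 282, 302, 323, 343, 363, 383]
```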

Figure 5 shows the gnuplot graph generated by LearnCurve.
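LearnCurve already writes soybean-large.gnuplot, and the contents of that file are not shown here. If you want to replot the printed table yourself, one hypothetical approach is to extract the rows into a three-column data file (size, accuracy, standard deviation of the mean), which gnuplot can draw with error bars using `plot 'soybean-large.dat' using 1:2:3 with yerrorbars`. The function names and the file name `soybean-large.dat` are illustrative assumptions.

```python
import re

# Matches table rows such as "    20, 32.55% +- 2.79%"; header lines do not match.
ROW = re.compile(r"^\s*(\d+),\s*([\d.]+)%\s*\+-\s*([\d.]+)%\s*$")


def parse_learncurve_rows(text):
    """Extract (size, accuracy %, std-dev of mean %) tuples from LearnCurve output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def to_gnuplot_data(rows):
    """Whitespace-separated x, y, ydelta columns, as expected by gnuplot's yerrorbars."""
    return "".join(f"{n} {acc:.2f} {err:.2f}\n" for n, acc, err in rows)
```

Writing `to_gnuplot_data(parse_learncurve_rows(output))` to a file gives one line per training-set size.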



Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996