
Naive Bayes

The Naive-Bayes inducer computes the conditional probability of each class given the instance and picks the class with the highest posterior. Attributes are assumed to be conditionally independent given the class; this assumption rarely holds in practice, but the algorithm is nonetheless very robust to violations of it.
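The decision rule can be illustrated with a minimal Python sketch for nominal attributes. It is not the MLC++ implementation: the function names are hypothetical, and `zero_prob` is an arbitrary placeholder for the probability assigned to values never seen with a class.

```python
from collections import Counter, defaultdict

def train_counts(instances, labels):
    """Collect class counts and per-attribute value counts for each class."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for x, y in zip(instances, labels):
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
    return class_counts, value_counts

def posteriors(x, class_counts, value_counts, zero_prob=1e-6):
    """Return normalized P(class | x) under the independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # prior P(c)
        for i, v in enumerate(x):
            count = value_counts[(i, c)][v]
            # zero_prob is a hypothetical stand-in for unseen values
            p *= count / n_c if count else zero_prob
        scores[c] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

def classify(x, class_counts, value_counts):
    """Pick the class with the highest posterior."""
    post = posteriors(x, class_counts, value_counts)
    return max(post, key=post.get)
```

Each class score is the prior multiplied by one factor per attribute, which is exactly where the independence assumption enters; normalizing the scores yields the posterior.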

The probabilities for nominal (discrete) attributes are estimated by counts. The probability for zero counts is for m instances. The probabilities for continuous attributes are estimated by assuming a normal distribution for each attribute and class. Unknown values in the test instance are skipped (equivalent to marginalizing over them).
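The treatment of continuous attributes and unknown values can be sketched in Python as follows. This is an illustration under stated assumptions, not the MLC++ code: the function names are hypothetical, unknown values are represented by `None`, and the use of the population variance with a small floor is a choice made for this sketch only.

```python
import math

def fit_gaussian(values):
    """Return the mean and population variance of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    var = sum((v - mean) ** 2 for v in known) / len(known)
    return mean, max(var, 1e-9)  # floor (an assumption) keeps the density finite

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def class_score(x, prior, params):
    """Prior times the per-attribute densities for one class.

    params holds one (mean, variance) pair per attribute for this class.
    Attributes whose value is None (unknown) contribute no factor.
    """
    score = prior
    for value, (mean, var) in zip(x, params):
        if value is None:
            continue  # skip unknown values
        score *= gaussian_pdf(value, mean, var)
    return score
```

Skipping an attribute leaves its factor out of the product for every class, which gives the same result as summing that attribute out of the joint probability.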

Better results are commonly achieved by discretizing the continuous attributes. The disc-naive-bayes inducer provides this preprocessing step by chaining disc-filter-inducer to the naive-bayes inducer. Further improvements can usually be achieved by running feature subset selection, as shown below:

   setenv INDUCER disc-filter
   setenv DISCF_INDUCER FSS
   setenv DISCF_FSS_INDUCER naive
   setenv DISCF_FSS_CMPLX_PENALTY 0.001
   setenv DISCF_FSS_CV_TIMES 0
   setenv DISCF_FSS_ACC_ESTIMATOR cv
   setenv DISCF_FSS_CV_FOLDS 5
   setenv DISCF_FSS_DIRECTION backward
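
To give a feel for what backward feature subset selection does, here is a minimal Python sketch of greedy backward elimination. It is a simplified stand-in and should not be read as the search MLC++ actually performs: `backward_select` and `estimate_accuracy` are hypothetical names, the caller is assumed to supply the accuracy estimate (for example from 5-fold cross-validation), and subtracting `penalty` once per feature is only an assumed reading of the complexity penalty option.

```python
def backward_select(features, estimate_accuracy, penalty=0.001):
    """Greedy backward elimination over a list of feature names."""
    def score(subset):
        # Assumed scoring: estimated accuracy minus a per-feature penalty
        return estimate_accuracy(subset) - penalty * len(subset)

    current = list(features)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        # Try removing each feature; remember the best strict improvement
        for f in list(current):
            candidate = [g for g in current if g != f]
            s = score(candidate)
            if s > best:
                best, best_candidate, improved = s, candidate, True
        if improved:
            current = best_candidate
    return current, best
```

The search starts from the full feature set and drops one feature per round for as long as the penalized score improves, so irrelevant features that do not help accuracy are removed because of the penalty alone.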



Ronny Kohavi
Sun Oct 6 23:17:50 PDT 1996