 .names file created by George John, October 1994
 This data is ALMOST the same as the original crx dataset Quinlan
 used in C4.5, but
 * missing values have been replaced with the medians,
 which is unfair to the algorithms that can deal
 with missing data well. Replacing an attribute by its mean/median
 value is known to be one of the poorest methods of handling missing values.
 * attribute 4 is removed (I checked: in the entire dataset, attributes 4 and 5
 were completely correlated)
 * categorical attribute values are numbered in increasing likelihood
 of being class + and treated as numeric in the statlog tests.
 Strange.

1. TITLE:
 Australian Credit Approval

2. USE IN STATLOG

 2.1 Testing Mode
 10-Fold Cross Validation
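
 As an illustration of the testing mode, the following is a minimal Python
 sketch of a 10-fold split over the 690 examples. The helper name
 `k_fold_indices`, the shuffling, and the seed are assumptions made for the
 example; this file does not say how StatLog assigned examples to folds.

```python
import random

def k_fold_indices(n_examples, k=10, seed=0):
    """Split range(n_examples) into k disjoint, nearly equal test folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # assumed: random, unstratified split
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(690, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(690) if i not in held_out]
    # Fit on `train`, score on `test_fold`, then average the 10 scores.
```

 Each example is used for testing exactly once and for training nine times.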

 2.2 Special Preprocessing
 Yes (See REMARKS)

 2.3 Test Results

 Algorithm   Success Rate (%)

 Cal5        86.900
 Itrule      86.300
 LogDisc     85.900
 Discrim     85.900
 Dipol92     85.900
 Radial      85.500
 Cart        85.500
 Castle      85.200
 Bayes       84.900
 IndCart     84.800
 BackProp    84.600
 C4.5        84.500
 Smart       84.200
 BayTree     82.900
 KNN         81.900
 Ac2         81.900
 NewId       81.900
 LVQ         80.300
 Alloc80     79.900
 Cn2         79.600
 QuaDisc     79.300
 Default     56.000
 Cascade      0.000
 Kohonen      0.000

3. SOURCES and PAST USAGE

 3.1 ORIGINAL SOURCE
 (confidential)
 Submitted by quinlan@cs.su.oz.au

 3.2 PAST USAGE
 See Quinlan,
 * "Simplifying decision trees", Int J Man-Machine Studies 27,
 Dec 1987, pp. 221-234.
 * "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992

 3.3 RELEVANT INFORMATION

 This file concerns credit card applications. All attribute names
 and values have been changed to meaningless symbols to protect
 confidentiality of the data.

 This dataset is interesting because there is a good mix of
 attributes: continuous, nominal with small numbers of
 values, and nominal with larger numbers of values. There
 were originally a few missing values, but these have all
 been replaced by the overall median.

4. DATASET DESCRIPTION

 NUMBER OF EXAMPLES
 Total no. = 690
 NUMBER OF CLASSES: 2
 0,1 (-,+)

 Class Distribution:
 +: 307 (44.5%) CLASS 1
 -: 383 (55.5%) CLASS 0
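
 The class counts give the accuracy of always predicting the majority
 class, which appears consistent with the "Default" row in section 2.3 after
 rounding. A small Python check, using only the counts listed above:

```python
class_counts = {"+": 307, "-": 383}  # counts from the class distribution
total = sum(class_counts.values())
majority_label = max(class_counts, key=class_counts.get)
default_rate = 100.0 * class_counts[majority_label] / total
print(total, majority_label, round(default_rate, 1))  # 690 - 55.5
```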

 NUMBER OF ATTRIBUTES
 14 (6 Continuous, 8 Categorical)

 A1: 0,1 CATEGORICAL
 a,b
 A2: continuous.
 A3: continuous.
 A4: 1,2,3 CATEGORICAL
 p,g,gg
 A5: 1, 2,3,4,5, 6,7,8,9,10,11,12,13,14 CATEGORICAL
 ff,d,i,k,j,aa,m,c,w, e, q, r,cc, x

 A6: 1, 2,3, 4,5,6,7,8,9 CATEGORICAL
 ff,dd,j,bb,v,n,o,h,z

 A7: continuous.
 A8: 1, 0 CATEGORICAL
 t, f.
 A9: 1, 0 CATEGORICAL
 t, f.
 A10: continuous.
 A11: 1, 0 CATEGORICAL
 t, f.
 A12: 1, 2, 3 CATEGORICAL
 s, g, p
 A13: continuous.
 A14: continuous.
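
 The numeric codes can be mapped back to the original symbols with a lookup
 table. The Python sketch below transcribes the attribute list above and
 assumes that codes and symbols are paired by position (first code with first
 symbol, and so on); the names `ATTRIBUTE_LABELS` and `decode` are
 hypothetical.

```python
# (codes, symbols) per categorical attribute, copied from the list above.
ATTRIBUTE_LABELS = {
    "A1": ([0, 1], ["a", "b"]),
    "A4": ([1, 2, 3], ["p", "g", "gg"]),
    "A5": (list(range(1, 15)),
           ["ff", "d", "i", "k", "j", "aa", "m", "c", "w", "e", "q", "r", "cc", "x"]),
    "A6": (list(range(1, 10)),
           ["ff", "dd", "j", "bb", "v", "n", "o", "h", "z"]),
    "A8": ([1, 0], ["t", "f"]),
    "A9": ([1, 0], ["t", "f"]),
    "A11": ([1, 0], ["t", "f"]),
    "A12": ([1, 2, 3], ["s", "g", "p"]),
}

# Assumption: the i-th code corresponds to the i-th symbol.
CODE_TO_SYMBOL = {
    name: dict(zip(codes, symbols))
    for name, (codes, symbols) in ATTRIBUTE_LABELS.items()
}

def decode(attribute, code):
    """Return the original symbol for a numeric code of a categorical attribute."""
    return CODE_TO_SYMBOL[attribute][int(code)]

print(decode("A4", 3), decode("A8", 0))  # gg f
```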

5. REMARKS:

 Missing Attribute Values:
 37 cases (5%) HAD one or more missing values. The missing
 values from particular attributes WERE:

 A1: 12
 A2: 12
 A4: 6
 A5: 6
 A6: 9
 A7: 9
 A14: 13

 THESE WERE REPLACED BY THE MODE OF THE ATTRIBUTE (CATEGORICAL)
 OR THE MEAN OF THE ATTRIBUTE (CONTINUOUS)
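
 A minimal Python sketch of the replacement rule just stated. Representing a
 missing value as None is an assumption for the example, since the distributed
 data already has the values filled in. The header note of this file speaks of
 medians instead; under that reading, statistics.median would take the place
 of mean.

```python
from statistics import mean, mode

def impute(column, categorical):
    """Replace None entries with the mode (categorical) or mean (continuous)."""
    observed = [v for v in column if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in column]

print(impute([1, 2, 2, None], categorical=True))    # [1, 2, 2, 2]
print(impute([1.0, 3.0, None], categorical=False))  # [1.0, 3.0, 2.0]
```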

 There is no cost matrix.


_____________________________________________________________________________

Three remarks relating to the StatLog version:

 THE LABELS HAVE BEEN CHANGED FOR THE CONVENIENCE OF THE STATISTICAL
 ALGORITHMS. FOR EXAMPLE, ATTRIBUTE 4 ORIGINALLY HAD 3 LABELS p,g,gg
 AND THESE HAVE BEEN CHANGED TO LABELS 1,2,3.

1. Attributes 4 and 5 of the original WERE APPARENTLY IDENTICAL,
 so ATTRIBUTE 4 OF THE ORIGINAL WAS REMOVED
 (for the convenience of the statistical algorithms).

2. Where attributes were categorical, the categories were given numerical
 labels in the order of the relative risk of being class "+". Treat
 as categorical if thought desirable. All StatLog trials treated
 these variables as numerical.

3. A stepwise regression procedure strongly suggests that only
 attributes A5, A8, A9, A13 and A14 are relevant. Improved results
 are often obtained if only these five attributes are used.
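
 Remark 3 can be applied by keeping only the five named columns. The Python
 sketch below assumes each row holds the 14 attribute values in the order
 A1..A14, optionally followed by the class label; the function name and the
 placeholder row are made up for illustration.

```python
RELEVANT = ["A5", "A8", "A9", "A13", "A14"]  # attributes named in remark 3

def select_relevant(row):
    """Keep only the attributes in RELEVANT; anything else is dropped."""
    # "A5" -> zero-based index 4, and so on.
    return [row[int(name[1:]) - 1] for name in RELEVANT]

# Placeholder row: "v1".."v14" stand in for the values of A1..A14.
example = [f"v{i}" for i in range(1, 15)] + ["label"]
print(select_relevant(example))  # ['v5', 'v8', 'v9', 'v13', 'v14']
```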

CONTACTS
 statlogadm@ncc.up.pt
 bob@stams.strathclyde.ac.uk

 See README file for general information

================================================================================

0,1.
A1: 0,1.
A2: continuous.
A3: continuous.
A4: 1,2,3.
A5: 1, 2,3,4,5, 6,7,8,9,10,11,12,13,14.
A6: 1, 2,3, 4,5,6,7,8,9.
A7: continuous.
A8: 1, 0.
A9: 1, 0.
A10: continuous.
A11: 1, 0.
A12: 1, 2, 3.
A13: continuous.
A14: continuous.