 .names file created by George John, October 1994

1. TITLE:
 Letter Image Recognition Data

 The objective is to identify each of a large number of black-and-white
 rectangular pixel displays as one of the 26 capital letters in the English
 alphabet. The character images were based on 20 different fonts and each
 letter within these 20 fonts was randomly distorted to produce a file of
 20,000 unique stimuli. Each stimulus was converted into 16 primitive
 numerical attributes (statistical moments and edge counts) which were then
 scaled to fit into a range of integer values from 0 through 15. The donor
 typically trained on the first 16,000 items and then used the resulting
 model to predict the letter category for the remaining 4,000; the StatLog
 split listed in Section 4 is 15,000/5,000 instead. See the article cited
 in Section 3.2 for more details.
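 This file does not say how the raw moments and edge counts were mapped onto
 the integers 0 through 15. Purely as an illustration, a linear min-max
 rescaling with rounding and clipping could look like the sketch below; the
 function name scale_to_0_15 and the choice of min-max scaling are
 assumptions, not the documented procedure.

```python
def scale_to_0_15(values):
    """Hypothetical linear rescaling of one raw feature onto integers 0..15.

    ASSUMPTION: min-max scaling is used only for illustration; the .names file
    does not describe the scaling Frey and Slate actually applied.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map every value to 0.
        return [0 for _ in values]
    scaled = []
    for v in values:
        level = round((v - lo) / (hi - lo) * 15)  # 16 levels: 0, 1, ..., 15
        scaled.append(max(0, min(15, level)))     # clip defensively to 0..15
    return scaled


print(scale_to_0_15([0.0, 2.5, 5.0, 10.0]))  # [0, 4, 8, 15]
```

 Note that Python's round() rounds halves to the nearest even integer
 (7.5 becomes 8 here), so a different rounding rule could shift some values
 by one level.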

2. USE IN STATLOG
 2.1 Testing Mode
 Train and Test

 2.2 Special Pre-Processing
 No

 2.3 Test Results

 Algorithm   Success Rate (Test, %)
 Alloc80     93.600
 KNN         93.000
 LVQ         92.100
 QuaDisc     88.700
 Cn2         88.500
 BayTree     87.600
 NewId       87.200
 IndCart     87.000
 C4.5        86.800
 Dipol92     82.400
 Radial      76.700
 LogDisc     76.600
 Ac2         75.500
 Castle      75.500
 Kohonen     74.800
 Cal5        74.700
 Smart       70.500
 Discrim     69.800
 BackProp    67.300
 Bayes       47.100
 Itrule      40.600
 Default      4.000
 Cascade      0.000
 Cart         0.000
 Cart 0.000

3. SOURCE INFORMATION AND PAST USAGE
 3.1 Source
  Creator: David J. Slate
  Odesta Corporation; 1890 Maple Ave; Suite 115; Evanston, IL 60201
  Donor: David J. Slate (dave@math.nwu.edu) (708) 491-3867
  Date: January, 1991

 3.2 Past Usage:
  P. W. Frey and D. J. Slate, "Letter Recognition Using Holland-style
  Adaptive Classifiers", Machine Learning, Vol. 6, No. 2, March 1991.

 The research for this article investigated the ability of several
 variations of Holland-style adaptive classifier systems to learn to
 correctly guess the letter categories associated with vectors of 16
 integer attributes extracted from raster scan images of the letters.
 The best accuracy obtained was a little over 80%. It would be
 interesting to see how well other methods do with the same data.


4. DATASET DESCRIPTION
 Number of Instances:
 20000
 Train 15000
 Test 5000
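 Section 1 describes training on the first 16,000 items and testing on the
 remaining 4,000, while the StatLog counts above are 15,000 and 5,000. A
 split by file order of this kind can be sketched as follows; the helper
 name positional_split is hypothetical, and whether the StatLog trials also
 split by file order (rather than by random sampling) is an assumption,
 since this file does not say.

```python
def positional_split(rows, n_train):
    """Train on the first n_train rows and test on the rest (no shuffling).

    ASSUMPTION: the split follows file order, as Section 1 describes for the
    16,000/4,000 case; the StatLog procedure is not stated in this file.
    """
    if not 0 <= n_train <= len(rows):
        raise ValueError("n_train must lie between 0 and len(rows)")
    return rows[:n_train], rows[n_train:]


# Stand-in for the 20,000 instances; real rows would hold 16 attributes plus a class.
rows = list(range(20000))

train, test = positional_split(rows, 15000)    # StatLog counts
print(len(train), len(test))                   # 15000 5000

train2, test2 = positional_split(rows, 16000)  # split described in Section 1
print(len(train2), len(test2))                 # 16000 4000
```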

 Number of Attributes:
 16 (numeric features)

 Number of Classes: 26
 Capital letter (26 values, A to Z)

 Class Distribution:
 789 A 766 B 736 C 805 D 768 E 775 F 773 G
 734 H 755 I 747 J 739 K 761 L 792 M 783 N
 753 O 803 P 783 Q 758 R 748 S 796 T 813 U
 764 V 752 W 787 X 786 Y 734 Z
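 As a consistency check on the distribution above, the counts can be summed
 and the most frequent class used as a majority-class baseline. The result
 (U, 813 of 20,000, about 4.07%) is close to the Default entry of 4.000 in
 Section 2.3, although this file does not say exactly how that Default figure
 was computed (for example, on the test set only), so the two need not match.

```python
# Class counts copied from the "Class Distribution" table above.
counts = {
    "A": 789, "B": 766, "C": 736, "D": 805, "E": 768, "F": 775, "G": 773,
    "H": 734, "I": 755, "J": 747, "K": 739, "L": 761, "M": 792, "N": 783,
    "O": 753, "P": 803, "Q": 783, "R": 758, "S": 748, "T": 796, "U": 813,
    "V": 764, "W": 752, "X": 787, "Y": 786, "Z": 734,
}

total = sum(counts.values())
majority = max(counts, key=counts.get)       # most frequent class
baseline = 100.0 * counts[majority] / total  # accuracy of always predicting it, in %

print(len(counts), total)            # 26 20000
print(majority, f"{baseline:.3f}")   # U 4.065
```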

 Attribute Information:

 1. xbox horizontal position of box (integer)
 2. ybox vertical position of box (integer)
 3. width width of box (integer)
 4. high height of box (integer)
 5. onpix total # on pixels (integer)
 6. xbar mean x of on pixels in box (integer)
 7. ybar mean y of on pixels in box (integer)
 8. x2bar mean x variance (integer)
 9. y2bar mean y variance (integer)
 10. xybar mean x y correlation (integer)
 11. x2ybr mean of x * x * y (integer)
 12. xy2br mean of x * y * y (integer)
 13. xege mean edge count left to right (integer)
 14. xegvy correlation of xege with y (integer)
 15. yege mean edge count bottom to top (integer)
 16. yegvx correlation of yege with x (integer)
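 The one-line descriptions above leave details open, such as the reference
 point for positions and exactly how an edge is counted. The sketch below
 computes unscaled versions of most of the box, moment and edge features from
 a small binary image under our own simplified reading: positions are
 measured from the left and bottom of the image, the moments use offsets
 from the centre of the bounding box, and an edge is an on pixel whose left
 (or lower) neighbour is off. The function name raw_features and the '#'/'.'
 grid encoding are hypothetical; x2ybr, xy2br, xegvy and yegvx are omitted
 for brevity, and the published features were additionally scaled to 0
 through 15.

```python
def raw_features(grid):
    """Compute a subset of unscaled letter features from a binary image.

    grid: list of equal-length strings, '#' = on pixel, '.' = off; row 0 is the top.
    ASSUMPTION: definitions are a simplified reading of the attribute list,
    not the exact Frey and Slate formulas.
    """
    n_rows = len(grid)
    # (x from the left edge, y from the bottom edge) of every on pixel.
    on = [(c, n_rows - 1 - r)
          for r, row in enumerate(grid)
          for c, ch in enumerate(row) if ch == "#"]
    if not on:
        raise ValueError("image has no on pixels")

    xs = [x for x, _ in on]
    ys = [y for _, y in on]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    width, high = x1 - x0 + 1, y1 - y0 + 1
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # centre of the bounding box
    n = len(on)

    # Offsets of each on pixel from the box centre.
    dx = [x - cx for x in xs]
    dy = [y - cy for y in ys]

    on_set = set(on)
    # Left-to-right scans: an edge is an on pixel whose left neighbour is off
    # (pixels outside the image count as off).
    x_edges = sum(1 for x, y in on if (x - 1, y) not in on_set)
    # Bottom-to-top scans: an edge is an on pixel whose lower neighbour is off.
    y_edges = sum(1 for x, y in on if (x, y - 1) not in on_set)

    return {
        "xbox": cx, "ybox": cy, "width": width, "high": high, "onpix": n,
        "xbar": sum(dx) / n,
        "ybar": sum(dy) / n,
        "x2bar": sum(d * d for d in dx) / n,
        "y2bar": sum(d * d for d in dy) / n,
        "xybar": sum(a * b for a, b in zip(dx, dy)) / n,
        "xege": x_edges / high,   # mean edges per horizontal scan line
        "yege": y_edges / width,  # mean edges per vertical scan line
    }


letter_h = ["#.#",
            "###",
            "#.#"]
features = raw_features(letter_h)
print(features["onpix"], features["width"], features["high"])  # 7 3 3
```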

 Missing Attribute Values: None

CONTACTS
 statlog-adm@ncc.up.pt
 bob@stams.strathclyde.ac.uk


================================================================================
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26.
xbox: continuous.
ybox: continuous.
width: continuous.
high : continuous.
onpix: continuous.
xbar: continuous.
ybar: continuous.
x2bar: continuous.
y2bar: continuous.
xybar: continuous.
x2ybr: continuous.
xy2br: continuous.
xege: continuous.
xegvy: continuous.
yege: continuous.
yegvx: continuous.
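 The section above follows the C4.5 names layout: a first line listing the
 class labels, then one "name: type." line per attribute. A minimal parser
 for exactly this block might look as follows. It does not handle '|'
 comments or discrete value lists, and the mapping of class codes 1..26 to
 the letters A..Z in code_to_letter is an assumption, since this file does
 not state it explicitly. Both function names are hypothetical.

```python
# The names block above; whitespace around ':' (as in "high :") is tolerated.
NAMES_SECTION = """\
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26.
xbox: continuous.
ybox: continuous.
width: continuous.
high : continuous.
onpix: continuous.
xbar: continuous.
ybar: continuous.
x2bar: continuous.
y2bar: continuous.
xybar: continuous.
x2ybr: continuous.
xy2br: continuous.
xege: continuous.
xegvy: continuous.
yege: continuous.
yegvx: continuous.
"""


def parse_names(text):
    """Minimal C4.5-style names parser (no '|' comments, no discrete value lists)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # First line: comma-separated class labels terminated by a period.
    classes = [c.strip() for c in lines[0].rstrip(".").split(",")]
    attributes = []
    for ln in lines[1:]:
        # "name: type." with optional whitespace around the colon.
        name, _, kind = ln.partition(":")
        attributes.append((name.strip(), kind.strip().rstrip(".")))
    return classes, attributes


def code_to_letter(code):
    """ASSUMED mapping of class codes 1..26 to capital letters A..Z."""
    n = int(code)
    if not 1 <= n <= 26:
        raise ValueError(f"class code out of range: {code}")
    return chr(ord("A") + n - 1)


classes, attributes = parse_names(NAMES_SECTION)
print(len(classes), len(attributes))                            # 26 16
print(attributes[3])                                            # ('high', 'continuous')
print(code_to_letter(classes[0]), code_to_letter(classes[-1]))  # A Z
```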