Abstract

Inductive machine learning algorithms are knowledge-based learning algorithms which take training instances as input and produce knowledge as output. One popular induction algorithm is Quinlan's ID3 [1986]. This algorithm produces knowledge in the form of a decision tree. Each path in the tree can be interpreted as a rule with the leaves representing rule conclusions. Selected attributes which describe the training instances form the interior nodes of the tree.
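The reading of tree paths as rules can be illustrated with a short sketch. The nested-dictionary tree layout, the function name `tree_to_rules`, and the toy weather attributes are assumptions made for illustration only; they are not taken from the report.

```python
def tree_to_rules(tree, conditions=()):
    """Return one (conditions, conclusion) pair per root-to-leaf path.

    Assumed layout (hypothetical): an interior node is a dict with an
    "attribute" name and a "branches" mapping from attribute value to
    subtree; a leaf is any non-dict value holding the rule conclusion.
    """
    if not isinstance(tree, dict):
        # Leaf reached: the conditions collected so far form the rule body.
        return [(list(conditions), tree)]
    rules = []
    attribute = tree["attribute"]
    for value, subtree in tree["branches"].items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


# Toy tree, invented for this example.
toy_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}

rules = tree_to_rules(toy_tree)
for body, conclusion in rules:
    tests = " AND ".join(f"{a} = {v}" for a, v in body)
    print(f"IF {tests} THEN class = {conclusion}")
```

Each interior node contributes one attribute test to the rule body, and the leaf supplies the conclusion, so the toy tree with five leaves yields five rules.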

The ID3 algorithm is extremely sensitive to noisy training data. In an effort to reduce the effects of noise on tree construction, Quinlan used the χ² test to identify noisy attribute values and exclude them at certain points in tree construction. This approach has proven to be effective in some cases and not effective in others.
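As a rough illustration of how a chi-square (χ²) test can flag an attribute as irrelevant at a node, the sketch below computes the standard χ² statistic of independence between an attribute and the class over the instances at that node. The function name, the dictionary-based instance format, and the toy data are assumptions; the report's exact procedure may differ.

```python
from collections import Counter


def chi_square_statistic(instances, attribute, label="class"):
    """Chi-square statistic for independence of `attribute` and `label`.

    `instances` is assumed to be a list of dicts (hypothetical format).
    Returns (statistic, degrees_of_freedom).
    """
    total = len(instances)
    value_counts = Counter(inst[attribute] for inst in instances)
    class_counts = Counter(inst[label] for inst in instances)
    observed = Counter((inst[attribute], inst[label]) for inst in instances)

    statistic = 0.0
    for value, n_value in value_counts.items():
        for cls, n_class in class_counts.items():
            # Count expected if the attribute were unrelated to the class.
            expected = n_value * n_class / total
            statistic += (observed[(value, cls)] - expected) ** 2 / expected

    dof = (len(value_counts) - 1) * (len(class_counts) - 1)
    return statistic, dof


# Toy data: attribute "a" determines the class, attribute "b" does not.
instances = [
    {"a": "x", "b": "u", "class": "yes"},
    {"a": "x", "b": "v", "class": "yes"},
    {"a": "y", "b": "u", "class": "no"},
    {"a": "y", "b": "v", "class": "no"},
]

CRITICAL_95_DOF1 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

stat_a, dof_a = chi_square_statistic(instances, "a")
stat_b, dof_b = chi_square_statistic(instances, "b")
keep_a = stat_a > CRITICAL_95_DOF1  # attribute looks relevant
keep_b = stat_b > CRITICAL_95_DOF1  # attribute is indistinguishable from noise
```

An attribute whose statistic falls below the critical value would be excluded from consideration at that node. With very few instances, as in this toy case, the χ² approximation is unreliable, which is consistent with the report's observation that training set size matters.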

This paper examines ID3 trees produced from noisy training data. To determine the effects of the χ² test in various situations, several test domains were used. Various levels of noise were injected into each training set and the corresponding trees were evaluated. It was observed that the effectiveness of the χ² test on noisy data is related to both the type of matching criteria used at leaf nodes and the size of the training set.
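The abstract does not say how noise was injected. One simple scheme, shown here purely as an assumption, replaces an attribute's value with a value drawn at random from that attribute's domain, with probability equal to the noise level. The function name and parameters are hypothetical.

```python
import random


def inject_attribute_noise(instances, attribute, domain, noise_level, seed=0):
    """Return a noisy copy of `instances`; the input list is not modified.

    With probability `noise_level`, an instance's value for `attribute`
    is replaced by a value drawn uniformly from `domain`. The draw may
    happen to reproduce the original value.
    """
    rng = random.Random(seed)  # fixed seed keeps experiments repeatable
    noisy = []
    for inst in instances:
        corrupted = dict(inst)
        if rng.random() < noise_level:
            corrupted[attribute] = rng.choice(domain)
        noisy.append(corrupted)
    return noisy


clean = [{"outlook": "sunny", "class": "no"} for _ in range(10)]
domain = ["sunny", "overcast", "rain"]

unchanged = inject_attribute_noise(clean, "outlook", domain, noise_level=0.0)
noisy = inject_attribute_noise(clean, "outlook", domain, noise_level=0.5)
```

Sweeping `noise_level` over several values and rebuilding the tree each time would give the kind of comparison the abstract describes.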

Department(s)

Computer Science

Report Number

CSC-92-19

Document Type

Technical Report

Document Version

Final Version

File Type

text

Language(s)

English

Rights

© 1992 University of Missouri--Rolla, All rights reserved.

Publication Date

Fall 1992
