Decision Trees [unpruned] using Weka

Pruning

http://gautam.lis.illinois.edu/monkmiddleware/public/analytics/decisiontree.html - also explains what kind of algorithms the J48 classifier uses to build decision trees.

http://en.wikipedia.org/wiki/Pruning_(decision_trees)


The problem of Over-fitting: [wiki]
  1. The model has as many or more parameters than there are observations.
  2. The model performs very well on the training set [because it has memorized the training data rather than learning a generalized concept] but not so well on the test set.
  3. Pruning helps reduce the problem of over-fitting (a sketch comparing an unpruned and a pruned J48 tree follows below).
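
A minimal sketch of the pruning switch through Weka's Java API (the file name "train.arff" is a placeholder assumption): an unpruned J48 tree is grown fully, while the default pruned J48 cuts back sub-trees that do not help generalization, usually producing a smaller tree.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PruningDemo {
    public static void main(String[] args) throws Exception {
        // Load the training data (placeholder file name) and set the class attribute.
        Instances data = DataSource.read("train.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Unpruned J48 (Weka's C4.5 implementation): grows the full tree,
        // which typically fits the training set very closely.
        J48 unpruned = new J48();
        unpruned.setUnpruned(true);
        unpruned.buildClassifier(data);

        // Pruned J48 (the default): sub-trees that do not improve
        // generalization are cut back.
        J48 pruned = new J48();
        pruned.buildClassifier(data);

        System.out.println("Unpruned tree size: " + unpruned.measureTreeSize());
        System.out.println("Pruned tree size:   " + pruned.measureTreeSize());
    }
}
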
Cross-Validation: [Source]
10-fold: Divide the training set into 10 parts; use 9 for training and 1 for testing. Repeat the process 10 times so that each part is used for testing exactly once, and average the 10 results. Weka then produces an evaluation result and an estimate of the error, and the final model is built one last time on the entire dataset.
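
A minimal sketch of the same 10-fold procedure through Weka's Java API (the file name and random seed are placeholder assumptions):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("train.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();

        // 10-fold cross-validation: Weka splits the data into 10 folds,
        // trains on 9 and tests on the remaining 1, repeats so every fold
        // is tested exactly once, and averages the results.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString("=== 10-fold cross-validation ===", false));

        // The final model is then built once on the entire dataset.
        tree.buildClassifier(data);
    }
}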

How to apply your model [built using the training set] to the test set: [Source]
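
A minimal sketch of this step in Weka's Java API, assuming the training and test ARFF files share the same attribute structure (file names are placeholders):

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TestSetDemo {
    public static void main(String[] args) throws Exception {
        // Load training and test sets (placeholder file names); both need the class attribute set.
        Instances train = DataSource.read("train.arff");
        Instances test = DataSource.read("test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Build the model on the training set only.
        J48 tree = new J48();
        tree.buildClassifier(train);

        // Evaluate the trained model on the held-out test set.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString("=== Evaluation on test set ===", false));
    }
}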

