On Sat, 11 Dec 2010 14:57:31 -0800, MaryLee wrote:

> Hi,
>
> I want to learn the relationship between the number of folds in k-fold
> cross-validation and the accuracy of the decision tree algorithm. I know
> exactly what cross-validation does. it divides the data set and use one
> of the partition for test and the rest for training the algorithm. the
> problem is I have two data sets and when I increase the folds the
> accuracy increases on first data set while decreases on another data
> set. when I continue Increasing the number of folds the accuracy still
> increase on the first data set while still decreases on the second data
> sets.

The first data set is behaving normally; this is what you hope will
happen. With more folds, each training partition holds a larger share of
the data ((k-1)/k of it), and with more data the learning algorithm is
learning more *useful* information about the data set. To deploy this
algorithm in the real world, you should try to have a large enough data
set that adding more training data no longer changes the accuracy much.

The second data set is an example of overfitting. With more data, the
learning algorithm is learning more information about the training set,
but that information is *useless* because it doesn't reflect what's going
on in the testing set. You need to rethink what you're doing with the
second data set, in one of three ways: pick a different learning
algorithm that's more resistant to overfitting; take a logical look at
the features you're using to learn and determine which ones might be
superfluous (and causing the overfitting); or apply automatic feature
selection.

--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
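
As a rough illustration of why the fold count matters, here is a minimal Python sketch showing that with k folds each model is trained on (k-1)/k of the instances. This is not Weka code: the function names k_fold_indices and training_fraction are made up for this example, and the split is done in index order with no shuffling or stratification, which real tools would usually add.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n_instances-1 into k (train, test) pairs.

    Hypothetical helper for illustration only: folds are taken in order,
    with no shuffling or stratification.
    """
    indices = list(range(n_instances))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def training_fraction(n_instances, k):
    """Average fraction of the data used for training across the k folds."""
    folds = k_fold_indices(n_instances, k)
    total_train = sum(len(train) for train, _ in folds)
    return total_train / (k * n_instances)


if __name__ == "__main__":
    # More folds means each model sees a larger share of the data.
    for k in (2, 5, 10, 20):
        print(k, round(training_fraction(100, k), 3))
```

For 100 instances this gives training fractions of 0.5, 0.8, 0.9 and 0.95 for k = 2, 5, 10 and 20, so going from 10 to 20 folds adds only a little extra training data per model.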