List:       wekalist
Subject:    [Wekalist] Re: the number of folds
From:       Ken Bloom <kbloom () gmail ! com>
Date:       2010-12-12 3:38:39
Message-ID: pan.2010.12.12.03.38.39 () gmail ! com

On Sat, 11 Dec 2010 14:57:31 -0800, MaryLee wrote:

> Hi,
> 
> I want to learn the relationship between the number of folds in k-fold
> cross-validation and the accuracy of the decision tree algorithm. I know
> exactly what cross-validation does: it divides the data set into k
> partitions, uses one of the partitions for testing and the rest for
> training the algorithm. The problem is that I have two data sets, and
> when I increase the number of folds, the accuracy increases on the first
> data set but decreases on the second. When I keep increasing the number
> of folds, the accuracy still increases on the first data set and still
> decreases on the second.

The first data set is normal -- it's what you hope will happen. With more 
folds, each training set is larger (it holds (k-1)/k of the data), so the 
learning algorithm is learning more *useful* information about the data 
set. To deploy this algorithm in the real world, you should try to have a 
large enough data set that the change in accuracy isn't very dramatic.
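Why more folds means more training data can be sketched in a few lines of
Python. This is a minimal, hypothetical illustration (the function names
are made up, and unlike Weka, which randomizes and, for a nominal class,
stratifies the folds, it neither shuffles nor stratifies): with n
instances and k folds, each training set holds roughly (k-1)/k of the data.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds get one additional instance.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n, k):
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    folds = kfold_indices(n, k)
    for i, test in enumerate(folds):
        # Everything outside the held-out fold is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    n = 100
    for k in (2, 5, 10, 20):
        sizes = [len(train) for train, _ in train_test_splits(n, k)]
        avg = sum(sizes) / len(sizes)
        print(f"k={k:2d}: average training-set size {avg:.1f} of {n}")
```

For n = 100 it reports average training-set sizes of 50, 80, 90 and 95
instances for k = 2, 5, 10 and 20, so the models evaluated with larger k
are trained on more of the data.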

The second data set is an example of overfitting. With more data, the 
learning algorithm is learning more information about the training set, 
but that information is *useless* because it doesn't reflect what's going 
on in the testing set. You need to rethink what you're doing with the 
second data set, either by picking a different learning algorithm that's 
more resistant to overfitting, by taking a logical look at what features 
you're using to learn and determining which ones might be superfluous 
(and causing the overfitting), or by applying automatic feature selection.
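For the automatic-feature-selection option, a filter method scores each
attribute on its own and keeps the best ones. The sketch below is a
hypothetical illustration, not Weka's code: it ranks nominal attributes by
information gain, roughly what Weka's InfoGainAttributeEval does when
paired with the Ranker search method. The toy data and function names are
assumptions made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one nominal attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class within each attribute value.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = [(name, info_gain([row[i] for row in rows], labels))
              for i, name in enumerate(names)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Toy data: "outlook" determines the class, "noise" is unrelated to it.
    rows = [("sunny", "a"), ("sunny", "b"), ("rain", "a"), ("rain", "b")]
    labels = ["no", "no", "yes", "yes"]
    for name, gain in rank_features(rows, labels, ["outlook", "noise"]):
        print(f"{name}: {gain:.2f} bits")
```

On the toy data it prints "outlook: 1.00 bits" and then "noise: 0.00 bits".
Two caveats: information gain favours attributes with many distinct values
(an ID-like column would score highest), which is why gain ratio is often
preferred; and the selection should be done on each training fold only,
otherwise the cross-validated accuracy is optimistically biased.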



-- 
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/




_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
