
List:       wekalist
Subject:    Re: [Wekalist] A beginner looking for help about tenfold cross-validation
From:       zhang lei <377320 () gmail ! com>
Date:       2013-12-13 3:03:25
Message-ID: 11FF2E3B-842E-4AFF-BD7E-FBC602285B2F () gmail ! com


Thanks Eibe ^_^


On 2013-12-11, at 6:15 AM, Eibe Frank <eibe@waikato.ac.nz> wrote:

> No, that way you would get an optimistically biased performance estimate. It's
> important that the parameters are chosen based on the training data only.
> 
> Let's say you have split your data into ten folds T1, T2, ..., T10.
> 
> In the first split of a ten-fold cross-validation, T1 would be used as the
> test set and the union of T2, ..., and T10, let's call it T2-10, would be
> used as the training set.
> 
> Parameter selection *must* be treated as part of the learning process, so you
> can only use the training data in T2-10 to select parameter values. One way
> to do this is to split T2-10 into a new training set and a validation set.
> Then, you could build models with different parameters on this new training
> set and evaluate them on the validation set. Once you have found the best
> parameter, you can rebuild the model from the full training set T2-10 with
> that parameter setting. Another, more reliable, way of choosing parameters is
> internal cross-validation, where you perform internal cross-validation runs
> on T2-10 to identify the best parameter (in WEKA, you can do this using
> CVParameterSelection or GridSearch). The test set T1 is only used once the
> final model, including parameter settings, has been established.
> 
> You are right, you may end up with different parameter values being chosen
> for the ten different splits of a ten-fold cross-validation. Parameter
> selection is part of the learning process and the outcome of the learning
> process will normally be different for each of the ten splits (e.g. you'd
> normally get slightly different decision trees if you used decision tree
> learning in a cross-validation).
> Cheers,
> Eibe
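
The nested procedure Eibe describes (outer cross-validation for evaluation, internal cross-validation on the outer training data for parameter selection) can be sketched generically. This is a minimal illustration, not Weka's implementation: the "learner" here is just a toy threshold classifier on 1-D points, and the fold splitting is deterministic for clarity.

```python
def k_folds(data, k):
    """Split data into k roughly equal folds (deterministic striding;
    a real setup would shuffle, and usually stratify, first)."""
    return [data[i::k] for i in range(k)]

def train(points, threshold):
    """Toy 'learner': the model is just the candidate threshold itself.
    A real learner would fit its internal parameters to the data."""
    return threshold

def accuracy(model, points):
    """Classify x as 1 if x >= model (the threshold), else 0."""
    correct = sum(1 for x, y in points if (x >= model) == (y == 1))
    return correct / len(points)

def select_parameter(train_data, candidates, inner_k=5):
    """Internal cross-validation: uses the outer TRAINING data only."""
    best, best_score = None, -1.0
    inner = k_folds(train_data, inner_k)
    for c in candidates:
        scores = []
        for i in range(inner_k):
            validation = inner[i]
            tr = [p for j, f in enumerate(inner) if j != i for p in f]
            scores.append(accuracy(train(tr, c), validation))
        score = sum(scores) / inner_k
        if score > best_score:
            best, best_score = c, score
    return best

def nested_cv(data, candidates, outer_k=10):
    folds = k_folds(data, outer_k)
    outer_scores, chosen = [], []
    for i in range(outer_k):
        test = folds[i]                                     # T1
        train_data = [p for j, f in enumerate(folds) if j != i for p in f]  # T2-10
        c = select_parameter(train_data, candidates)        # chosen from T2-10 only
        model = train(train_data, c)                        # rebuilt on full T2-10
        outer_scores.append(accuracy(model, test))          # T1 touched exactly once
        chosen.append(c)
    return sum(outer_scores) / outer_k, chosen
```

Note that `chosen` may legitimately contain different parameter values for different outer folds, which is exactly the point made above.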
> 
> 
> On 10/12/2013 23:45, zhang lei wrote:
> > 
> > thanks a lot, Sam.
> > 
> > So as you said, my question is: is the algorithm that 10-fold
> > cross-validation validates trained from the whole data set? That is,
> > do we use the whole data set as training data to get an algorithm with
> > certain parameters, then split the whole data set into 10 folds and
> > use 10-fold cross-validation to validate that algorithm?
> > 
> > 
> > Thanks!
> > -Lei
> > 
> > 
> > On 2013-12-10, at 3:39 AM, Sam Raker <sam.raker@gmail.com> wrote:
> > 
> > > From my understanding, 10-fold cross-validation isn't about
> > > parameters, it's about how the data gets split up. Imagine the data
> > > being divided into 10 parts: [1,2,3,4,5,6,7,8,9,10]
> > > 10-fold cross-validation starts with taking part 1 as the testing
> > > data, and parts 2-10 as the training data. The algorithm is trained
> > > and tested, and the results are stored. Then, part 2 is treated as the
> > > testing data, and parts 1, 3, 4, 5, 6, 7, 8, 9, and 10 are treated as
> > > training data. Then testing on part 3, training on parts 1, 2, 4,
> > > 5...10, and so on. The results are then averaged out (I think?). The
> > > idea is that one or more "folds" of your data could be unusual in some
> > > way, and so if you tested just on that part, the results wouldn't be
> > > valid for the whole data set.
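
The splitting scheme Sam describes maps directly onto a few lines of code (a generic sketch of the fold rotation, not Weka's internals):

```python
def cross_validation_splits(parts):
    """Yield (test_part, training_parts) pairs: each part serves as the
    test set exactly once, with all remaining parts pooled for training."""
    for i, test in enumerate(parts):
        training = [p for j, p in enumerate(parts) if j != i]
        yield test, training

# The ten parts [1, 2, ..., 10] from the example above.
for test, training in cross_validation_splits(list(range(1, 11))):
    print(f"test on part {test}, train on parts {training}")
```

And yes, the per-fold results are averaged: the reported 10-fold cross-validation error rate is the mean of the ten test-fold error rates.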
> > > 
> > > 
> > > Hope this helps,
> > > -Sam
> > > 
> > > 
> > > On Mon, Dec 9, 2013 at 6:28 AM, zhang lei <377320@gmail.com> wrote:
> > > 
> > > Dear all,
> > > 
> > > 
> > > Pls allow me to introduce myself first. I'm a beginner who is
> > > working on the book named "DATA MINING, Practical Machine Learning
> > > Tools and Techniques".
> > > 
> > > I'm getting confused by tenfold cross-validation. If I divide the
> > > whole data into training data and test data (let's forget about
> > > validation data for a minute to simplify the question), each
> > > fold refers to a different set of training data and most likely
> > > comes up with the same algorithm but different parameters.
> > > Therefore, the tenfold cross-validation error rate refers to the
> > > same algorithm with which parameters?
> > > 
> > > Because of my poor English, let me restate the question another
> > > way:
> > > 1st fold:  algorithm A with param a;
> > > 2nd fold:  algorithm A with param b;
> > > 3rd fold:  algorithm A with param c;
> > > ......
> > > 10th fold: algorithm A with param j.
> > > conclusion: the tenfold cross-validation error rate is
> > > about algorithm A with param ???
> > > 
> > > 
> > > Looking forward to hearing from any of you. Thanks for your time!
> > > 
> > > 
> > > Yours sincerely
> > > 
> > > Lei ZHANG
> > > 
> > > _______________________________________________
> > > Wekalist mailing list
> > > Send posts to: Wekalist@list.waikato.ac.nz
> > > List info and subscription status:
> > > http://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > List etiquette:
> > > http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > 
> > > 
> > 
> > 
> > 


