'Re: [Wekalist] public void buildClassifier(Instances inst): training'

List:       wekalist
Subject:    Re: [Wekalist] public void buildClassifier(Instances inst): training
From:       Peter Reutemann <fracpete () waikato ! ac ! nz>
Date:       2006-06-30 1:26:01
Message-ID: 44A47DA9.4050308 () waikato ! ac ! nz

> I would like to ask about the function public void
> buildClassifier(Instances inst).
> From what I understand, to build a new classifier in WEKA, we first
> implement this function to INITIALIZE and TRAIN the classifier.

Yep.
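To make the division of labour concrete, here is a minimal sketch in plain Java. It assumes a hypothetical `SimpleClassifier` interface that borrows the two Weka method names but takes plain arrays instead of `weka.core.Instances`, so it runs without Weka on the classpath; a real Weka classifier would extend Weka's classifier base class instead. Everything the model learns (here, one mean vector per class, i.e. a nearest-centroid classifier) is computed in `buildClassifier`; `distributionForInstance` only reads that model.

```java
import java.util.Arrays;

// The demo class comes first so that `java NearestCentroidDemo.java` finds main().
class NearestCentroidDemo {
    public static void main(String[] args) {
        double[][] trainX = {{1.0, 1.0}, {1.2, 0.8}, {5.0, 5.0}, {5.5, 4.5}};
        int[] trainY = {0, 0, 1, 1};

        NearestCentroid clf = new NearestCentroid();
        clf.buildClassifier(trainX, trainY, 2); // all training happens here

        // Later, an unseen instance arrives; no class label is passed in.
        double[] dist = clf.distributionForInstance(new double[] {4.8, 5.1});
        System.out.println(Arrays.toString(dist));
    }
}

/** Hypothetical stand-in for the Weka contract; NOT the real weka.classifiers API. */
interface SimpleClassifier {
    void buildClassifier(double[][] x, int[] y, int numClasses);
    double[] distributionForInstance(double[] x);
}

/** Nearest-centroid classifier: the "model" is one mean vector per class. */
class NearestCentroid implements SimpleClassifier {
    private double[][] centroids;
    private int[] counts;

    @Override
    public void buildClassifier(double[][] x, int[] y, int numClasses) {
        int dims = x[0].length;
        centroids = new double[numClasses][dims];
        counts = new int[numClasses];
        // Sum the attribute values of the training instances per class...
        for (int i = 0; i < x.length; i++) {
            counts[y[i]]++;
            for (int d = 0; d < dims; d++) {
                centroids[y[i]][d] += x[i][d];
            }
        }
        // ...then divide by the class counts to get the per-class means.
        for (int c = 0; c < numClasses; c++) {
            if (counts[c] == 0) {
                continue; // no training instances for this class
            }
            for (int d = 0; d < dims; d++) {
                centroids[c][d] /= counts[c];
            }
        }
    }

    @Override
    public double[] distributionForInstance(double[] x) {
        // Only the stored centroids and the attribute values of x are used.
        double[] dist = new double[centroids.length];
        double sum = 0.0;
        for (int c = 0; c < centroids.length; c++) {
            if (counts[c] == 0) {
                continue; // class never seen in training: probability stays 0
            }
            double sq = 0.0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - centroids[c][d];
                sq += diff * diff;
            }
            dist[c] = 1.0 / (Math.sqrt(sq) + 1e-9); // closer centroid => larger score
            sum += dist[c];
        }
        for (int c = 0; c < dist.length; c++) {
            dist[c] /= sum; // normalise the scores so they sum to 1
        }
        return dist;
    }
}
```

Turning inverse distances into probabilities is an arbitrary choice made for this sketch; any scheme works as long as the returned array is a distribution over the classes.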

> I am not
> sure about TRAINING the classifier. At this point (inside the
> buildClassifier function) we are not yet given a test instance.

Correct: the classifier will never, ever see a test instance with its
class value set.

> The test
> instance will be given in distributionForInstance(Instance) or
> classifyInstance(Instance). My worry is that I don't think we can
> TRAIN the classifier without using any test vector.

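The evaluation of the classifier happens outside the classifier (in Weka,
in the weka.classifiers.Evaluation class): the evaluation code asks the
trained classifier for its predictions and compares them with the actual
class values itself.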
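A sketch of what "outside the classifier" means, using a hypothetical `SimpleClassifier` interface with plain arrays (not the real Weka API) and a model that just predicts the training class frequencies, in the spirit of Weka's ZeroR but not its actual code. The hypothetical `Evaluator.accuracy` method is the only place the test labels appear: the model receives attribute values only, and the comparison with the true class happens in the evaluator.

```java
// The demo class comes first so that `java EvaluationDemo.java` finds main().
class EvaluationDemo {
    public static void main(String[] args) {
        double[][] trainX = {{0.0}, {1.0}, {2.0}};
        int[] trainY = {0, 1, 1};
        double[][] testX = {{0.5}, {1.5}, {2.5}, {3.5}};
        int[] testY = {1, 0, 1, 1}; // held by the evaluator, never given to the model

        SimpleClassifier clf = new MajorityClass();
        clf.buildClassifier(trainX, trainY, 2);
        System.out.println("accuracy = " + Evaluator.accuracy(clf, testX, testY));
    }
}

/** Hypothetical stand-in for the Weka contract; NOT the real weka.classifiers API. */
interface SimpleClassifier {
    void buildClassifier(double[][] x, int[] y, int numClasses);
    double[] distributionForInstance(double[] x);
}

/** Predicts the class frequencies seen in training and ignores the attributes. */
class MajorityClass implements SimpleClassifier {
    private double[] classFreqs;

    @Override
    public void buildClassifier(double[][] x, int[] y, int numClasses) {
        classFreqs = new double[numClasses];
        for (int label : y) {
            classFreqs[label] += 1.0;
        }
        for (int c = 0; c < numClasses; c++) {
            classFreqs[c] /= y.length;
        }
    }

    @Override
    public double[] distributionForInstance(double[] x) {
        return classFreqs.clone(); // same distribution for every instance
    }
}

/** Evaluation lives outside the classifier and is the only code that sees test labels. */
final class Evaluator {
    private Evaluator() {}

    static double accuracy(SimpleClassifier clf, double[][] testX, int[] testY) {
        int correct = 0;
        for (int i = 0; i < testX.length; i++) {
            double[] dist = clf.distributionForInstance(testX[i]); // attributes only
            if (argmax(dist) == testY[i]) { // label compared here, outside the model
                correct++;
            }
        }
        return (double) correct / testX.length;
    }

    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) {
                best = i;
            }
        }
        return best;
    }
}
```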

> 1. Am I correct that I should have the training phase in the
> buildClassifier function? Or does this phase take place in
> distributionForInstance(Instance), i.e. using the test instance to
> compare with the existing training instances to see which training
> case is closest to the test case, and then making a prediction for it?

The distributionForInstance method only returns the class distribution
that it determines from the model built by the buildClassifier method.
It must not use the class label of the instance it is given (if there is
one) to update its internal model; that would be cheating...
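Regarding question 1, a nearest-neighbour style classifier still fits this contract: distributionForInstance may compare the new instance with training instances, as long as those were stored by buildClassifier and the new instance's class is never used. A minimal 1-nearest-neighbour sketch, assuming a hypothetical `SimpleClassifier` interface with plain arrays rather than Weka's `Instances` (Weka's own IBk is far more complete):

```java
import java.util.Arrays;

// The demo class comes first so that `java OneNearestNeighbourDemo.java` finds main().
class OneNearestNeighbourDemo {
    public static void main(String[] args) {
        OneNearestNeighbour knn = new OneNearestNeighbour();
        knn.buildClassifier(new double[][] {{0, 0}, {0, 1}, {5, 5}}, new int[] {0, 0, 1}, 2);
        // Query instance: attribute values only; its class is unknown to the model.
        System.out.println(Arrays.toString(knn.distributionForInstance(new double[] {4, 4})));
    }
}

/** Hypothetical stand-in for the Weka contract; NOT the real weka.classifiers API. */
interface SimpleClassifier {
    void buildClassifier(double[][] x, int[] y, int numClasses);
    double[] distributionForInstance(double[] x);
}

class OneNearestNeighbour implements SimpleClassifier {
    private double[][] storedX;
    private int[] storedY;
    private int numClasses;

    @Override
    public void buildClassifier(double[][] x, int[] y, int numClasses) {
        // For a lazy learner, "training" is simply keeping a copy of the training data.
        storedX = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            storedX[i] = x[i].clone();
        }
        storedY = y.clone();
        this.numClasses = numClasses;
    }

    @Override
    public double[] distributionForInstance(double[] query) {
        // Compare the query with every stored training instance (squared Euclidean distance).
        int best = 0;
        double bestSq = Double.POSITIVE_INFINITY;
        for (int i = 0; i < storedX.length; i++) {
            double sq = 0.0;
            for (int d = 0; d < query.length; d++) {
                double diff = query[d] - storedX[i][d];
                sq += diff * diff;
            }
            if (sq < bestSq) {
                bestSq = sq;
                best = i;
            }
        }
        // Predict the nearest neighbour's class. The stored data is read, never updated,
        // and the query's own class label is not even a parameter of this method.
        double[] dist = new double[numClasses];
        dist[storedY[best]] = 1.0;
        return dist;
    }
}
```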

> 2. So basically, it seems that we have two separate datasets: the training
> dataset (which is loaded first) and the test dataset (either part of the
> training dataset set aside during 10-fold cross-validation, or a real
> separate test dataset provided by the user). So if I select the "use
> training data" option in the Classifier GUI, then what should the test
> instances for my distributionForInstance(Instance) be?

Forget about train and test sets in your classifier! You only need to 
think about building the classifier with the training data and returning 
a distribution for an _unknown_ instance via distributionForInstance.
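For question 2, the classifier code does not change between evaluation modes; only the surrounding evaluation code does. With "use training set", the evaluator builds the model on the training data and then asks for distributions of those same instances; with cross-validation it builds a fresh model per fold and queries only the held-out fold. The sketch below assumes a hypothetical `SimpleClassifier` interface, a hypothetical `Harness` class and a naive fold assignment (instance i goes to fold i mod k, without the randomisation and stratification that Weka normally applies).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// The demo class comes first so that `java EvaluationModesDemo.java` finds main().
class EvaluationModesDemo {
    public static void main(String[] args) {
        double[][] x = {{0}, {1}, {2.5}, {3}, {10}, {11}};
        int[] y = {0, 0, 0, 1, 1, 1};
        Supplier<SimpleClassifier> factory = OneNearestNeighbour::new;
        System.out.println("use training set: " + Harness.trainingSetAccuracy(factory, x, y, 2));
        System.out.println("3-fold CV:        " + Harness.crossValidatedAccuracy(factory, x, y, 2, 3));
    }
}

/** Hypothetical stand-in for the Weka contract; NOT the real weka.classifiers API. */
interface SimpleClassifier {
    void buildClassifier(double[][] x, int[] y, int numClasses);
    double[] distributionForInstance(double[] x);
}

/** 1-nearest-neighbour: buildClassifier remembers the data, prediction searches it. */
class OneNearestNeighbour implements SimpleClassifier {
    private double[][] trainX;
    private int[] trainY;
    private int numClasses;

    @Override
    public void buildClassifier(double[][] x, int[] y, int numClasses) {
        trainX = x;
        trainY = y;
        this.numClasses = numClasses;
    }

    @Override
    public double[] distributionForInstance(double[] q) {
        int best = 0;
        double bestSq = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double sq = 0.0;
            for (int d = 0; d < q.length; d++) {
                double diff = q[d] - trainX[i][d];
                sq += diff * diff;
            }
            if (sq < bestSq) {
                bestSq = sq;
                best = i;
            }
        }
        double[] dist = new double[numClasses];
        dist[trainY[best]] = 1.0;
        return dist;
    }
}

/** Hypothetical evaluation code: the only place that knows which instances are "test" ones. */
final class Harness {
    private Harness() {}

    /** "Use training set": build on all the data, then predict those same instances. */
    static double trainingSetAccuracy(Supplier<SimpleClassifier> factory,
                                      double[][] x, int[] y, int numClasses) {
        SimpleClassifier clf = factory.get();
        clf.buildClassifier(x, y, numClasses);
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (argmax(clf.distributionForInstance(x[i])) == y[i]) correct++;
        }
        return (double) correct / x.length;
    }

    /** k-fold CV with naive fold assignment: instance i goes to fold i % folds. */
    static double crossValidatedAccuracy(Supplier<SimpleClassifier> factory,
                                         double[][] x, int[] y, int numClasses, int folds) {
        int correct = 0;
        for (int f = 0; f < folds; f++) {
            List<Integer> trainIdx = new ArrayList<>();
            List<Integer> testIdx = new ArrayList<>();
            for (int i = 0; i < x.length; i++) {
                (i % folds == f ? testIdx : trainIdx).add(i);
            }
            double[][] trainX = new double[trainIdx.size()][];
            int[] trainY = new int[trainIdx.size()];
            for (int j = 0; j < trainIdx.size(); j++) {
                trainX[j] = x[trainIdx.get(j)];
                trainY[j] = y[trainIdx.get(j)];
            }
            SimpleClassifier clf = factory.get(); // fresh, untrained model for every fold
            clf.buildClassifier(trainX, trainY, numClasses);
            for (int i : testIdx) {
                if (argmax(clf.distributionForInstance(x[i])) == y[i]) correct++;
            }
        }
        return (double) correct / x.length;
    }

    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) best = i;
        }
        return best;
    }
}
```

On this toy data the 1-nearest-neighbour model is perfect on its own training set but scores 4 out of 6 under 3-fold cross-validation, which is why accuracy on the training set is an optimistic estimate.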

HTH

Cheers, Peter
-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/     +64 (7) 838-4466 Ext. 5174

_______________________________________________
Wekalist mailing list
Wekalist@list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist