
List:       wekalist
Subject:    Re: [Wekalist] ArrayIndexOutOfBoundsException in instanceclassification in java using weka
From:       Mark Hall <mhall () waikato ! ac ! nz>
Date:       2018-12-04 20:12:18
Message-ID: 46B8D4E3-2C18-46FE-B78C-743D844693FE () waikato ! ac ! nz

The problem is the two separate applications of StringToWordVector on
different datasets. As Peter said, this produces two different dictionaries
which, in turn, leads to different features in the vectorized form - i.e. the
training and test data are not compatible in this case.
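
To make the incompatibility concrete, here is a small plain-Java sketch (no
Weka involved). The class and method names are made up for illustration, and
the tokenization is far simpler than what StringToWordVector really does; it
only shows that a dictionary learned from the test data yields vectors with
different columns than one learned from the training data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not how StringToWordVector is implemented.
public class DictionaryDemo {

    // Learn a dictionary (word -> column index) from a set of documents.
    public static Map<String, Integer> learnDictionary(List<String> docs) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String word : doc.toLowerCase().split("\\s+")) {
                dict.putIfAbsent(word, dict.size());
            }
        }
        return dict;
    }

    // Turn a document into a word-count vector using an existing dictionary.
    // Words that are not in the dictionary are ignored.
    public static int[] vectorize(String doc, Map<String, Integer> dict) {
        int[] vector = new int[dict.size()];
        for (String word : doc.toLowerCase().split("\\s+")) {
            Integer column = dict.get(word);
            if (column != null) {
                vector[column]++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("good movie", "bad movie", "good plot");
        List<String> test = Arrays.asList("bad plot");

        Map<String, Integer> trainDict = learnDictionary(train);
        Map<String, Integer> testDict = learnDictionary(test);

        // Dictionary learned from the test data: 2 columns, in a different order.
        System.out.println(Arrays.toString(vectorize("bad plot", testDict)));
        // Dictionary learned from the training data: 4 columns, matching the model.
        System.out.println(Arrays.toString(vectorize("bad plot", trainDict)));
    }
}
```

A model trained on the four-column layout cannot make sense of the two-column
vector, which is the kind of mismatch that shows up as an
ArrayIndexOutOfBoundsException or as meaningless predictions.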

The solution is to only apply StringToWordVector once (on the training data)
to learn a single dictionary. This dictionary then needs to be applied to the
test data to produce vectors that are consistent with those generated on the
training data. As Eibe said, the way to accomplish this is to configure a
FilteredClassifier in your code with J48 as the base classifier and
StringToWordVector as the filter. Train this classifier on your raw training
data (i.e. the data that contains the String attribute(s)) and then apply it
to your raw test data for prediction.
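
A rough sketch of that setup follows, assuming Weka 3.8 on the classpath. The
class name, the attribute names ("text", "label"), the class values and the
toy documents are all invented for illustration; in practice you would load
your own raw ARFF files instead of building the data in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.Utils;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Hypothetical sketch; names and data are assumptions, not from the thread.
public class FilteredTextClassifierSketch {

    // Empty raw dataset: one String attribute plus a nominal class attribute.
    public static Instances emptyDataset() {
        ArrayList<Attribute> attributes = new ArrayList<>();
        // A null value list makes this a String attribute.
        attributes.add(new Attribute("text", (List<String>) null));
        attributes.add(new Attribute("label", Arrays.asList("pos", "neg")));
        Instances data = new Instances("docs", attributes, 0);
        data.setClassIndex(1);
        return data;
    }

    // Add one raw document; pass label == null for an unlabeled test document.
    public static void addDocument(Instances data, String text, String label) {
        double[] values = new double[2];
        values[0] = data.attribute(0).addStringValue(text);
        values[1] = (label == null)
                ? Utils.missingValue()
                : data.attribute(1).indexOfValue(label);
        data.add(new DenseInstance(1.0, values));
    }

    // The filter is fitted inside buildClassifier, on the training data only.
    public static FilteredClassifier train(Instances rawTrain) throws Exception {
        FilteredClassifier classifier = new FilteredClassifier();
        classifier.setFilter(new StringToWordVector());
        classifier.setClassifier(new J48());
        classifier.buildClassifier(rawTrain);
        return classifier;
    }

    public static void main(String[] args) throws Exception {
        Instances train = emptyDataset();
        addDocument(train, "good movie", "pos");
        addDocument(train, "good plot", "pos");
        addDocument(train, "bad movie", "neg");
        addDocument(train, "bad plot", "neg");

        FilteredClassifier classifier = train(train);

        // The test data stays raw: no StringToWordVector is applied by hand.
        Instances test = emptyDataset();
        addDocument(test, "bad plot", null);

        double predicted = classifier.classifyInstance(test.instance(0));
        System.out.println(test.classAttribute().value((int) predicted));
    }
}
```

The point of the design is that the FilteredClassifier owns the filter, so the
dictionary learned during buildClassifier is reused for every instance passed
to classifyInstance. If you serialize the FilteredClassifier, the dictionary
is saved along with the J48 tree.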

Cheers,
Mark.

On 4/12/18, 5:22 PM, "gevindu" <wekalist-bounces@list.waikato.ac.nz on behalf
of gevindumallikarachchi2015msc@gmail.com> wrote:

    I'm still not quite sure where I am going wrong. These are the steps I
    followed when creating the model:
    
    1) Load the .arff file into the Weka Explorer.
    2) Apply StringToWordVector.
    3) Set the class attribute.
    4) Select the classification algorithm.
    5) Generate and save the model (this time I selected FilteredClassifier).
    
    I made the following change in my Java code:
    
    I applied StringToWordVector to my test set.
    
    Finally, I printed my Instances object and noticed that only the class
    attribute is there, and 0 is set as the string to be predicted (an image
    of the output is attached).
    <http://weka.8497.n7.nabble.com/file/t6230/test.jpg> 
    
    Now I'm not getting the ArrayIndexOutOfBoundsException; instead, the test
    set is always classified as the first class.
    
    Please find my new model and the Java code:
    
    FFF.model <http://weka.8497.n7.nabble.com/file/t6230/FFF.model>   
    NewJava.java <http://weka.8497.n7.nabble.com/file/t6230/NewJava.java>  
    
    Eibe Frank-2 wrote
    > Your code looks fine. The model is the problem. It is a simple J48 model,
    > but it needs to be a FilteredClassifier model that has StringToWordVector
    > as the filter and J48 as the base classifier.
    > 
    > I think the mistake you made is that you processed the training data from
    > your ARFF file in the Preprocess panel to make it suitable for training
    > using J48. What you need to do is leave the data as it is in the
    > Preprocess panel before proceeding to training and instead incorporate all
    > filtering steps into the FilteredClassifier. If there are multiple
    > filtering steps, you can use MultiFilter inside FilteredClassifier, or you
    > can nest several FilteredClassifier objects inside one another.
    > 
    > Cheers,
    > Eibe
    > 
    > From: gevindu
    > Sent: Tuesday, 4 December 2018 7:10 AM
    > To: 
    
    > wekalist@.ac
    
    > Subject: Re: [Wekalist] ArrayIndexOutOfBoundsException in
    > instanceclassification in java using weka
    > 
    > Please find the attached correct java file 
    > TestPred.java
    > <http://weka.8497.n7.nabble.com/file/t6230/TestPred.java>
    > 
    > 
    > 
    > --
    > Sent from: http://weka.8497.n7.nabble.com/
    > _______________________________________________
    > Wekalist mailing list
    > Send posts to: 
    
    > Wekalist@.ac
    
    > To subscribe, unsubscribe, etc., visit
    > https://list.waikato.ac.nz/mailman/listinfo/wekalist
    > List etiquette:
    > http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
    
    
    --
    Sent from: http://weka.8497.n7.nabble.com/
    _______________________________________________
    Wekalist mailing list
    Send posts to: Wekalist@list.waikato.ac.nz
    To subscribe, unsubscribe, etc., visit
    https://list.waikato.ac.nz/mailman/listinfo/wekalist
    List etiquette:
    http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

