'Re: [BioC] Re: KNN, SVM, and randomForest - How to predict testing'


List:       bioconductor
Subject:    Re: [BioC] Re: KNN, SVM, and randomForest - How to predict testing
From:       Kasper Daniel Hansen <k.hansen () biostat ! ku ! dk>
Date:       2004-07-28 18:37:36
Message-ID: wqrfz7cvuz3.fsf () biostat ! ku ! dk

Adaikalavan Ramasamy <ramasamy@cancer.org.uk> writes:

> I do not know much about exprSet (please correct me if I am wrong), but I
> think of and treat an exprSet as a matrix. Indeed, in my previous message
> I was writing in the context of a matrix.
> 
> data(affybatch.example)
> a <- rma(affybatch.example)
> m <- exprs(a)
> 
> Then I work with 'm' which may or may not be what you want. 
> 
> If you want to force a matrix to exprSet, the examples in
> help("exprSet") might be helpful.

An exprSet is a matrix of expression values coupled with a data frame
of covariates. If you (original poster) look at the aforementioned
article, you will see that they use the original exprSet (let's call it
Edata) in the following way:
  Xdata <- t(exprs(Edata))
  Ydata <- pData(Edata)["y-values"]
So you do not really need the exprSet object, as it is only used to
get the matrix of expression values and the data frame of classes. Now,
given that you have a fit (which you have constructed using a training
data set with known classes), you predict the classes with something
like
  predict(fit, newdata=Xdata.test)

I suggest looking at the code and trying to separate the different
components.
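
To make the separation concrete, here is a minimal sketch with simulated
data, not the code from the article. The names X.train, y.train and X.test
are hypothetical stand-ins for t(exprs(Edata)), the class column of
pData(Edata), and the expression matrix of the unlabelled samples. It uses
knn() from the 'class' package, which takes the test matrix directly. The
point is that no class labels are supplied for the test samples.

```r
library(class)  # provides knn(); 'class' is a recommended package shipped with R

set.seed(1)

# Simulated stand-in for t(exprs(Edata)): rows are samples, columns are genes.
n.train <- 20
n.test  <- 5
n.genes <- 50

# Known classes for the training samples only (hypothetical labels "A"/"B").
y.train <- factor(rep(c("A", "B"), each = n.train / 2))
X.train <- matrix(rnorm(n.train * n.genes), nrow = n.train)

# Shift the first 10 genes in class "B" so the two classes differ.
X.train[y.train == "B", 1:10] <- X.train[y.train == "B", 1:10] + 3

# Test samples: expression values only, no class labels anywhere.
X.test <- matrix(rnorm(n.test * n.genes), nrow = n.test)
# Simulation detail: make the first three test samples resemble class "B".
X.test[1:3, 1:10] <- X.test[1:3, 1:10] + 3

# knn() needs the training matrix, the test matrix and the training classes.
pred <- knn(train = X.train, test = X.test, cl = y.train, k = 3)
print(pred)
```

For svm() in e1071 or randomForest(), the analogous step would be to build
the fit from X.train and y.train and then call predict(fit, newdata = X.test)
as above. Either way, the test matrix must have the same genes, in the same
column order, as the training matrix.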

/Kasper

> Regards, Adai.
> 
> 
> On Wed, 2004-07-28 at 14:09, Liu, Xin wrote:
> > Thanks Tom, Sean, Xavier for the reply, and especially Adai!
> > However, I still have a problem. To put the microarray data into these
> > supervised clustering methods, the exprSet needs to be built. To build
> > the exprSet, you need to give the class of every sample. So when I predict
> > samples with unknown classes, how do I put them into the exprSet? Thank you!
> > Xin
> > 
> > 
> > 
> > -----Original Message-----
> > From: Adaikalavan Ramasamy [mailto:ramasamy@cancer.org.uk]
> > Sent: 28 July 2004 13:00
> > To: Liu, Xin
> > Cc: Tom R. Fahland; BioConductor mailing list
> > Subject: Re: [BioC] KNN, SVM, and randomForest - How to predict
> > test without known categories
> > 
> > 
> > If algorithm 1 predicts "Yes", "Yes", "No", "No" for 4 samples and
> > algorithm 2 predicts "Yes", "No", "Yes", "No", how do you know which one
> > is the better algorithm? You use test sets with known classes to decide
> > this. You can get them by splitting your learning set (samples with known
> > classes) into training and test sets. Look up "cross validation".
> > 
> > Some examples of built-in cross-validation:
> > * knn.cv() is a leave-one-out cross-validation of knn()
> > * svm() in library(e1071) has an argument named 'cross' for cross
> > validation
> > In practice, I prefer to write my own wrapper for cross-validation to
> > ensure that the sampling method is the same across all algorithms.
> > 
> > Once you have determined the best algorithm and features, you then use
> > predict() to predict samples with unknown classes.
> > 
> > Regards, Adai.
> > 
> > 
> > 
> > On Wed, 2004-07-28 at 09:18, Liu, Xin wrote:
> > > In R, before using KNN, SVM, and randomForest, an exprSet needs to be
> > > built, which requires the training set WITH known categories and the
> > > test set WITH known categories. However, by definition, in supervised
> > > learning you always train (with known categories), then predict the test
> > > set WITHOUT known categories. I wonder how to implement this. Thank you!
> > > Xin
> > > 
> > > 
> > > 
> > > 
> > > 
> > > -----Original Message-----
> > > From: Tom R. Fahland [mailto:tfahland@genomatica.com]
> > > Sent: 27 July 2004 18:48
> > > To: Liu, Xin; bioconductor@stat.math.ethz.ch
> > > Subject: RE: [BioC] KNN, SVM,and randomForest - How to predict samples
> > > without category 
> > > 
> > > 
> > > By definition, in supervised learning you always train (with known
> > > categories), then run your unbiased data through for prediction. Both CV
> > > and train/test partitions are good for choosing parameters and
> > > optimizing the algorithms. I have just completed a study predicting dose
> > > exposure with good results using different algorithms.
> > > Tom
> > > 
> > > -----Original Message-----
> > > From: bioconductor-bounces@stat.math.ethz.ch
> > > [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Liu, Xin
> > > Sent: Tuesday, July 27, 2004 07:39
> > > To: bioconductor@stat.math.ethz.ch
> > > Subject: [BioC] KNN, SVM,and randomForest - How to predict samples
> > > without category 
> > > 
> > > 
> > > Dear all,
> > > 
> > > Supervised clusterings (KNN, SVM, and randomForest) use a test sample set
> > > and a training sample set to do prediction. To create the exprSet, the
> > > category is needed for each sample. However, sometimes we need to predict
> > > a sample without its category. Does anybody have a clue how to do this?
> > > Thank you very much!
> > > 
> > > Best regards,
> > > Xin LIU
> > > 
> > > 
> > > 
> > > This e-mail is from ArraGen Ltd\ \ The e-mail and any files\...{{dropped}}
> > > 
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor@stat.math.ethz.ch
> > > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> > > 
> > 
> > 
> 
> 

-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen
