'Re: [BioC] DESeq analysis'

List:       bioconductor
Subject:    Re: [BioC] DESeq analysis
From:       Wolfgang Huber <whuber () embl ! de>
Date:       2012-06-27 17:36:13
Message-ID: 4FEB448D.1020105 () embl ! de

Dear Narges

thank you for the feedback. Your second question is easy: use the idiom
     res1 <- subset(res, padj < 0.1)
instead. This avoids the rows full of NA that res[res$padj < 0.1, ]
creates whenever res$padj is NA. Alternatively,
     res[order(res$padj)[1:n], ]
with 'n' your favourite lucky number, might be useful. Have a look at the
R-intro manual for more on subsetting arrays and data frames in R.
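
As a rough illustration of the difference, here is a minimal sketch on a made-up data frame (the 'id' and 'padj' values are invented for the example and are not your actual nbinomTest output):

```r
# Toy stand-in for the nbinomTest() result; the values are made up.
res <- data.frame(
  id   = c("g1", "g2", "g3", "g4"),
  padj = c(0.02, NA, 1, 0.08),
  stringsAsFactors = FALSE
)

# Logical indexing returns a row full of NA for every NA in the index.
bad <- res[res$padj < 0.1, ]

# subset() treats NA in the condition as FALSE and drops those rows.
good <- subset(res, padj < 0.1)

# which() keeps only the positions that are TRUE, so it gives the same rows.
also_good <- res[which(res$padj < 0.1), ]

# Top n rows by adjusted p-value; order() puts NA last by default.
n <- 2
top <- res[order(res$padj)[1:n], ]
```

Here 'bad' contains an extra all-NA row (named "NA") for the gene whose padj is NA, while 'good' and 'also_good' contain only the rows that actually pass the threshold.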

Your first question: can you show us the data for the genes that you
know to be differentially expressed? It might then become more apparent
why DESeq / nbinomTest did not agree with your expectation. Also, what
does the dispersion plot for cds look like? (This is the plot produced
by plotDispEsts in the vignette.)
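
One possible way to pull out the result rows for those genes and to see how padj is distributed is sketched below. It is only a sketch on a made-up data frame: the vector 'known' is a hypothetical list of gene IDs that you would replace with your own, and all the numbers are invented.

```r
# Toy stand-in for the nbinomTest() result; the values are made up.
res <- data.frame(
  id        = c("g1", "g2", "g3", "g4"),
  baseMeanA = c(0, 17.2, 15.3, 2.9),
  baseMeanB = c(0.2, 11.6, 18.7, 5.0),
  padj      = c(1, NA, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical IDs of genes believed to be differentially expressed.
known <- c("g2", "g4")

# Rows of the result table for just those genes.
known_rows <- res[res$id %in% known, ]

# How many adjusted p-values are NA, and how many equal 1.
n_na  <- sum(is.na(res$padj))
n_one <- sum(res$padj == 1, na.rm = TRUE)
```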

	Best wishes
	Wolfgang

narges [guest] scripsit 06/26/2012 06:17 PM:
> 
> Hi all
> 
> I am doing some RNA-seq analysis with DESeq. I have applied nbinomTest to my
> dataset, which I know has many differentially expressed genes. The first problem
> is that the values in the "padj" column are almost all NA and sometimes 1. And when I
> want to take a slice from my data frame, the result is not meaningful to me.
> -- output of sessionInfo():
> 
> res <- nbinomTest(cds, "Male", "Female")
> 
> > head(res)
> id   baseMean baseMeanA  baseMeanB foldChange log2FoldChange
> 1 ENSG00000000003  0.1130534  0.000000  0.2261067        Inf            Inf
> 2 ENSG00000000005  0.0000000  0.000000  0.0000000        NaN            NaN
> 3 ENSG00000000419 14.3767155 17.162610 11.5908205  0.6753530     -0.5662863
> 4 ENSG00000000457 17.0174761 15.342800 18.6921526  1.2183013      0.2848710
> 5 ENSG00000000460  3.9414822  2.855099  5.0278659  1.7610131      0.8164056
> 6 ENSG00000000938 16.0894945 18.350117 13.8288718  0.7536122     -0.4081058
> pval padj
> 1 0.9959638    1
> 2        NA   NA
> 3 0.3208560    1
> 4 0.5942512    1
> 5 0.4840607    1
> 6 0.5409953    1
> 
> 
> > res1 <- res[res$padj<0.1,]
> > head(res1)
> id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
> NA   <NA>       NA        NA        NA         NA             NA   NA   NA
> NA.1 <NA>       NA        NA        NA         NA             NA   NA   NA
> NA.2 <NA>       NA        NA        NA         NA             NA   NA   NA
> NA.3 <NA>       NA        NA        NA         NA             NA   NA   NA
> NA.4 <NA>       NA        NA        NA         NA             NA   NA   NA
> NA.5 <NA>       NA        NA        NA         NA             NA   NA   NA
> 
> My first question is why, although I know there are some differentially
> expressed genes in my data, all the padj values are NA or 1. My second
> question is about the "NA.1", "NA.2", ... that appear in the first column of
> the object "res1" instead of the gene names.
> Thank you so much
> Regards
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

