'Re: [BioC] HTqPCR problems'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       bioconductor
Subject:    Re: [BioC] HTqPCR problems
From:       Simon Melov <smelov () buckinstitute ! org>
Date:       2012-06-29 15:58:23
Message-ID: 43857F42-62E5-4B37-B2EF-0C7250307EDE () buckinstitute ! org
[Download RAW message or body]

Hi Heidi,
I think I've identified the problem. Currently it appears as though HTqPCR reads the \
sample ID's and genes in from top to bottom of the CSV output from the Biomark. This \
is not the sample order we load in. As long as thats made clear in the vignette, it \
will prevent any confusion. We typically have a loading list, in which we associate \
samples with groups (numbered 1-48, or 1-96 for both formats). I was getting \
confusing results (laid out below) as I assumed HTqPCR associated sample IDs in the \
loading order, not the CSV format top to bottom. Am I correct here in how HTqPCR \
reads in the data from the CSV file?

thanks again,

best

Simon

On Jun 28, 2012, at 2:11 PM, Simon Melov wrote:

> Hi Heidi,
> getting there, hopefully if you can clarify the following issue, all will be well \
> and good. 
> After yesterdays correspondence, I'm now producing nice plots, when I check the \
> actual values being plotted, they dont match up to the sample ID's. In fact, if I \
> dont bother assigning groups, the sample ID's dont match to their respective gene \
> CT values. I'm worried there is some underlying problem with the data structure I'm \
> not understanding. 
> I understand the code, its just the samples dont match the reported gene values in \
> the csv file. 
> for example
> 
> > head(groupID)
> Sample Treatment
> 1    S28       SMY
> 2    L20       LFY
> 3    M26       MMY
> 4     L1       LFR
> 5    L30       LFY
> 6    K13       KMO
> 
> > plotCtOverview(raw6, genes=featureNames(raw6)[1], \
> > group=groupID$Treatment,legend=FALSE, col=1:length(unique(groupID$Treatment)))
> 
> produces a nice plot of a tubulin gene across the groups, as you suggested \
> yesterday . Yet if I look at the values, they dont match the CSV values for \
> specific genes/samples I used. If I turn off groups, and look at samples without \
> merging by group, I can see that the values dont match the appropriate gene being \
> displayed. My question is, where is the sample order being drawn from in the CSV \
> file? Is there a simple check I can use to see that what is being plotted, is what \
> I think is being plotted? The group ID sample-Treatment is correct, and all the \
> samples in the original CSV file are correct. 
> Is it possible that the package is assigning gene/sample ID in some other order \
> than that I've supplied? I just want to be sure that when HTqPCR pulls the sample \
> ID and maps it to the appropriate gene/Group, some transformation is not happening. \
>  Fluidigm suggests a particular order in loading samples and genes. These are \
> numbered 1-48 (sample), and 1-48 (gene) for a 48.48 plate (and the same for a 96.96 \
> plate). This is the order I supplied the sample IDs in the groupID file above. How \
> do you map the raw csv output to gene/sample id? 
> Is there a way of checking that the sample/gene/group ID is correct?
> 
> as always, thanks in advance for your help
> 
> best
> 
> s
> On Jun 27, 2012, at 3:27 PM, Heidi Dvinge wrote:
> 
> > > Hi Heidi,
> > > you are correct, yes 48.48.
> > > The example you provide below is exactly what I needed for clarification
> > > for groups. I was trying to reverse engineer what you had done with the
> > > original expression set package for microarrays, but from below, I can get
> > > this to work now.
> > > 
> > Glad it works. Hopefully by the next BioConductor release I'll remember to
> > clarify the plotCtOverview help file.
> > 
> > > Just to be clear, I have 5 48.48 plates. Should I normalize each
> > > individually prior to combining, or should I reformat to a 2304x1 each,
> > > combine, then normalize (not sure if you can do that or not)
> > > 
> > Hm, that's one of the questions I've also been asking myself, so I would
> > be curious to hear what your results from this are.
> > 
> > If you suspect that there are some major factors influencing the 5 plates
> > systematically, then normalising them in a 2304 x 5 object should
> > (hopefully) correct for that. For example, they may have been run on
> > different days, by different people, or perhaps there was a short power
> > cut during the processing of one of them. This might be visible if you
> > have for example a boxplot of Ct from all 48*5 samples, and you see blocks
> > of them shifted up or down.
> > 
> > Obviously, this doesn't take care of normalisation between your samples
> > within each plate though. If you suspect your samples to have some
> > systematic variation that you need to account for, then you can normalise
> > each plate individually (as a 48 x 48) object. Alternatively, you can try
> > to combine within- and between-sample normalisation by taking all 48 x 240
> > values at once.
> > 
> > In principle, you can split, reformat and the recombine the data in
> > however many ways you like. Personally, with any sort of data I prefer to
> > go with as little preprocessing as possible, since each additional step
> > can potentially introduce its own biases into the data. So unless there
> > are some obvious variation between your 5 plates, I'd probably stick with
> > just normalisation between the samples, e.. using a 48 x 240 object.
> > 
> > Of course, you may have different preferences, or find out that a
> > completely different approach is required for this particular data set.
> > 
> > \Heidi
> > 
> > > thanks again for your prompt responses!
> > > 
> > > best
> > > 
> > > s
> > > 
> > > On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote:
> > > 
> > > > Hi Simon,
> > > > 
> > > > > Thanks for the help Heidi,
> > > > > but I'm still having troubles, your comments on the plotting helped me
> > > > > solve the outputs. But if I want to just display some groups (for
> > > > > example
> > > > > the LO group in the example below), how do I associate a group with
> > > > > multiple samples (ie biological reps)?
> > > > > 
> > > > > Currently I'm associating genes with samples  by reading in the file as
> > > > > below
> > > > > plate6=read.delim("plate6Sample.txt", header=FALSE)
> > > > > #this is a file to associate sample ID with the genes in the biomark
> > > > > data,
> > > > > as currently HTqPCR does not seem to associate the sample info in the
> > > > > Biomark output to the gene IDs
> > > > > 
> > > > Erm, no, it doesn't :-/
> > > > 
> > > > > samples=as.vector(t(plate6))
> > > > > raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48,
> > > > > n.data=48, samples=samples)
> > > > > #now I have samples and genes similar to your example in the guide, but
> > > > > I
> > > > > want to associate samples to groups now. In the guide, you have an
> > > > > example
> > > > > where you have entire files as distinct samples, but in our runs, we
> > > > > never
> > > > > have that situation. I have a file which associates samples to groups,
> > > > > which I read in...
> > > > > 
> > > > > groupID=read.csv("plate6key.csv")
> > > > > 
> > > > > but how do I associate the samples with their appropriate groups for
> > > > > biological replicates with any of the functions in HtQPCR?
> > > > 
> > > > I'm afraid I'm slightly confused here (sorry, long day). Exactly how is
> > > > your data formatted? I.e. are the columns either individual samples, or
> > > > from files containing multiple samples? So for example for a single
> > > > 48.48
> > > > arrays, is your qPCRset object 2304 x 1 or 48 x 48?
> > > > 
> > > > From your readCtData command I'm guessing you have 48 x 48, i.e. all 48
> > > > samples from your 1 array are in columns. In that case the 'groups'
> > > > parameter in plotCtOverview will need to be a vector of length 48,
> > > > indicating how you want the 48 columns in your qPCRset object to be
> > > > grouped together.
> > > > 
> > > > Below is an example of (admittedly ugly) plots. I don't know if that's
> > > > similar to your situation at all.
> > > > 
> > > > \Heidi
> > > > 
> > > > > # Reading in data
> > > > > exPath <- system.file("exData", package = "HTqPCR")
> > > > > raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format
> > > > > =
> > > > "BioMark", n.features = 48, n.data = 48)
> > > > > # Check sample names
> > > > > head(sampleNames(raw1))
> > > > [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6"
> > > > > # Associate samples with (randomly chosen) groups
> > > > > anno       <- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B",
> > > > "C", "D"), times=c(4,24,5,15)))
> > > > > head(anno)
> > > > sampleID Group
> > > > 1  Sample1     A
> > > > 2  Sample2     A
> > > > 3  Sample3     A
> > > > 4  Sample4     A
> > > > 5  Sample5     B
> > > > 6  Sample6     B
> > > > > # Plot the first gene - for each sample individually
> > > > > plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE,
> > > > col=1:nrow(anno))
> > > > > # Plot the first gene - for each group
> > > > > plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group,
> > > > legend=FALSE, col=1:length(unique(anno$Group)))
> > > > > # Plot the first gene, using group "A" as a control
> > > > > plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group,
> > > > legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A")
> > > > 
> > > > 
> > > > 
> > > > > You recommend below using a vector, but I dont see how that helps me
> > > > > associate the samples in the Expression set.
> > > > > 
> > > > > thanks again!
> > > > > 
> > > > > s
> > > > > 
> > > > > On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote:
> > > > > 
> > > > > > > Hi,
> > > > > > > I'm having some troubles selectively sub-setting, and graphing up
> > > > > > > QPCR
> > > > > > > data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates).
> > > > > > > I'd
> > > > > > > like to be able to visualize specific genes, with specific groups we
> > > > > > > run
> > > > > > > routinely on our Biomark system. Typical runs are across multiple
> > > > > > > plates,
> > > > > > > and have multiple biological replicates, and usually 2 or more
> > > > > > > technical
> > > > > > > replicates (although we are moving away from technical reps, as the
> > > > > > > CVs
> > > > > > > are so tight).
> > > > > > > 
> > > > > > > Can anyone help with this? Heidi?
> > > > > > > 
> > > > > > > raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48,
> > > > > > > n.data=48, samples=samples)
> > > > > > > #Ive read the samples in from a separate file, as when you read it
> > > > > > > in,
> > > > > > > it
> > > > > > > doesnt take the sample names supplied in the biomark output#
> > > > > > > #Next, I want to plot genes of interest, with samples of interest,
> > > > > > > and
> > > > > > > I'm
> > > > > > > having trouble getting an appropriate output#
> > > > > > > 
> > > > > > > g=featureNames(raw6)[1:2]
> > > > > > > plotCtOverview(raw6, genes=g, groups=groupID$Treatment,
> > > > > > > col=rainbow(5))
> > > > > > > 
> > > > > > > #This plots 1 gene across all 48 samples#
> > > > > > > #but the legend doesnt behave, its placed on top of the plot, and I
> > > > > > > cant
> > > > > > > get it to display in a non-overlapping fashion#
> > > > > > > #I've tried all sorts of things in par, but nothing seems to shift
> > > > > > > the
> > > > > > > legend's position#
> > > > > > > 
> > > > > > As the old saying goes, whenever you want a job done well, you'll have
> > > > > > to
> > > > > > do it yourself ;). In this case, the easiest thing is probably to use
> > > > > > legend=FALSE in plotCtOverview, and then afterwards add it yourself in
> > > > > > the
> > > > > > desired location using legend(). That way, if you have a lot of
> > > > > > different
> > > > > > features or groups to display, you can also use the ncol parameter in
> > > > > > legend to make several columns within the legend, such as 3x4 instead
> > > > > > of
> > > > > > the default 12x1.
> > > > > > 
> > > > > > Alternatively, you can use either xlim or ylim in plotCtOverview to
> > > > > > add
> > > > > > some empty space on the side where there's then room for the legend.
> > > > > > 
> > > > > > > #I now want to plot a subset of the samples for specific genes#
> > > > > > > > LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment==
> > > > > > > > "LFY")
> > > > > > > > LOY
> > > > > > > Sample Treatment
> > > > > > > 2     L20       LFY
> > > > > > > 5     L30       LFY
> > > > > > > 7     L45        LO
> > > > > > > 20    L40        LO
> > > > > > > 27    L43        LO
> > > > > > > 33    L29       LFY
> > > > > > > 36    L38        LO
> > > > > > > 40    L39        LO
> > > > > > > 43    L23       LFY
> > > > > > > 
> > > > > > > 
> > > > > > > > plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5))
> > > > > > > Warning messages:
> > > > > > > 1: In split.default(t(x), sample.split) :
> > > > > > > data length is not a multiple of split variable
> > > > > > > 2: In qt(p, df, lower.tail, log.p) : NaNs produced
> > > > > > > > 
> > > > > > 
> > > > > > Does it make sense if you split by groups=LOY$Treatment? It looks like
> > > > > > the
> > > > > > object LOY itself is a data frame, rather than the expected vector.
> > > > > > 
> > > > > > Also, you may have to 'repeat' the col=rainbow() argument to fit your
> > > > > > number of features.
> > > > > > 
> > > > > > > 
> > > > > > > #it displays the two groups defined by treatment, but doesnt do so
> > > > > > > nicely,
> > > > > > > very skinny bars, and the legend doesnt reflect what its displaying#
> > > > > > > #again, I've tried monkeying around with par, but not sure what
> > > > > > > HTqPCR
> > > > > > > is
> > > > > > > calling to make the plots#
> > > > > > > 
> > > > > > If the bars are very skinny, it's probably because you're displaying a
> > > > > > lot
> > > > > > of features. Nothing much to do about that, except increasing the
> > > > > > width
> > > > > > or
> > > > > > your plot :(.
> > > > > > 
> > > > > > \Heidi
> > > > > > 
> > > > > > > please help!
> > > > > > > 
> > > > > > > thanks
> > > > > > > 
> > > > > > > Simon.
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > Bioconductor mailing list
> > > > > > > Bioconductor@r-project.org
> > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > > > > Search the archives:
> > > > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: \
> http://news.gmane.org/gmane.science.biology.informatics.conductor

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: \
http://news.gmane.org/gmane.science.biology.informatics.conductor

[prev in list] [next in list] [prev in thread] [next in thread]