
List:       bioc-devel
Subject:    Re: [Bioc-devel] VRanges with multiple samples
From:       Robert Castelo <robert.castelo () upf ! edu>
Date:       2015-01-29 17:36:16
Message-ID: 54CA6F90.9060508 () upf ! edu

hi Michael, thanks for sharing your opinion, comments below,

On 01/28/2015 06:22 PM, Michael Lawrence wrote:
[...]

> Is your concern here scalability, ease of use, or what? If scalability,
> we should probably start thinking about a more efficient representation
> for repeated vectors, kind of like Rle, except for rep(,each=FALSE). It
> would just %% the index. I think this would be generally useful and so
> may be of more value than a more complex VRanges. After all, it is the
> (totally justifiable) complexity of VCF that motivated VRanges in the
> first place.

i'm concerned about scalability with multi-sample VCFs when adding 
annotations. What you propose, using Rle-like vectors to store 
identical values from different samples together, sounds good to me, and 
I'm also in favor of keeping data structures as simple as possible. 
For the time being I'll try to use 'VRanges' just as they are now 
and explore how bad it gets when scaling up samples and annotations, 
to justify doing something about it along the lines you suggest.
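
A minimal base-R sketch of the modulo-index idea quoted above, assuming
annotations are identical across samples and rows are ordered sample by
sample. 'cyclic_get' is a hypothetical helper written for illustration; it
is not part of any Bioconductor package:

```r
# Hypothetical helper: read element i of a vector that conceptually repeats
# `values` end to end (like rep(values, times = k)) without materializing it.
cyclic_get <- function(values, i) {
  values[((i - 1L) %% length(values)) + 1L]
}

annot <- c("missense", "synonymous", "intronic")  # one entry per variant
n_samples <- 4L

# Materialized version: one full copy of the annotations per sample.
full <- rep(annot, times = n_samples)

# The compact version answers the same lookups from only 3 stored values.
idx <- seq_along(full)
identical(cyclic_get(annot, idx), full)
```

Only the index is reduced with %%, so storage stays at one copy of the
annotations regardless of the number of samples.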

[...]

> I am not sure if coercion via as() would make sense here, since there is
> no obvious reason why the split would be by sample. Why not just use
> split(vr, sampleNames(vr))? That should work already.

i see your point that splitting a VRanges could be motivated by 
something other than sample and, as you suggest, 'split()' does the work 
very fast. actually, by invoking the VRangesList constructor i get what i 
was looking for:

do.call("VRangesList", split(vr, sampleNames(vr)))
VRangesList of length 3
names(3): sample1 sample2 sample3


although i realize now that the Rle-like strategy you propose would then 
not be usable when splitting by sample.
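
A rough base-R illustration of that point, under the assumption that every
sample carries the same per-variant annotations (plain character vectors
stand in for VRanges metadata columns here; all names are made up):

```r
annot   <- c("missense", "synonymous", "intronic")  # one entry per variant
samples <- c("sample1", "sample2", "sample3")

# Rows ordered sample by sample; the annotations cycle across samples.
sample_of_row <- rep(samples, each = length(annot))
annot_of_row  <- rep(annot, times = length(samples))

by_sample <- split(annot_of_row, sample_of_row)

# Each piece holds the complete annotation vector on its own, so no
# repetition is left to compress within a piece.
all(vapply(by_sample, identical, FUN.VALUE = logical(1), y = annot))
```

After the split, the repetition exists only across list elements, not
within any one of them, which is where a cyclic vector would have helped.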

cheers,

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel