List: bioc-devel
Subject: Re: [Bioc-devel] Random access to sequences in fasta files
From: Thomas Dybdal Pedersen <thomasp85 () gmail ! com>
Date: 2015-01-29 16:15:07
Message-ID: 5603EC1D-91BD-4D55-BB01-594CCB828F57 () gmail ! com
Thanks Martin
This was meant as a feature request/discussion for Biostrings, which is why I
posted it here. I thought Biostrings' I/O capabilities were behind most other FASTA
readers on Bioconductor...
/Thomas
> On 29/01/2015 at 15.45, Martin Morgan <mtmorgan@fredhutch.org> wrote:
>
> > On 01/29/2015 06:41 AM, Thomas Lin Pedersen wrote:
> > Hi
> >
> > I'm asking whether there are any plans to support random-access reading
> > of FASTA files, in the sense that the indexes of the sequences to be read
> > can be specified up front.
> > I'm working on a package for comparative microbial genomics, and it would be a
> > huge speed improvement if it were possible to quickly read in thousands of
> > sequences distributed over as many files. Currently the proper, vectorised
> > approach requires all files to be read in at once and then subsetted, but this
> > can result in XStringSets in the GB range just to access a few sequences. The
> > slow, un-R way would be to loop through each file (or each sequence, using skip
> > and nrec to read in only the relevant sequences). Ideally I'm looking for an
> > interface like:
> > readXStringSet(files, rec)
> >
> > Where rec is either a vector that would index into the XStringSet as if
> > everything from files had been read in, or a list with the same length as files,
> > containing the indexes of interest for each file.
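
The two forms of rec described above are interchangeable once the number of records in each file is known. A minimal base R sketch of that conversion follows; split_rec and n_per_file are hypothetical names used for illustration only and are not part of Biostrings, and the per-file record counts are assumed to be available already.

```r
# Hypothetical helper (not a Biostrings function): convert a global index
# vector `rec` into a per-file list of local indices, given the number of
# records in each file (`n_per_file`, assumed known).
split_rec <- function(rec, n_per_file) {
  stopifnot(all(rec >= 1), all(rec <= sum(n_per_file)))
  ends   <- cumsum(n_per_file)      # last global index belonging to each file
  starts <- ends - n_per_file       # number of records before each file
  # left.open = TRUE: a global index equal to ends[i] still maps to file i
  file_id <- findInterval(rec, ends, left.open = TRUE) + 1L
  local   <- rec - starts[file_id]  # index within the owning file
  # Using all file levels keeps an empty entry for files with no requested records
  unname(split(local, factor(file_id, levels = seq_along(n_per_file))))
}

# Three files holding 3, 2 and 4 records; request global records 1, 4, 9 and 3
per_file <- split_rec(c(1, 4, 9, 3), c(3, 2, 4))
```

Here per_file has one element per file, so it could drive a per-file loop or be passed to an interface of the proposed list form.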
>
> Hi Thomas -- this should really be posted to support.bioconductor.org, but see
> Rsamtools::FaFile and rtracklayer::TwoBitFile access through getSeq.
> Martin
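
A minimal sketch of the Rsamtools route mentioned above, assuming Rsamtools is installed from Bioconductor. The temporary FASTA file and its three records are made up for illustration; the point is only that an indexed FaFile lets getSeq fetch selected records without loading the whole file.

```r
library(Rsamtools)  # Bioconductor package; also attaches Biostrings

# Illustrative input only: a small FASTA file written to a temporary path
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT",
             ">seq2", "GGGCCC",
             ">seq3", "TTTTAAAA"), fasta_path)

indexFa(fasta_path)               # writes a .fai index next to the file
fa  <- FaFile(fasta_path)
idx <- scanFaIndex(fa)            # GRanges with one range per record
seqs <- getSeq(fa, idx[c(1, 3)])  # reads only the selected records
```

For many files, the same steps could be wrapped in lapply() over the file paths with a per-file list of indexes; whether that is fast enough for thousands of files is not something this sketch demonstrates.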
>
> > with best wishes
> >
> > Thomas
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793