
List:       bioconductor
Subject:    Re: [BioC] R:  R: R: R: how to find the VALIDATED pair (miRNA,
From:       "michael watson \(IAH-C\)" <michael.watson () bbsrc ! ac ! uk>
Date:       2009-06-29 7:47:53
Message-ID: 8975119BCD0AC5419D61A9CF1A923E9508B29250 () iahce2ksrv1 ! iah ! bbsrc ! ac ! uk

Why do you need all the fields?
Don't you just need the miRNA name (e.g. hsa-let-7d) and the Ensembl transcript ID
(e.g. ENST000000012345)?
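
As a rough illustration of why those two fields are enough, here is a minimal base R sketch that links miRNA/target pairs to 3'UTR sequences through the transcript ID. Everything in it is made up for the example: the data frame names (`pairs`, `utrs`), the column names, the transcript IDs and the sequences. It does not reflect the real layout of the miRBase targets file or of a BioMart result.

```r
# Hypothetical miRNA -> target transcript pairs (e.g. parsed from a targets file).
pairs <- data.frame(
  mirna         = c("hsa-let-7d", "hsa-let-7d", "hsa-miR-20a"),
  transcript_id = c("ENST00000000001", "ENST00000000002", "ENST00000000001"),
  stringsAsFactors = FALSE
)

# Hypothetical 3'UTR sequences keyed by transcript ID (e.g. a BioMart result).
utrs <- data.frame(
  transcript_id = c("ENST00000000001", "ENST00000000002", "ENST00000000003"),
  utr3          = c("ACGTACGT", "TTGACCA", "GGGCCC"),
  stringsAsFactors = FALSE
)

# Inner join on the shared key: one row per (miRNA, transcript, 3'UTR).
linked <- merge(pairs, utrs, by = "transcript_id")

# Group the rows by miRNA so that each miRNA's targets sit together.
linked <- linked[order(linked$mirna, linked$transcript_id), ]
print(linked)
```

Transcripts that no miRNA targets (the third row of `utrs` here) simply drop out of the inner join.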


-----Original Message-----
From: mauede@alice.it [mailto:mauede@alice.it]
Sent: Mon 29/06/2009 8:26 AM
To: Sean Davis
Cc: michael watson (IAH-C); Steve Lianoglou; bioconductor List
Subject: R: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Yes. I have opened and stared at the file at
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl many times.
I thought it would be possible to extract all the fields it contains through
BioMart queries. Basically, the match between the miRNAs from "mature.fa" and their
respective target genes from
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl has to be done by scanning
the two files manually (with basic R functions). Then some of the information extracted from
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl can be used in BioMart
queries to get the 3'UTR sequences. Did I get it right?

I infer that not all the fields in the file
http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl can be extracted through
BioMart queries (TRUE / FALSE)?

Unfortunately our group's Biology professor, who could have helped with nomenclature and
where to find what, is hospitalized in critical condition after a heart attack.
 
Thank you for your patience and understanding,
Maura


-----Original Message-----
From: Sean Davis [mailto:seandavi@gmail.com]
Sent: Mon 29/06/2009 4:58
To: mauede@alice.it
Cc: michael watson (IAH-C); Steve Lianoglou; bioconductor List
Subject: Re: R: [BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

On Sun, Jun 28, 2009 at 10:26 PM, <mauede@alice.it> wrote:

> Since "mature.fa" and "maturestar.fa" contain the EXPERIMENTALLY
> VALIDATED miRNAs (is that TRUE?), please assume I have read "mature.fa"
> into a list.
> I have to retain only the miRNAs from humans. Therefore I have erased all
> the list elements whose description does not start with "hsa". Am I
> mistaken?
> 
That is correct, yes.
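
For illustration, a minimal base R sketch of that filtering step on a few FASTA-style lines. The first record is the example quoted in this thread; the second record (`xxx-miR-0`, its accession and its sequence) is invented as a non-human placeholder. The sketch assumes one sequence line per record, and it anchors the match on `hsa-` so that other species codes that merely begin with the same letters are not kept by accident.

```r
# FASTA-style lines; only the first record comes from the thread,
# the second one is an invented placeholder for a non-human entry.
fasta_lines <- c(
  ">hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a",
  "UAAAGUGCUUAUAGUGCAGGUAG",
  ">xxx-miR-0 MIMAT9999999 Placeholder species miR-0",
  "ACGUACGU"
)

# Header lines start with ">"; the sequence is assumed to be the next line.
is_header <- grepl("^>", fasta_lines)
desc <- sub("^>", "", fasta_lines[is_header])
seqs <- fasta_lines[which(is_header) + 1]

# Split each description into its first two space-separated fields.
fields <- strsplit(desc, " ", fixed = TRUE)
records <- data.frame(
  mirna     = vapply(fields, function(f) f[1], character(1)),
  accession = vapply(fields, function(f) f[2], character(1)),
  sequence  = seqs,
  stringsAsFactors = FALSE
)

# Keep only the human entries.
human <- records[grepl("^hsa-", records$mirna), ]
print(human)
```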

> 
> In our present emergency situation I have to prepare a text file containing
> blocks of data described in the following.
> Each block contains a human VALIDATED miRNA identifier and sequence
> (Example:  "hsa-miR-20a "    "UAAAGUGCUUAUAGUGCAGGUAG")
> followed by the  identifier and 3'UTR sequence of ALL genes that are
> targeted by such a miRNA.
> Here is what my output file should look like. I have no idea what to pick
> as target gene identifier. But I have to use the "hsa...." identifier for
> the human miRNAs.
> 
> VALIDATED miRNA[1] identifier    miRNA[1] sequence       #BLOCK_1 start
> target-gene[1,1]  3'UTR sequence
> target-gene[1,2]  3'UTR sequence
> ...............................................
> target-gene[1,n]  3'UTR sequence                         #BLOCK_1 end
> 
> VALIDATED miRNA[2] identifier    miRNA[2] sequence       #BLOCK_2 start
> target-gene[2,1]  3'UTR sequence
> target-gene[2,2]  3'UTR sequence
> ...............................................
> target-gene[2,m]  3'UTR sequence                         #BLOCK_2 end
> 
> .....................................................................
> .....................................................................
> 
> VALIDATED miRNA[k] identifier    miRNA[k] sequence       #BLOCK_k start
> target-gene[k,1]  3'UTR sequence
> target-gene[k,2]  3'UTR sequence
> ...............................................
> target-gene[k,j]  3'UTR sequence                         #BLOCK_k end
> 
> 
> I understand I can get the genes data and 3UTR sequences from Ensembl
> through BioMart.
> My problem is: given the VALIDATED miRNAs description from "mature.fa",
> for instance    "hsa-miR-20a MIMAT0000075 Homo sapiens miR-20a"
> which attributes shall I use to get the identifier and relative 3'UTR
> sequence of ALL the genes that are target for the above described miRNA ?
> 
Again, Maura, this has been answered several times now.

http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl


> Someone has already told me there is no BioMart attribute returning the
> identifier "hsa-miR-20a".
> I ask whether there exists a BioMart attribute returning "MIMAT0000075" or
> "miR-20a"?
> 
> In short, I am looking for the attributes that allow me to relate the
> miRNAs data from "mature.fa" with the genes data from Ensembl.
> 
This information is in the .txt file download from the site above.
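
Once the miRNA/target pairs and the 3'UTR sequences are in one table, the block layout requested in the quoted message could be produced along these lines. This is only a sketch under stated assumptions: the table `linked`, its column names, the helper `format_blocks`, the target identifiers, the 3'UTR strings and the second miRNA (`hsa-miR-0000`) are all hypothetical; only the `hsa-miR-20a` name and sequence come from this thread.

```r
# Hypothetical linked table: one row per (miRNA, target transcript) pair.
linked <- data.frame(
  mirna     = c("hsa-miR-20a", "hsa-miR-20a", "hsa-miR-0000"),
  mirna_seq = c("UAAAGUGCUUAUAGUGCAGGUAG", "UAAAGUGCUUAUAGUGCAGGUAG", "ACGUACGU"),
  target    = c("ENST00000000001", "ENST00000000002", "ENST00000000003"),
  utr3      = c("ACGTACGT", "TTGACCA", "GGGCCC"),
  stringsAsFactors = FALSE
)

# Build one block per miRNA: a header line with the miRNA name and sequence,
# then one line per target with its 3'UTR, then an empty separator line.
format_blocks <- function(tab) {
  out <- character(0)
  for (m in unique(tab$mirna)) {
    rows <- tab[tab$mirna == m, ]
    out <- c(out,
             paste(m, rows$mirna_seq[1], sep = "\t"),
             paste(rows$target, rows$utr3, sep = "\t"),
             "")
  }
  out
}

block_lines <- format_blocks(linked)
writeLines(block_lines)  # pass a file name as the second argument to write to disk
```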


> 
> The reason why I mentioned the VALIDATED file from miRecords is because
> that Excel file seems to contain miRNA identifiers that correspond to
> the Ensembl data returned by the attribute "hgnc_symbol"... if I am not
> mistaken.
> 
> Sorry, I cannot answer your question "which attributes do you need ..."
> because I do not know which attributes allow me to match
> the miRNAs info from "mature.fa" with the genes info from Ensembl.
> I am proceeding by trial and error and bothering Bioconductor people!
> 
> -----Original Message-----
> From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
> Sent: Sun 28/06/2009 23:55
> To: mauede@alice.it; Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: RE: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
> 
> Yes, but what are you trying to do?  Biomart has a very complex structure,
> I admit that; but why do you need/want all those attributes?  What are the
> attributes you need?
> 
> This works:
> 
> library(biomaRt)
> # Connect to the Ensembl human gene dataset.
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> # Fetch GO molecular function annotation for a single transcript.
> getBM(attributes=c("go_molecular_function_description",
>                    "go_molecular_function_linkage_type",
>                    "ensembl_gene_id",
>                    "ensembl_transcript_id"),
>       filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)
> 
> It gets the GO molecular function data for ensembl human transcript
> ENST00000295228.  If that's what I want to do, then the code is right; if
> it's not, then the code is wrong.
> 
> How does the query you specify below relate to your question on microRNAs?
> 
> -----Original Message-----
> From: mauede@alice.it [mailto:mauede@alice.it]
> Sent: Sun 28/06/2009 6:29 PM
> To: michael watson (IAH-C); Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: R: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
> 
> Sure. I have to do that. I am just struggling to get all the pieces
> together. To me most of those names have no meaning as I do not have any
> Biology background.
> Below I am pasting a weird error ... maybe it is clear to
> you.
> I am proceeding by getting 10 consecutive attributes at a time to find
> the ones that I need, if any.
> So far I have successfully extracted the first 40 attributes from the
> listAttributes(mart) but now ...
> 
> > library(biomaRt)
> > hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> Checking attributes ... ok
> Checking filters ... ok
> > getBM(attributes=c("go_molecular_function_description",
> +                    "go_molecular_function_linkage_type",
> +                    "clone_based_ensembl_gene_name",
> +                    "clone_based_ensembl_transcript_name",
> +                    "clone_based_vega_gene_name",
> +                    "clone_based_vega_transcript_name",
> +                    "ccds",
> +                    "embl",
> +                    "entrezgene",
> +                    "ottt"),
> +       filters='ensembl_transcript_id',value='ENST00000295228',mart=hmart)
> Error in getBM(attributes = c("go_molecular_function_description",
> "go_molecular_function_linkage_type",  :
> Query ERROR: caught BioMart::Exception::Usage: Too many attributes
> selected for External References
> 
> 
> 
> -----Original Message-----
> From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
> Sent: Sun 28/06/2009 16:50
> To: mauede@alice.it; Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: RE: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
> 
> Hi Maura
> 
> Well, you can get gene:target info from miRBase, read in using CORNA or
> just read.table.
> You can get miRNA sequences also from miRBase using readFASTA.
> You can get ensembl gene sequences using biomaRt.
> You can read in miRecords data using RODBC.
> 
> You can then link this all together using merge(), though I appreciate some
> work needs to be done on the list provided by readFASTA.
> 
> Other than actually doing the work for you, I'm not sure what else we can
> do.... :)
> 
> Mick
> 
> -----Original Message-----
> From: mauede@alice.it [mailto:mauede@alice.it]
> Sent: Sun 28/06/2009 3:35 PM
> To: michael watson (IAH-C); Steve Lianoglou
> Cc: Sean Davis; bioconductor List
> Subject: R: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
> 
> Thank you very much.
> I just realized the biomart server is up & running again.
> Now I have learnt that BioMart can extract a lot of data from Ensembl (from
> where I have been told to get the genes info)
> and can also download the validated miRNAs compressed files.
> 
> I stress that the main problem I am experiencing, though, is still open.
> In fact I have to find a piece of data that allows me to relate all the
> gene info I can get from BioMart querying Ensembl
> to the downloaded miRNAs info. This is because the miRNA identifier is not
> available through BioMart .... I wish I were mistaken.
> 
> However, some other (unique ?) miRNA attribute, that is available through
> BioMart, is also present in the VALIDATED targets file that is downloadable
> in XLS format from miRecords. This piece of data would allow me to relate
> the gene 3'UTR string to the targeting miRNA.
> The issue is that I do not know how often such miRecords file is updated,
> and the downloading  is to be performed outside R environment.
> Maybe R might handle the download automatically through the R "system"
> function and then the XLS file can be processed through R package
> "RExcelInstaller" ..... just a speculation ...
> 
> Regards,
> Maura
> 
> 
> -----Original Message-----
> From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
> Sent: Sun 28/06/2009 10:15
> To: Steve Lianoglou
> Cc: mauede@alice.it; Sean Davis; bioconductor List
> Subject: RE: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
> 
> The power of Bioconductor :D
> 
> So, some code would look like this:
> 
> > library(Biostrings)  # provides readFASTA, as noted in the quoted reply below
> > mat <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/mature.fa.gz"))
> > matfas <- readFASTA(mat, strip.descs=TRUE)
> > matstar <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/maturestar.fa.gz"))
> > matstarfas <- readFASTA(matstar, strip.descs=TRUE)
> 
> 
> -----Original Message-----
> From: Steve Lianoglou [mailto:mailinglist.honeypot@gmail.com]
> Sent: Sun 28/06/2009 8:51 AM
> To: michael watson (IAH-C)
> Cc: mauede@alice.it; Sean Davis; bioconductor List
> Subject: Re: [BioC] R:  R: R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
> 
> > They'll be in fasta format, and whether or not Bioconductor can read
> > them in I have no idea - I use Bioperl for all my sequence handling.
> 
> 
> Yes, bioconductor can: the Biostrings package provides readFASTA and
> writeFASTA that handle this for you.
> 
> -steve
> 
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
> 
> Contact Info: http://cbio.mskcc.org/~lianos

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

