
List:       r-devel
Subject:    [Rd] locales and readLines
From:       Martin Morgan <mtmorgan () fhcrc ! org>
Date:       2007-08-31 16:30:43
Message-ID: 6phfy1zll8s.fsf () gopher4 ! fhcrc ! org

R-developers,

I'm looking for some 'best practices', or perhaps an upstream solution
(I have a sense of deja vu about this, so apologies if it has already
been asked). Problems occur when a file is encoded as latin1 but the
user is in a UTF-8 locale (or, more generally, when the file's encoding
does not match that of R's locale). Here are two examples from the
Bioconductor help list:

https://stat.ethz.ch/pipermail/bioconductor/2007-August/018947.html

(the relevant command is library(GEOquery); gse <- getGEO('GSE94'))

https://stat.ethz.ch/pipermail/bioconductor/2007-July/018204.html

I think solutions are:

* Specify the encoding in readLines.

* Convert the input using iconv.

* Tell the user to set their locale to match the input file (!)
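
As a rough illustration of the first two options, here is a minimal
sketch. It assumes a current R session running in a UTF-8 locale; the
temporary file and its contents ("café" written as latin1 bytes) are
made up for the example, and the variable names are hypothetical.

```r
# Hypothetical demo file: the latin1 bytes for "café" plus a newline.
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), path)

# Option 1: declare the input encoding on the connection, so the text
# is re-encoded to the session's native encoding as it is read.
# (Assumes the native encoding can represent the characters.)
con <- file(path, encoding = "latin1")
x1 <- readLines(con)
close(con)

# Option 2: read the bytes unchanged, then convert explicitly.
x2 <- iconv(readLines(path), from = "latin1", to = "UTF-8")
```

In both cases the package author still has to know that the file is
latin1; nothing here detects the encoding automatically.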

Unfortunately, these (1 & 2, anyway) place an extra burden on the package
author, who must learn about locales, the encoding conventions of the
files being read, and how R deals with encodings.

Are there other / better solutions? Any chance for some (additional)
'smarts' when reading files?

Martin
-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel