
List:       r-devel
Subject:    Re: [Rd] should `data` respect default.stringsAsFactors()?
From:       peter dalgaard <pdalgd () gmail ! com>
Date:       2016-02-19 15:23:19
Message-ID: AAA51D5B-A133-4A7D-A1BC-B7615B45B8CC () gmail ! com


On 19 Feb 2016, at 16:02 , Cook, Malcolm <MEC@stowers.org> wrote:

> Hi,
> 
> > Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
> > 
> > Yes, the doc should probably be fixed. The code probably not 
> 
> Agreed.  
> 
> Is someone on-list authorized and willing to make the documentation change?
> I suppose I could learn what it takes to be a "player", but for such a
> trivial fix, it probably is overkill.  Dissenting opinions?

I have fixed it for r-devel.

-pd

> 
> > -- packages loading different data sets depending on user options is an
> > even worse idea than having the option in the first place... (I don't mean
> > having the possibility, I mean the default.stringsAsFactors thing).
> > 
> > In general, read.table() gets many things wrong
> 
> I agree with you that "read.table() gets many things wrong" and I too have my
> favorite workarounds - but that was not my concern.  My concern is that data()
> does not work as documented.
> ~Malcolm
> 
> > , if you don't set switches
> > and/or postprocess. E.g., even when you do intend to read factors, the
> > alphabetical level order is often not desired. My favourite workaround for
> > data() is to drop a corresponding foo.R file in the ./data directory. This will
> > be run in preference to loading foo.txt (or foo.csv, etc.) and can contain
> > something like
> > 
> > dd <- read.table("foo.txt", .....)
> > dd$cook <- factor(dd$cook, levels = c("rare", "medium", "well-done"))
> > 
> > etc.
> > 
> > -pd
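A minimal, self-contained sketch of the foo.R workaround described above, assuming a throw-away project directory under tempdir() (the names foo_project, foo.txt, foo.R and the weight column are hypothetical). It relies on two documented properties of data(): it also searches the data subdirectory of the current working directory, and a .R file is preferred over a .txt file of the same name and is sourced with the working directory temporarily set to the directory containing it.

```r
# Hypothetical throw-away project directory with a data/ subdirectory
proj <- file.path(tempdir(), "foo_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# Raw data that data() would otherwise read with read.table(..., as.is = FALSE)
writeLines(c("cook weight", "well-done 2", "rare 1", "medium 3"),
           file.path(proj, "data", "foo.txt"))

# foo.R is found before foo.txt; data() sources it with data/ as the
# working directory, so the relative file name resolves
writeLines(c(
  'foo <- read.table("foo.txt", header = TRUE, as.is = TRUE)',
  'foo$cook <- factor(foo$cook, levels = c("rare", "medium", "well-done"))'
), file.path(proj, "data", "foo.R"))

# data() searches ./data of the working directory after attached packages
old <- setwd(proj)
e <- new.env()
tryCatch(data(list = "foo", envir = e), finally = setwd(old))
levels(e$foo$cook)    # "rare" "medium" "well-done"

# Reading foo.txt directly with as.is = FALSE gives alphabetical levels
plain <- read.table(file.path(proj, "data", "foo.txt"),
                    header = TRUE, as.is = FALSE)
levels(plain$cook)    # "medium" "rare" "well-done"
```

The level order is then fixed once, next to the data, instead of in every script that calls data().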
> > 
> > 
> > 
> > > On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich@gmail.com> wrote:
> > > 
> > > On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC@stowers.org> wrote:
> > > > Hi Peter,
> > > > 
> > > > Sorry if I was not clear.  Perhaps an example will make my point:
> > > > 
> > > > > data(iris)
> > > > > class(iris$Species)
> > > > [1] "factor"
> > > > > write.table(iris,'data/myiris.tab')
> > > > > data(myiris)
> > > > > class(myiris$Species)
> > > > [1] "factor"
> > > > > rm(myiris)
> > > > > options(stringsAsFactors = FALSE)
> > > > > data(myiris)
> > > > > class(myiris$Species)
> > > > [1] "factor"
> > > > > myiris<-read.table("data/myiris.tab",header=TRUE)
> > > > > class(myiris$Species)
> > > > [1] "character"
> > > > 
> > > > I am surprised to find that in the above, setting the global option
> > > > stringsAsFactors = FALSE does NOT affect how Species is read in by the
> > > > `data` function, whereas it DOES affect how Species is read in by
> > > > read.table, especially since data is documented as calling read.table.
> > > > 
> > > To be explicit, it's documented as calling read.table(..., header =
> > > TRUE) in this case, but it actually calls read.table(..., header =
> > > TRUE, as.is = FALSE), which results in class(myiris$Species) of
> > > "factor".
> > > 
> > > R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
> > > R> class(myiris$Species)
> > > [1] "factor"
> > > 
> > > So it seems like adding as.is = FALSE to the call in the documentation
> > > would clear this up.
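A short sketch of the difference Joshua describes, using a temporary file and made-up data (the column names x and sp are hypothetical). read.table()'s as.is argument defaults to !stringsAsFactors, so stringsAsFactors, and hence the global option, only matters when as.is is not supplied; an explicit as.is = FALSE, as in the call data() makes, converts character columns to factors regardless.

```r
# Hypothetical example data written to a temporary .tab file
tmp <- tempfile(fileext = ".tab")
write.table(data.frame(x = 1:3, sp = c("b", "a", "b"),
                       stringsAsFactors = FALSE), tmp)

# The call data() makes for .tab/.txt files: as.is = FALSE is explicit,
# so stringsAsFactors (and the global option) is never consulted
via_data <- read.table(tmp, header = TRUE, as.is = FALSE)

# as.is defaults to !stringsAsFactors, so these two calls are equivalent
via_saf  <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
via_asis <- read.table(tmp, header = TRUE, as.is = TRUE)

class(via_data$sp)            # "factor"
class(via_saf$sp)             # "character"
identical(via_saf, via_asis)  # TRUE
```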
> > > 
> > > > In my opinion, one or the other should change (the behavior of data,
> > > > or the documentation).
> > > > 
> > > > <bleep> <bleep>,
> > > > 
> > > > ~ Malcolm
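A self-contained version of the experiment above, assuming a temporary directory stands in for the project directory (myiris_project is a hypothetical name). Rather than changing the global option, it passes stringsAsFactors = FALSE to read.table() directly, which has the same effect on that one call.

```r
# Temporary stand-in for the project directory used in the experiment
proj <- file.path(tempdir(), "myiris_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
write.table(iris, file.path(proj, "data", "myiris.tab"))

# data() looks in the "data" subdirectory of the working directory
# after the attached packages, and reads .tab files with as.is = FALSE
old <- setwd(proj)
e <- new.env()
tryCatch(data(list = "myiris", envir = e), finally = setwd(old))

# A direct read.table() call that asks for character columns
direct <- read.table(file.path(proj, "data", "myiris.tab"),
                     header = TRUE, stringsAsFactors = FALSE)

class(e$myiris$Species)   # "factor"
class(direct$Species)     # "character"
```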
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: peter dalgaard [mailto:pdalgd@gmail.com]
> > > > > Sent: Thursday, February 18, 2016 3:32 PM
> > > > > To: Cook, Malcolm <MEC@stowers.org>
> > > > > Cc: r-devel@stat.math.ethz.ch
> > > > > Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
> > > > > 
> > > > > What the <bleep> are you on about? data() does many things, only some of
> > > > > which call read.table() et al., and the ones that do have no special
> > > > > treatment of stringsAsFactors.
> > > > > 
> > > > > -pd
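To illustrate that not every data() path goes through read.table(), a sketch assuming a hypothetical mychars.rda file in a temporary data directory: .rda files are load()ed, so the stored column types come back unchanged and stringsAsFactors plays no role.

```r
# Hypothetical project directory holding a binary .rda data file
proj <- file.path(tempdir(), "rda_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# A data frame with a character column, saved with save()
mychars <- data.frame(id = 1:2, label = c("x", "y"), stringsAsFactors = FALSE)
save(mychars, file = file.path(proj, "data", "mychars.rda"))
rm(mychars)

# data() load()s .rda files; read.table() is never called
old <- setwd(proj)
e <- new.env()
tryCatch(data(list = "mychars", envir = e), finally = setwd(old))

class(e$mychars$label)   # "character"
```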
> > > > > 
> > > > > > On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC@stowers.org> wrote:
> > > > > > 
> > > > > > Hiya,
> > > > > > 
> > > > > > Probably been debated elsewhere....
> > > > > > 
> > > > > > I note that R's `data` function does not respect default.stringsAsFactors
> > > > > > 
> > > > > > By my lights, it should, especially as it is documented to call
> > > > > > read.table, which DOES respect it.
> > > > > > 
> > > > > > Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
> > > > > > 
> > > > > > Compelling.  I have to agree.
> > > > > > 
> > > > > > So, I change my mind.
> > > > > > 
> > > > > > By my lights, `data` should then be documented to NOT respect
> > > > > > default.stringsAsFactors.
> > > > > > 
> > > > > > Else?
> > > > > > 
> > > > > > ~Malcolm Cook
> > > > > > 
> > > > > > ______________________________________________
> > > > > > R-devel@r-project.org mailing list
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > > > 
> > > > > --
> > > > > Peter Dalgaard, Professor,
> > > > > Center for Statistics, Copenhagen Business School
> > > > > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > > > > Phone: (+45)38153501
> > > > > Office: A 4.23
> > > > > Email: pd.mes@cbs.dk  Priv: PDalgd@gmail.com
> > > 
> > > 
> > > 
> > > --
> > > Joshua Ulrich  |  about.me/joshuaulrich
> > > FOSS Trading  |  www.fosstrading.com
> > > R/Finance 2016 | www.rinfinance.com
> > 
> 


