
List:       r-devel
Subject:    Re: [Rd] model.frame(), model.matrix(), and derived predictor variables
From:       Gabriel Becker <gmbecker () ucdavis ! edu>
Date:       2013-08-29 21:51:54
Message-ID: CADwqtCML-rzoRyP=V3OXqeWz2zWz061MUF5pH8wp0JiWUNdw+w () mail ! gmail ! com

On Thu, Aug 29, 2013 at 6:21 AM, Ben Bolker <bbolker@gmail.com> wrote:

> On 13-08-28 05:43 PM, Gabriel Becker wrote:
> > Ben,
> >
> > It works for me ...
> >> x = rpois(100, 5) + 1
> >> y = rnorm(100, x)
> >> d = data.frame(x,y)
> >> m <- lm(y~log(x),d)
> >> update(m,data=model.frame(m))
> >
> > Call:
> > lm(formula = y ~ log(x), data = model.frame(m))
> >
> > Coefficients:
> > (Intercept)       log(x)
> >      -4.010        5.817
> >
> >
>
>     That's because x and y are still lying around in your global
> environment.  If you rm(x); rm(y) then it won't work any more.  And it
> wouldn't have worked if you had constructed your model frame directly as
>
>  d = data.frame(x=rpois(100,5)+1)
>  d = transform(d,y=rnorm(100,x))
>
Ah, of course. That was silly of me.
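
For the record, a minimal sketch of the failure once x and y exist only as
columns of d. The seed and the tryCatch() wrapper are my additions, there
only so the example runs without stopping; it assumes no object named x is
visible from the workspace.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Build the data so that x and y exist only as columns of d, never as
# free-standing objects in the workspace.
d <- data.frame(x = rpois(100, 5) + 1)
d <- transform(d, y = rnorm(100, x))
m <- lm(y ~ log(x), d)

# The model frame stores the evaluated term under the name "log(x)".
names(model.frame(m))

# Refitting against the model frame re-evaluates log(x), which needs x.
res <- tryCatch(update(m, data = model.frame(m)), error = function(e) e)
inherits(res, "error")  # an error is expected when no x can be found
```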


> >
> > You can also re-fit using the model.matrix directly. In your example,
> > the model frame can be passed directly to lm.fit /lm.wfit.
>
>     Yes, if I want to refit the same model.  But if I want to do
> something else with the model (e.g. try fitting vs. x instead of log(x),
> or some other function of x) then it doesn't work.
>
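
To be concrete about the refit I had in mind, here is a rough sketch. It
assumes the design matrix comes from model.matrix() and the response from
model.response() on the stored model frame; the names X, yy and refit are
just illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
d <- data.frame(x = rpois(100, 5) + 1)
d <- transform(d, y = rnorm(100, x))
m <- lm(y ~ log(x), d)

# Both pieces come from the stored model frame, so the original x is not
# needed here.
X  <- model.matrix(m)                   # design matrix, includes the intercept
yy <- model.response(model.frame(m))    # response vector

# Low-level refit of the same model; no formula is evaluated at this point.
refit <- lm.fit(x = X, y = yy)
all.equal(coef(refit), coef(m))
```

As you say, this only reproduces the same model; it does not help with
fitting against x or some other function of x.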


This seems like a bug in model.frame then, as the documentation says

 "‘model.frame’ (a generic function) and its methods return a ‘data.frame’
with the variables needed to use ‘formula’ and any ‘...’ arguments."

Which pretty clearly seems not to be the case. One could argue that x
*should* not be required to use the formula: if the fitting functions could
tell that log(x) is already a variable in the data.frame and knew to use it,
then re-transforming the variable would in some sense be wasted computation.
But that is not how lm (or glm) behaves, as you note.

~G


>
>   cheers
>     Ben
> >
> >
> > ~G
> >
> >> sessionInfo()
> > R version 3.0.1 (2013-05-16)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=C                 LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] tools_3.0.1
> >
> >
> >
> >
> > On Sat, Aug 24, 2013 at 7:40 PM, Ben Bolker <bbolker@gmail.com
> > <mailto:bbolker@gmail.com>> wrote:
> >
> >
> >       Bump: just trying one more time to see if anyone had thoughts on this
> >     (so far it's just <crickets> ...)
> >
> >
> >     -------- Original Message --------
> >     Subject: model.frame(), model.matrix(), and derived predictor variables
> >     Date: Sat, 17 Aug 2013 12:19:58 -0400
> >     From: Ben Bolker <bbolker@gmail.com <mailto:bbolker@gmail.com>>
> >     To: R-devel@stat.math.ethz.ch <mailto:R-devel@stat.math.ethz.ch>
> >     <R-devel@stat.math.ethz.ch <mailto:R-devel@stat.math.ethz.ch>>
> >
> >
> >       Dear r-developers:
> >
> >       I am struggling with some fundamental aspects of model.frame().
> >
> >       Conceptually, I think of a flow from data -> model.frame() ->
> >     model.matrix; the data contain _input variables_, while model.matrix
> >     contains _predictor variables_: data have been transformed, splines
> >     and polynomials have been expanded into their corresponding
> >     multi-dimensional bases, and factors have been expanded into
> >     appropriate sets of dummy variables depending on their contrasts.
> >       I originally thought of model.frame() as containing input
> >     variables as well (but with only the variables needed by the model,
> >     and with cases containing NAs handled according to the relevant
> >     na.action setting), but that's not quite true.  While factors are
> >     retained as-is, splines and polynomials and parameter
> >     transformations are evaluated. For example
> >
> >     d <- data.frame(x=1:10,y=1:10)
> >     model.frame(y~log(x),d)
> >
> >     produces a model frame with columns 'y', 'log(x)' (not 'y', 'x').
> >
> >     This makes it hard (impossible?) to use the model frame to
> >     re-evaluate the existing formula in a model, e.g.
> >
> >     m <- lm(y~log(x),d)
> >     update(m,data=model.frame(m))
> >     ## Error in eval(expr, envir, enclos) : object 'x' not found
> >
> >     It seems to me that this is a reasonable thing to want to do
> >     (i.e. use the model frame as a stored copy of the data that
> >      can be used for additional model operations); otherwise, I
> >     either need to carry along an additional copy of the data in
> >     a slot, or hope that the model is still living in an environment
> >     where it can find a copy of the original data.
> >
> >     Does anyone have any insights into the original design choices,
> >     or suggestions about how they have handled this within their own
> >     code? Do you just add an additional data slot to the model?  I've
> >     considered trying to write some kind of 'augmented' model frame, that
> >     would contain the equivalent of
> >     setdiff(all.vars(formula),model.frame(m)) [i.e.  all input variables
> >     that appeared in the formula but not in the model frame ...].
> >
> >       thanks
> >        Ben Bolker
> >
> >     ______________________________________________
> >     R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
> >     https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >
> >
> > --
> > Gabriel Becker
> > Graduate Student
> > Statistics Department
> > University of California, Davis
>
>


-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis




______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

