'Re: MULTILPLE COMPARISON OF LSMEANS IN GENMOD PROCEDURE?'


List:       sas-l
Subject:    Re: MULTILPLE COMPARISON OF LSMEANS IN GENMOD PROCEDURE?
From:       Dale McLerran <stringplayer_2 () YAHOO ! COM>
Date:       2002-12-30 18:54:24

Gregor,

Well, I take some issue with the statement that the residuals
in a general linear model should be normal.  For inference to
be correct in small samples, we must assume that the residuals
are normally distributed.  However, let's consider a situation
which could be described as a two-sample t-test.  Suppose that
you have 500 observations on the response Y at each of two
values of some predictor (N1=500, N2=500, N=1000).  Now, in
all likelihood, the sampling distribution of the mean at each
value of the predictor will for all practical purposes be
normal regardless of the distribution of Y at each value of
the predictor (this is the central limit theorem at work, and
it requires only that Y have finite variance).  I would have no
problem with someone performing a two-sample t-test in such a
situation.  Now, the two-sample t-test is just a specific form
of general linear model (a regression of Y on a 0/1 group
indicator), so one could also fit the equivalent regression
model without any concern.  It is when you have small samples
that concern about the exact distributional properties of the
residuals is important.  However, when you have small samples,
you may not be able to determine the distributional properties
of the residuals.  It is something of a Catch-22.  You need to
employ the correct distributional assumptions when working with
small samples, but you cannot determine the distributional
assumptions when the sample size is small.  If you have a large
sample in which you can determine the distribution of the
residuals, you don't need to be terribly concerned about
employing the proper distributional assumptions.
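A minimal sketch of the large-sample argument above, in Python
(standard library only).  The gamma shape of 2, the scale values
and the seed are arbitrary assumptions chosen to make Y visibly
skewed; they are not from the original question.  It shows that
means of 500 skewed observations are nearly symmetric, and that
the pooled two-sample t statistic is identical to the t statistic
for the slope when Y is regressed on a 0/1 group indicator.

```python
import math
import random
import statistics


def skewness(v):
    """Moment estimator of skewness: m3 / m2**1.5."""
    m = statistics.fmean(v)
    m2 = statistics.fmean([(x - m) ** 2 for x in v])
    m3 = statistics.fmean([(x - m) ** 3 for x in v])
    return m3 / m2 ** 1.5


def pooled_t(y1, y2):
    """Pooled-variance two-sample t statistic for mean(y2) - mean(y1)."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = statistics.fmean(y1), statistics.fmean(y2)
    ss = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    s2 = ss / (n1 + n2 - 2)  # pooled residual variance
    return (m2 - m1) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def regression_t(y1, y2):
    """t statistic for the slope of an OLS fit of Y on a 0/1 group indicator."""
    x = [0.0] * len(y1) + [1.0] * len(y2)
    y = list(y1) + list(y2)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1 / math.sqrt(sse / (len(y) - 2) / sxx)


random.seed(2002)
N = 500  # observations per group, as in the example above

# Y itself is strongly right-skewed (gamma with shape 2: skewness about 1.41) ...
raw = [random.gammavariate(2.0, 1.0) for _ in range(20000)]
# ... but the mean of N such values is close to normal (skewness about 0.06).
means = [statistics.fmean([random.gammavariate(2.0, 1.0) for _ in range(N)])
         for _ in range(1000)]
print("skewness of Y:", round(skewness(raw), 2))
print("skewness of the sample mean:", round(skewness(means), 2))

# The two-sample t-test and the regression on a group indicator agree exactly.
y1 = [random.gammavariate(2.0, 1.0) for _ in range(N)]
y2 = [random.gammavariate(2.0, 1.2) for _ in range(N)]
print("pooled t:", pooled_t(y1, y2), " regression t:", regression_t(y1, y2))
```

The equality of the two t statistics is algebraic (the indicator
slope is the difference in means, and the regression SSE is the
pooled sum of squares), so it holds for any data, skewed or not.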

That said, let me try to address where I think you might be
coming from.  Note that you don't really give adequate background
about your problem for a solid response.  But I would observe
that if you fit a GLM with a gamma distribution and an identity
link, then the assumption is that the response, that is, the
expected value plus the residual, is gamma distributed.  That
would mean that the residuals are gamma distributed with a
location shift (each residual is a gamma variate minus its own
expected value).  Now, when the shape parameter is large, the
gamma distribution can appear approximately normal: its skewness
is 2/sqrt(shape), whatever the scale.  So, it is possible to fit
a GLM with an identity link assuming a gamma distribution and
find that the residuals are approximately normally distributed.
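A short simulation of this point, again in Python.  Under a gamma
GLM with an identity link, a response with mean mu = X*beta and
shape k has scale mu/k, so the residual Y - mu is a shifted gamma
with variance mu**2/k and skewness 2/sqrt(k).  The shape values
and the two fitted means below are hypothetical.  As k grows the
residuals look increasingly normal, although their spread still
grows with mu.

```python
import math
import random
import statistics


def skewness(v):
    """Moment estimator of skewness: m3 / m2**1.5."""
    m = statistics.fmean(v)
    m2 = statistics.fmean([(x - m) ** 2 for x in v])
    m3 = statistics.fmean([(x - m) ** 3 for x in v])
    return m3 / m2 ** 1.5


random.seed(12)
results = {}
for shape in (1.0, 4.0, 400.0):
    for mu in (2.0, 10.0):  # hypothetical fitted means b0 + b1*x under the identity link
        # Gamma with mean mu and shape k has scale mu/k (variance mu**2/k).
        resid = [random.gammavariate(shape, mu / shape) - mu for _ in range(20000)]
        results[(shape, mu)] = (statistics.fmean(resid),
                                statistics.pstdev(resid),
                                skewness(resid))
        mean, sd, skew = results[(shape, mu)]
        print(f"shape={shape:5.0f} mu={mu:4.1f}  mean={mean:6.3f}  sd={sd:6.3f}  "
              f"skew={skew:5.2f} (theory {2 / math.sqrt(shape):.2f})")
```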

However, if this is the situation in which you find yourself,
then I would prefer to assume that the response is normally
distributed.  Now, finding that the residuals are approximately
normally distributed does not imply anything about the marginal
distribution of the response.  The marginal distribution could
appear to be gamma, while the residuals are distributed normally.
The residuals, which are the basis for inference, represent the
conditional distribution of the response.  Given the covariates
X1, X2, ..., Xk, the deviation of the response from its
expected value is normally distributed.  If the covariates take
on some long-tailed distribution, then you could find that the
conditional distribution of the response is normal, but the
marginal distribution of the response appears to be gamma.  In
this situation, one would state the distribution of the
response to be normal with mean X*beta and variance sigma-squared.
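A sketch of the conditional-versus-marginal distinction, in
Python.  The model Y = 5 + 3*X + e, with X exponential (a
long-tailed covariate) and e standard normal, is an illustrative
assumption, not taken from the original question.  The OLS
residuals come out essentially symmetric, while the marginal
distribution of Y is strongly right-skewed, much as a gamma
variable would be.

```python
import random
import statistics


def skewness(v):
    """Moment estimator of skewness: m3 / m2**1.5."""
    m = statistics.fmean(v)
    m2 = statistics.fmean([(x - m) ** 2 for x in v])
    m3 = statistics.fmean([(x - m) ** 3 for x in v])
    return m3 / m2 ** 1.5


random.seed(66)
n = 20000
x = [random.expovariate(1.0) for _ in range(n)]            # long-tailed covariate
y = [5.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]  # normal given x

# Ordinary least squares fit of y on x.
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

print("b0, b1:", round(b0, 2), round(b1, 2))
print("skewness of Y (marginal):", round(skewness(y), 2))            # theory about 1.71
print("skewness of residuals (conditional):", round(skewness(resid), 2))  # theory 0
```

Here the normal-theory model with mean X*beta and constant
variance is the correct one, even though a histogram of Y alone
would suggest a gamma.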

But the situation which you describe sounds as though you have
fit a generalized linear model assuming gamma distribution
employing an identity link function.  Now you are asking if the
residuals need to be normally distributed in order for bhat/se
to be distributed as t with specified df.  At least, that is
my interpretation of your question.  I would be quite hesitant
to place myself in this situation in the first place.  Obviously,
if you believe that the proper distribution is gamma, then the
residuals could not be normally distributed.  Why would you not
employ the canonical link (the inverse, or reciprocal, link for
the gamma)?  I suppose that interpretation of the parameters may
be easier with the identity link.  If that is the reason for
employing the identity link, then I would suggest using bootstrap
methods to test model parameters as well as linear combinations
of model parameters.
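A minimal sketch of the bootstrap suggestion, in Python, for the
simplest case only: a gamma GLM with an identity link and a single
0/1 covariate, so that b1 is the difference between the two group
means.  In that case the maximum likelihood fit reproduces the
group means, so b1 can be computed without iterative fitting; with
continuous covariates each bootstrap replicate would instead need
a full refit of the GLM (for example with PROC GENMOD).  The data,
group sizes, number of replicates and seed are all hypothetical.
Cases are resampled within each group and a 95% percentile
interval is reported.

```python
import random
import statistics


def b1_hat(y0, y1):
    # With one 0/1 covariate and an identity link, the gamma GLM fit
    # reproduces the group means, so the slope is the difference in means.
    return statistics.fmean(y1) - statistics.fmean(y0)


def bootstrap_ci(y0, y1, reps=2000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for b1, resampling within each group."""
    rng = rng or random.Random()
    stats = sorted(
        b1_hat(rng.choices(y0, k=len(y0)), rng.choices(y1, k=len(y1)))
        for _ in range(reps)
    )
    lo = stats[round(alpha / 2 * (reps - 1))]
    hi = stats[round((1 - alpha / 2) * (reps - 1))]
    return lo, hi


rng = random.Random(30)
# Hypothetical data: 40 observations per group, gamma shape 2, means 4 and 6.
y0 = [rng.gammavariate(2.0, 4.0 / 2.0) for _ in range(40)]
y1 = [rng.gammavariate(2.0, 6.0 / 2.0) for _ in range(40)]
est = b1_hat(y0, y1)
lo, hi = bootstrap_ci(y0, y1, rng=rng)
print(f"b1 = {est:.3f}, 95% percentile interval = ({lo:.3f}, {hi:.3f})")
```

A linear combination such as b0 + b1 (the mean in the second
group) would be handled the same way: compute it in every
replicate and take the percentiles of those values.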

HTH,

Dale

BTW, I have posted this back to SAS-L.  For a while, I and some
fellow statisticians who participate on SAS-L have been quite
busy with other work.  Some questions which might have evoked
some response 6 or 9 months ago have been left unanswered.
Workloads vary, and reposting your questions to the
list might generate more response.  I must be frank, though,
and suggest that a little more background about the problem
would facilitate discussion/resolution.  Why are you fitting
a GLM with assumption of gamma distribution?  Why have you
chosen the identity link?  What do the residuals look like -
long-tailed left, long-tailed right, or symmetric and
approximately normal?  How many observations do you have?
What is the question which you are trying to address?  Actually,
this last question should be your starting point.  You have this
specific problem.  In trying to address this problem, data were
collected which exhibit these certain properties.  So you fit
this particular model.  But fitting this model leads to the
following questions: _______?  An adequate response to such
questions can only be given if all of this background information
is provided.


--- Gorjanc Gregor <Gregor.Gorjanc@bfro.uni-lj.si> wrote:
> Hello Dale!
>
> Thank you for answering my question (below). I have also posted some
> other questions about generalized linear models some time ago. Well,
> one of those is still hot and quite essential to my work. I've asked a
> lot of people and searched the literature, but I still don't have the
> answer. I would be very glad if you could help me. The problem is:
>
> It is clear to me that residuals in general linear models should be
> normal. What about residuals from a generalized linear model with an
> assumed gamma distribution and identity link?
>
> Thank you in advance for your answer and happy New Year, Gregor.


=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph:  (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

