'Re: [R-sig-ME] [FORGED] Re: Using variance components of lmer for ICC'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       r-sig-mixed-models
Subject:    Re: [R-sig-ME]  [FORGED] Re: Using variance components of lmer for ICC
From:       Pierre de Villemereuil <pierre.de.villemereuil () mailoo ! org>
Date:       2018-06-18 7:45:57
Message-ID: 2780159.yGQmEfl06x () ev8sa6
[Download RAW message or body]

Hi,

> When I refer to residuals, as I understand from lmer modelling, it is the level 1 \
> variance. Back to my design, I have many subjects, two fixed raters, sessions \
> nested within a subject, and trials nested within sessions. Hence, level 1 variance \
> would typically reflect inter-trial variance.

Thing is: there is nothing like this for GLMMs. The lowest level of "residual \
variance" is basically the distribution variance.

You could think there would be such a thing with a threshold model, but it turns out \
that the total variance of the liability scale is non identifiable, so nothing there \
either.

> If I were to use lmer modelling, I do not have to specify the level 1 random \
> effects (i.e. ~1| SUBJ:SESSION:TRIAL), as it is in the residuals. 

Indeed.

> However, using mcmcglmm, there appears to be an even lower level of random effects \
> which is fixed (as you mentioned). So my question is, should I specify also the \
> level 1 random effects?

No, you shouldn't. There is already the "residual" variance in MCMCglmm (the R part \
of the variance) which should account for that, but that you need to fix to 1 \
(because of the non identifiability issue I'm mentioning above). If you had such kind \
of random effects, it means you re-model such a non identifiable variance.

> Sample formula using your suggestion of probit (I just like to know if the priors \
> and model are appropriate): mcglmm_mod = MCMCglmm(vas ~ RATER, 
> random =  ~ SUBJ +
> SUBJ:SESSION +
> SUBJ:SESSION:TRIAL,
> data = as.data.frame (df %>% filter (SIDE == "R")),
> family = "ordinal",
> prior=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=0),
> G2=list(V=1, nu=0),
> G3=list(V=1, nu=0))))
> Many thanks again for your (and everyone's) help thus far.

As I'm saying above, I suggest you remove the SUBJ:SESSION:TRIAL effect, if it \
corresponds, as you state to a "level-1 variance". 

Another thing is that I don't think it is wise to use nu = 0 for such a model prior. \
I'd recommend using the parameter expansion of MCMCglmm's prior and especially a \
chi-square like prior that has been shown to perform quite well for such models: G1 = \
list(V = 1, nu = 1000, alpha.mu = 0, alpha.V = 1)

And a last thing is that the "threshold" family is supposed to have better mixing and \
be faster than the "ordinal" family according to Jarrod Hadfield. You might want to \
give it a try (and beware that the + 1 should be removed from the ICC denominator \
when using "threshold", as per my previous email).

Best,
Pierre.

> 
> Kind regards,
> Bernard
> -----Original Message-----
> From: pierre.de.villemereuil@mailoo.org <pierre.de.villemereuil@mailoo.org> 
> Sent: Saturday, June 16, 2018 7:28 PM
> To: Bernard Liew <B.Liew@bham.ac.uk>
> Cc: r-sig-mixed-models@r-project.org
> Subject: Re: [R-sig-ME] [FORGED] Re: Using variance components of lmer for ICC \
> computation in reliability study 
> Dear Bernard,
> 
> > Thanks Pierre for the suggestion to use MCMCglmm. Very useful.
> 
> Glad this was helpful.
> 
> > 1) Can I ask why when taking the ratio of the variance components, the \
> > denominator is ICC = Vcomp / (sum(variance components of the model) + 1)? Why is \
> > the one added?
> 
> You can find an explanation for this in the Supplementary File on our article on \
> quantitative genetics interpretation of GLMMs: \
> http://www.genetics.org/content/204/3/1281.supplemental 
> It is fairly technical. Trying to put it simply: there is an equivalence between a \
> binomial/probit model and the threshold model. This is because a probit link is \
> actually the CDF of a normal distribution (same goes for a logit and a logistic \
> distribution): using a probit link and a binomial is equivalent to using a \
> threshold model after adding a random noise normally distributed (this is Fig. S1 \
> in our Suppl. File above).  
> And the variance of this random noise is thus the variance of a standard normal \
> distribution which is 1 (or the variance of a standard logistic distribution, which \
> is (pi^2)/3, for a logit link). 
> Computing the ICC on the liability scale "as if" a threshold model was used thus \
> only requires to add that extra-variance on the denominator. 
> > 2) I have tried mixed ordinal modelling using "ordinal" or MCMCglmm, 
> > and noticed the variance of the residuals of the model is not 
> > produced. I have also read that the residual is assumed to be as you 
> > mentioned (pi^2)/3. Is there a reason (maybe a not so technical one?)
> 
> It all depends what you call "residual". In MCMCglmm, there is for example a \
> "residual" variance called "units" which value should be fixed to e.g. 1 when \
> running families such as ordinal. This variance is required to be added in the \
> denominator of the ICC. If you have only a random effect, you would have: Vrand / \
> (Vrand + Vunits + 1) 
> Note that there is also the "threshold" family in MCMCglmm, which is special in the \
> sense that (to say it quickly) the "units" variance is the variance of the probit \
> link and should be fixed to 1. In that case, you would have: Vrand / (Vrand + \
> Vunits) 
> Sometimes this "extra-variance" (1 or (pi^2)/3) is indeed called "residual \
> variance". This is because, as per my explanation above, probit and logit links can \
> be seen as "adding a random noise then taking a threshold", this random noise is \
> sometimes seen as a "residual error" in the model. 
> I hope this will clarify things for you. I'm afraid it's difficult to explain \
> clearly the whys without going too much into the technicality of the models. 
> Cheers,
> Pierre.
> 
> > 
> > Kind regards,
> > Bernard
> > 
> > -----Original Message-----
> > From: pierre.de.villemereuil@mailoo.org 
> > <pierre.de.villemereuil@mailoo.org>
> > Sent: Friday, June 15, 2018 4:05 PM
> > To: r-sig-mixed-models@r-project.org
> > Cc: Ben Bolker <bbolker@gmail.com>; Doran, Harold <HDoran@air.org>; 
> > Bernard Liew <B.Liew@bham.ac.uk>
> > Subject: Re: [R-sig-ME] [FORGED] Re: Using variance components of lmer 
> > for ICC computation in reliability study
> > 
> > Hi,
> > 
> > > However, I'm not sure how one would go about computing an ICC from 
> > > ordinal data
> > 
> > I've never used the package "ordinal", but if it's anything like the "ordinal" \
> > family of MCMCglmm, then computing an ICC on the liability scale would be fairly \
> > easy, as one would just proceed as always and simply add the so-called "link \
> > variance" corresponding to the chosen link function (1 for probit, (pi^2)/3 for \
> > logit). E.g. for a given variance component Vcomp and a probit link: ICC = Vcomp \
> > / (sum(variance components of the model) + 1) 
> > However, computing an ICC on the data scale would be much more difficult as it is \
> > actually multivariate... 
> > I think in the case when such scores were used, having the estimate on the \
> > liability scale make sense though, as the binning is more due to our inability of \
> > finely measuring this scale rather than an actual property of the system. 
> > Cheers,
> > Pierre.
> > 
> > Le vendredi 15 juin 2018, 03:27:54 CEST Ben Bolker a ĂŠcrit :
> > > More generally, the best way to fit this kind of model is to use an
> > > *ordinal* model, which assumes the responses are in increasing 
> > > sequence but does not assume the distance between levels (e.g. 1 vs 
> > > 2,
> > > 2 vs 3 ...) is uniform.  However, I'm not sure how one would go 
> > > about computing an ICC from ordinal data ... (the 'ordinal' package 
> > > is the place to look for the model-fitting procedures). Googling it 
> > > finds some stuff, but it seems that it doesn't necessarily apply to 
> > > complex designs ...
> > > 
> > > https://stats.stackexchange.com/questions/3539/inter-rater-reliabili
> > > ty
> > > -for-ordinal-or-interval-data
> > > https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3402032/
> > > 
> > > 
> > > On Thu, Jun 14, 2018 at 6:58 PM, Doran, Harold <HDoran@air.org> wrote:
> > > > That's a helpful clarification, Rolf. However, with gaussian 
> > > > normal errors in the linear model, we can't *really* assume they 
> > > > would asymptote at 1 or 10. My suspicion is that these are 
> > > > likert-style ordered counts of some form, although the OP should 
> > > > clarify. In which case, the 1 or 10 are limits with censoring, as 
> > > > true values for some measured trait could exist outside those 
> > > > boundaries (and I suspect the model is forming predicted values outside of 1 \
> > > > or 10). 
> > > > 
> > > > 
> > > > On 6/14/18, 6:33 PM, "Rolf Turner" <r.turner@auckland.ac.nz> wrote:
> > > > 
> > > > > 
> > > > > On 15/06/18 05:35, Doran, Harold wrote:
> > > > > 
> > > > > > Well no, you šre specification is not right because your variable 
> > > > > > is not continuous as you note. Continuous means it is a real 
> > > > > > number between -Inf/Inf and you have boundaries between 1 and 10.
> > > > > > So, you should not be using a linear model assuming the outcome is \
> > > > > > continuous.
> > > > > 
> > > > > I think that the foregoing is a bit misleading.  For a variable to 
> > > > > be continuous it is not necessary for it to have a range from 
> > > > > -infinity to infinity.
> > > > > 
> > > > > The OP says that dv  "is a continuous variable (scale 1-10)".  It 
> > > > > is not clear to me what this means.  The "obvious"/usual meaning 
> > > > > or interpretation would be that dv can take (only) the (positive
> > > > > integer) values 1, 2, ..., 10.  If this is so, then a continuous 
> > > > > model is not appropriate.  (It should be noted however that people 
> > > > > in the social sciences do this sort of thing --- i.e. treat 
> > > > > discrete variables as continuous --- all the time.)
> > > > > 
> > > > > It is *possible* that dv can take values in the real interval 
> > > > > [1,10], in which case it *is* continuous, and a "continuous model"
> > > > > is indeed appropriate.
> > > > > 
> > > > > The OP should clarify what the situation actually is.
> > > > > 
> > > > > cheers,
> > > > > 
> > > > > Rolf Turner
> > > > > 
> > > > > --
> > > > > Technical Editor ANZJS
> > > > > Department of Statistics
> > > > > University of Auckland
> > > > > Phone: +64-9-373-7599 ext. 88276
> > > > > 
> > > > > > On 6/14/18, 11:16 AM, "Bernard Liew" <B.Liew@bham.ac.uk> wrote:
> > > > > > 
> > > > > > > Dear Community,
> > > > > > > 
> > > > > > > 
> > > > > > > I am doing a reliability study, using the methods of 
> > > > > > > https://www.ncbi.nlm.nih.gov/pubmed/28505546. I have a question 
> > > > > > > on the lmer formulation and the use of the variance components.
> > > > > > > 
> > > > > > > Background: I have 20 subjects, 2 fixed raters, 2 testing 
> > > > > > > sessions, and
> > > > > > > 10 trials per sessions. my dependent variable is a continuous 
> > > > > > > variable (scale 1-10). Sessions are nested within each 
> > > > > > > subject-assessor combination. I desire a ICC (3) formulation of 
> > > > > > > inter-rater and inter-session reliability from the variance components.
> > > > > > > 
> > > > > > > My lmer model is:
> > > > > > > 
> > > > > > > lmer (dv ~ rater + (1|subj) + (1|subj:session), data = df)
> > > > > > > 
> > > > > > > Question:
> > > > > > > 
> > > > > > > 1.  is the model formulation right? and is my interpretation 
> > > > > > > of the  variance components for ICC below right?
> > > > > > > 2.  inter-rater ICC = var (subj) / (var(subj) + var 
> > > > > > > (residual)) # I  read that the variation of raters will be lumped with \
> > > > > > > the residual 3.  inter-session ICC =( var (subj) + var (residual)) /( \
> > > > > > > var (subj) +  var (subj:session) + var (residual))  some simulated
> > > > > > > data:
> > > > > > > df = expand.grid(subj = c(1:20), rater = c(1:2), session = 
> > > > > > > c(1:2), trial  = c(1:10))  df$vas = rnorm (nrow (df_sim), mean = 
> > > > > > > 3, sd = 1.5)
> > > > > > > 
> > > > > > > I appreciate the kind response.
> > > > > 
> > > > > 
> > > > 
> > > > _______________________________________________
> > > > R-sig-mixed-models@r-project.org mailing list 
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > > 
> > > _______________________________________________
> > > R-sig-mixed-models@r-project.org mailing list 
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > 
> > 
> > 
> 
> 
> 
> 

_______________________________________________
R-sig-mixed-models@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

[prev in list] [next in list] [prev in thread] [next in thread]