rm(list = ls())   # clean the workspace
library(lme4)     # mixed-effects models
library(effects)  # model effect displays
library(ggplot2)  # plotting
library(MASS)     # distributions and simulation utilities
# helper: format x with a fixed number of decimals
compact = function(x, digits = 2) { return(format(round(x, digits), nsmall = digits)) }
ts = 25  # base text size for the plots
Let's consider a simple illustrative case with N = 16 subjects divided into two conditions, with each subject providing k = 10 observed responses, all different. With a mixed model we could model the responses like this:
fit = lmer(score ~ Condition + (1|id), data = df)
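For completeness, here is a minimal sketch of how data with this structure could be simulated and fitted. The generating values below are illustrative assumptions, not the ones behind the output that follows.

set.seed(1)
N = 16  # subjects
k = 10  # responses per subject
df = data.frame(
  id = factor(rep(1:N, each = k)),
  Condition = factor(rep(c("cond.1", "cond.2"), each = N/2 * k))
)
subj_eff = rnorm(N, mean = 0, sd = 0.65)  # assumed between-subject SD
df$score = -0.6 + 0.95 * (df$Condition == "cond.2") +  # assumed fixed effects
  subj_eff[as.integer(df$id)] +
  rnorm(N * k, mean = 0, sd = 1.45)  # assumed residual SD
fit = lmer(score ~ Condition + (1 | id), data = df)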
If we want to report a standardized effect size (a standardized mean difference, SMD, such as Cohen's d), how do we compute it?
If we consider all observed responses (small circles, which include both within- and between-subject variance) we get: Cohen’s d = 0.61
However, if we consider only the (model-estimated, unobserved) true subject scores (diamonds, which include only between-subject variance) we get: Cohen’s d = 1.80
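As an illustration, both values of Cohen's d could be computed directly. This is a sketch assuming the df and fit objects above; the subject-level version uses the model's conditional (shrunken) predictions of the subject scores, so it need not coincide exactly with the parameter-based estimate derived below.

# d on all observed responses (within- + between-subject variance)
m = tapply(df$score, df$Condition, mean)
v = tapply(df$score, df$Condition, var)
diff(m) / sqrt(mean(v))  # cf. the 0.61 reported above

# d on the model-estimated true subject scores (the "diamonds")
subj = unique(df[, c("id", "Condition")])
subj$score_hat = predict(fit, newdata = subj)  # includes the random intercepts
m_s = tapply(subj$score_hat, subj$Condition, mean)
v_s = tapply(subj$score_hat, subj$Condition, var)
diff(m_s) / sqrt(mean(v_s))  # cf. the 1.80 reported above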
How do we estimate the second Cohen’s d using ONLY the model parameters in the summary? Let’s see the summary:
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ Condition + (1 | id)
## Data: df
##
## REML criterion at convergence: 592.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.15519 -0.65057 0.02104 0.78849 2.24219
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.4295 0.6554
## Residual 2.1316 1.4600
## Number of obs: 160, groups: id, 16
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.6188 0.2834 -2.183
## Conditioncond.2 0.9640 0.4008 2.405
##
## Correlation of Fixed Effects:
## (Intr)
## Condtncnd.2 -0.707
We may divide the "raw" coefficient either by the total SD or by the estimated between-subject (id) SD only:
→ For the total SD: 0.96 / sqrt(0.66^2 + 1.46^2) ≈ 0.60
→ For the estimated between-subject SD only: 0.96 / 0.66 ≈ 1.47
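In code, using only the pieces stored in the fitted model (a sketch assuming the fit object above):

vc = as.data.frame(VarCorr(fit))  # variance components
sd_id = vc$sdcor[vc$grp == "id"]  # between-subject SD, 0.66
sd_resid = vc$sdcor[vc$grp == "Residual"]  # residual SD, 1.46
b = fixef(fit)["Conditioncond.2"]  # raw coefficient, 0.96
b / sqrt(sd_id^2 + sd_resid^2)  # ~0.60, standardized by the total SD
b / sd_id  # ~1.47, standardized by the between-subject SD only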
Let's now model a discrete count variable such as reading errors, with age as the predictor. Reading errors decrease monotonically throughout primary school. However, the decrease is not linear: it smoothly converges towards zero, and the mean and SD are related.
We have already discussed this problem here: https://www.memoryandlearninglab.it/wp-content/uploads/2023/10/glm_e_overdispersion3.html
The mean linear decrease is about -0.50 errors per year, so every two years we should observe a decrease of about -1.00 reading errors.
However… from 6 to 7 years we get an expected decrease of -1.37 reading errors, whereas from 7 to 8 we get an expected decrease of -0.75 reading errors.
So, what remains constant? What can be reported as a meaningful effect size? It is the percentage reduction of reading errors per time unit.
For every +1 year, the expected number of remaining reading errors is 55% of the expected number of the previous year. After +2 years, the expected number of remaining reading errors is 30% of the earlier value. These proportional decreases are constant over time.
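As a check, the expected decreases quoted above can be recovered from population-level predictions of the Poisson model reported below (a sketch assuming the fitted object fit; re.form = NA drops the random effects):

nd = data.frame(age = 6:8)
pred = predict(fit, newdata = nd, re.form = NA, type = "response")
pred  # expected reading errors at ages 6, 7, 8
diff(pred)  # ~ -1.37 and ~ -0.75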
How do we get these estimates? Let’s have a look at the Poisson model summary:
summary(fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: errors ~ age + (1 | id)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 1160.3 1173.0 -577.2 1154.3 497
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4797 -0.6413 -0.4043 0.5184 3.7210
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.05813 0.2411
## Number of obs: 500, groups: id, 500
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.72878 0.31624 14.95 <2e-16 ***
## age -0.60363 0.04138 -14.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## age -0.984
The estimate for age is -0.60. The multiplicative effect on the response scale per +1 year is exp(-0.60), that is 0.55, so the percentage is 55% (i.e., the remaining percentage of reading errors after each year). Per +2 years we calculate exp(-0.60 * 2), that is 0.30, thus a remaining percentage of 30%.
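In code (a sketch assuming the fitted model above):

b_age = fixef(fit)["age"]  # -0.60, on the log scale
exp(b_age)  # ~0.55: remaining proportion of errors per +1 year
exp(b_age * 2)  # ~0.30: remaining proportion per +2 years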
Let's now model a discrete sum score measuring solved math problems out of 15 in a task. Accuracy ranges from 0% (sum score = 0) to 100% (sum score = 15). Age is the predictor. Accuracy increases monotonically throughout primary school. However, the increase is not linear: it is constrained between the two extreme bounds and is "fastest" in the middle, and once again the mean and SD are related.
The linear increase per year is about +3.71 correctly solved math problems. This is not a bad summary, but it is clearly inaccurate close to the bounds.
A better estimate is the Odds Ratio, which here is 11.50. This is an appropriate effect size index for binomial regressions. But how is it interpreted?
Every +1 year of age, the odds of correctly solving a problem are 11.50 times the odds of the year before.
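To see this bounded, nonlinear increase on the response scale, we can again use population-level predictions from the fitted binomial model shown below; the age range is an assumption for illustration:

nd = data.frame(age = 6:10)
p = predict(fit, newdata = nd, re.form = NA, type = "response")  # probabilities
cbind(age = nd$age,
      expected_score = 15 * p,  # expected solved problems out of 15
      yearly_increase = c(NA, diff(15 * p)))  # largest in the middle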
How do we get this estimate? Let’s have a look at the Binomial model summary:
summary(fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(sumscore, 15 - sumscore) ~ age + (1 | id)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 1712.5 1725.1 -853.2 1706.5 497
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.08730 -0.34507 0.07611 0.30624 1.01348
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 2.787 1.669
## Number of obs: 500, groups: id, 500
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -20.10317 0.81904 -24.55 <2e-16 ***
## age 2.44249 0.09836 24.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## age -0.993
The estimate for age is 2.44. The Odds Ratio is exp(2.44), that is 11.50.
Note that the Odds Ratio depends on the metric of "age", which here is expressed in years: with age in months, we would get a different estimate. Specifically, it would be 11.50^(1/12) = 1.23.
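In code (a sketch assuming the fitted model above):

b_age = fixef(fit)["age"]  # 2.44, on the logit scale
exp(b_age)  # ~11.50: Odds Ratio per +1 year
exp(b_age / 12)  # ~1.23: Odds Ratio per +1 month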