Mixed Effects Model

In a standard linear regression, the regression model normally can be written as \(\hat{y}=b_0+b_1x\), where \(b_0\) and \(b_1\) are the intercept and slope of a line in the \(x-y\) space. In this case, we assume that there is no particular structure among our data. However, it is not rare that our data may be separated to multiple groups. The intercept or/and the slope may vary through these groups. The smallest group can be a single person. For example, we collect data from \(n\) subjects for \(k\) days. The estimated intercept and slope in a standard regression can be viewed as the mean intercept and mean slope across \(n\) subjects. In order to better cover the variability of the intercept and slope among subjects, we need to take into consideration the individual differences on these two parameters.

For instance, the intercept can be sensitive to subjects. Thus, the regression model turns out to be \(\hat{y}=b_{0,i}+b_1x\), where \(b_{0,i}\) is the estimated intercept for subject \(i\). The subject effect on intercept is not fixed, but is assumed as a distribution \(e_{subject}\sim N(0,\tau)\), where \(b_{0,i}=b_0+e_{subject}\). Thus, the subject effect on intercept is called the random effect. The intercept \(b_0\) is called the fixed effect. Similarly, the slope effect can be the random effect also, when the specificity of each subject is considered.

Example 1

Let’s see the data of Risk and Sammarco (1991), in which the density of coral skeleton was measured 3 times at each of 9 specific locations. These authors reported that the density of coral skeleton can be predicted by the distance to shore of the sampling location in a polynomial regression model. However, it was evident in our previous tutorial that a simple linear regression model does not perform worse than the polynomial regression model. Thus, we though that the relationship between Density and Distance can be described by a linear regression model.

However, in this tutorial, we are going to show that a mixed effects model is actually suitable for this data. A mixed effects model incorporate the fixed effect, which is estimated for the entire population and the random effect, which is estimated for an experimental unit randomly sampled by experimenters. An experimental unit can be a single subject or a group of subjects sharing a same feature, such as gender. Normally, the grouping variable is a covariate, which needs to be controlled in experimental design, in order to get clearer the relationship between the dependent variable and the independent variable (or predictor).

Firstly, it is always better to check the descriptive statistics of a data via some visualization tools. The scatter plot shows a positive relationship between Distance and Density. The boxplot suggests that the specificity of location should not be precluded. The histogram of Density shows that Density is a normal distribution.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
url<-"http://cml.nccu.edu.tw/lxyang/ExpDesign/rwg179.txt"
coral.dta<-read.table(url,header=T)
head(coral.dta)
##   Sample       Reef Distance Density
## 1      1 MiddleReef      3.5   1.337
## 2      2 MiddleReef      3.5   1.216
## 3      3 MiddleReef      3.5   1.309
## 4      4    AlmaBay     14.3   1.053
## 5      5    AlmaBay     14.3   1.082
## 6      6    AlmaBay     14.3   1.079
coral.fg1<-coral.dta %>% ggplot(aes(Distance,Density))+
                          theme_bw()+
                          geom_point(shape=1)+
                          geom_smooth(method="lm",se=F)
coral.fg2<-coral.dta %>% ggplot(aes(Reef,Density))+
                          theme_bw()+
                          geom_boxplot()+
                          theme(axis.text.x=element_text(angle=90))
coral.fg3<-coral.dta %>% ggplot(aes(Density))+
                          theme_bw()+
                          geom_histogram(fill="gray60",color="black",bins=10)
grid.arrange(grobs=list(coral.fg1,coral.fg2,coral.fg3),
             widths=c(1,1),layout_matrix=rbind(c(1,1),c(2,3)))
## `geom_smooth()` using formula = 'y ~ x'

We firstly fit a regression model with intercept as the only parameter, namely \(y_i=b_0+\epsilon_i\), where \(\epsilon\sim N(0,\sigma)\). The funciton lm(Density~1,coral.dta) means that the intercept is the only paramter to be estimated. Here, 1 in R means intercept. This model actually computes the mean density across all data in the Density column. The estimated intercept is actually the mean of all Density data.

coral.m0lm<-lm(Density~1,coral.dta)
summary(coral.m0lm)
## 
## Call:
## lm(formula = Density ~ 1, data = coral.dta)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28367 -0.06467  0.03833  0.09783  0.25233 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.33667    0.02549   52.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1324 on 26 degrees of freedom
with(coral.dta,mean(Density))
## [1] 1.336667

Since the specifity of each location cannot be ignored, we directly estimate the mean density at each location. In the equation of the second regression model, -1 means that the intercept should not be estimated. The estimated means for these 9 locations can be seen in the summary table. The standard deviation of residual (\(\sigma\)) in the first model is 0.1324 and it drops to 0.04528 in the second model. It is suggested that it is better to estimate the coral skeleton density at each location separately. However, the degrees of freedom is 18 (i.e., \(27-9=18\)) in the second model, which is far less than the degrees of freedom in the first model (i.e., 26). Thus, estimating each group mean is not a good solution.

coral.m0lm1<-lm(Density~Reef-1,coral.dta)
summary(coral.m0lm1)
## 
## Call:
## lm(formula = Density ~ Reef - 1, data = coral.dta)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.071333 -0.031000  0.007667  0.019500  0.093333 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## ReefAlmaBay           1.07133    0.02614   40.98   <2e-16 ***
## ReefBowdenReef        1.41200    0.02614   54.01   <2e-16 ***
## ReefGreatPalmIs.      1.35533    0.02614   51.84   <2e-16 ***
## ReefGrubReef          1.49567    0.02614   57.21   <2e-16 ***
## ReefLittleBroadhurst  1.41900    0.02614   54.28   <2e-16 ***
## ReefMiddleReef        1.28733    0.02614   49.24   <2e-16 ***
## ReefMorindaShoals     1.46600    0.02614   56.08   <2e-16 ***
## ReefOrpheusIs.        1.24167    0.02614   47.49   <2e-16 ***
## ReefPandoraReef       1.28167    0.02614   49.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04528 on 18 degrees of freedom
## Multiple R-squared:  0.9992, Adjusted R-squared:  0.9989 
## F-statistic:  2637 on 9 and 18 DF,  p-value: < 2.2e-16

Since each location has its own mean density, we can regard the mean at each location as the grand mean of all density data plus an effect brought by that location. That is, \(y_{ij}=b_{0j}+\epsilon_{i}=b_0+(b_{0j}-b_0)+\epsilon_{ij}\). Here, \(b_{0j}-b_0\) is called the random effect of Reef on the intercepts (i.e., mean densities at locations), which is assumed to be a normal distribution centered at 0, \(N(0,\sigma_b)\). We can use the function lmer( ) in {lme4} package. In the below codes, Density~1 means \(y=b_0+\epsilon\) and (1|Reef) means the random effect of Reef on the intercept, namely \(b_{0j}-b_0\). Thus, the formula below mean s \(y=b_0+b_{0j}-b_0+\epsilon=b_{0j}+\epsilon\). In this model, there are in total three parameters to estimate, \(b_0\), \(\sigma_b\), and \(\sigma\), far less than 9 parameters in the model, which separately estimates the mean density at each location. However, the estimated residual SD \(\sigma\) is still the same 0.045. Thus, taking the random effects into consideration can help prevent the inflation of residual variance.

One thing that you may find strange is there is no \(p\) value in this summary table. For some reasons, the authors of {lme4} do not prefer reporting \(p\) values. At least, this is understandable for random effects. However, you can use the function tab_model( ) in {sjPlot} package to estimate the \(p\) values. In R, if you want to call a function without loading the relevant package in advance, you can use put command in this format Package::Function( ). In this table, you will find the estimated intercept which is the same as the one estimated in the model without random effects. The intercept here is \(b_0\). Also, the 95% confidence interval and \(p\) value are presented. The \(\sigma^2\) is the residual variance and \(\pi_{00Reef}\) is the variance of random effect, namely \(\sigma^2_b\). ICC is computed as \(\frac{\sigma^2_b}{\sigma^2_b+\sigma^2}\), showing how much of total variance is explained by the random effect. In this example, ICC = 0.878. That is, most of total variance comes from the between-group variance. ICC is also called the intraclass correlation: the larger it is, the necessity to include random effects is stronger. The last two indices are Marginal \(R^2\) and Conditional \(R^2\). The first is the proportion of total variance explained by the fixed effect, which is nearly 0 in this example, suggesting that the total variance in data is better explained by the random effect of Reef than the fixed effect of the mean density of coral skeleton. The Conditional \(R^2\) is the \(R^2\) of this model.

library(lme4)
## Loading required package: Matrix
coral.m0lmer<-lmer(Density~1+(1|Reef),REML=F,data=coral.dta)
summary(coral.m0lmer)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Density ~ 1 + (1 | Reef)
##    Data: coral.dta
## 
##      AIC      BIC   logLik deviance df.resid 
##    -56.4    -52.5     31.2    -62.4       24 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.62329 -0.61879 -0.02246  0.44149  2.21578 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Reef     (Intercept) 0.01484  0.12181 
##  Residual             0.00205  0.04528 
## Number of obs: 27, groups:  Reef, 9
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.33667    0.04153   32.19
sjPlot::tab_model(coral.m0lmer)
  Density
Predictors Estimates CI p
(Intercept) 1.34 1.25 – 1.42 <0.001
Random Effects
σ2 0.00
τ00 Reef 0.01
ICC 0.88
N Reef 9
Observations 27
Marginal R2 / Conditional R2 0.000 / 0.879

The inspection of the scatter plot suggests a positive correlation between Distance and Density. Now we are testing the regression model with Distance as the predictor. First, we use lm( ) to fit the standard regression model to the data. The results show that the intercept and slope are both significantly different from 0. The residual analysis reveals that the residuals are all negative for some locations but all positive for some others. This suggests a random effects structure.

coral.m1lm<-lm(Density~Distance,data=coral.dta)
summary(coral.m1lm)
## 
## Call:
## lm(formula = Density ~ Distance, data = coral.dta)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.212753 -0.047247 -0.009136  0.062975  0.169705 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.2119729  0.0324376  37.363  < 2e-16 ***
## Distance    0.0037609  0.0007954   4.728 7.54e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09813 on 25 degrees of freedom
## Multiple R-squared:  0.4721, Adjusted R-squared:  0.4509 
## F-statistic: 22.35 on 1 and 25 DF,  p-value: 7.54e-05
coral.dta %>% mutate(residual=coral.m1lm$residuals) %>%
              ggplot(aes(Reef,residual))+theme_bw()+geom_boxplot()+
              theme(axis.text.x=element_text(angle=90))

Thus, the random effect from location should not be ignored. Although in the results of regression analysis, the \(p\) values for the intercept and slope are both smaller than 0.05, the estimated slope is 0.004 suggesting a weak effect of Distance. We can run a linear mixed model with Distance and Reef as predictors, including the random effect of Reef on the intercept for consideration as well. It is implied that after controlling the random effects of Reef, Distance cannot predict Density anymore. Therefore, a reasonable conclusion for the coral skeleton data should be that Distance dose not matter to the density of coral skeleton, but location does. The ICC is 0.77, showing a high intraclass correlation. Also, we check the residuals of this model and find that the variability of residual gets smaller for each location than before.

coral.m1lmer<-lmer(Density~1+Distance+(1|Reef),REML=F,data=coral.dta)
summary(coral.m1lmer)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Density ~ 1 + Distance + (1 | Reef)
##    Data: coral.dta
## 
##      AIC      BIC   logLik deviance df.resid 
##    -60.9    -55.7     34.4    -68.9       23 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4510 -0.7118 -0.1532  0.4793  2.0682 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Reef     (Intercept) 0.006866 0.08286 
##  Residual             0.002050 0.04528 
## Number of obs: 27, groups:  Reef, 9
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 1.211973   0.049746  24.363
## Distance    0.003761   0.001220   3.083
## 
## Correlation of Fixed Effects:
##          (Intr)
## Distance -0.813
sjPlot::tab_model(coral.m1lmer)
  Density
Predictors Estimates CI p
(Intercept) 1.21 1.11 – 1.31 <0.001
Distance 0.00 0.00 – 0.01 0.005
Random Effects
σ2 0.00
τ00 Reef 0.01
ICC 0.77
N Reef 9
Observations 27
Marginal R2 / Conditional R2 0.481 / 0.881
plot(coral.m1lmer,Reef~resid(.),abline=0)

Example 2

Let’s check another example. The data set here contains the country name, the birth rate, mortality rate, the modern index, and so on for 30 countries. We are particularly interested in the relationship between the birth rate and the mortality rate. Thus, we make a scatter plot first. It looks like that the mortality rate can predict the birth rate; the larger the mortality rate, the larger birth rate. Does it imply that governments would encourage couples to give birth to children in order to fight against the mortality rate? Before we jump into this conclusion, we need to re-check this data set.

url2<-"http://cml.nccu.edu.tw/lxyang/ExpDesign/birth1.txt"
br.dta<-read.table(url2,header=T)
br.dta %>% ggplot(aes(mr,br))+theme_bw()+geom_point(shape=1)+
           geom_smooth(method="lm",se=F,color="red",linetype=2)
## `geom_smooth()` using formula = 'y ~ x'

br.m0lm<-lm(br~mr,br.dta)
summary(br.m0lm)
## 
## Call:
## lm(formula = br ~ mr, data = br.dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.583  -4.995  -1.893   5.131  18.190 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.43237    2.75145   4.882 3.83e-05 ***
## mr           0.21574    0.04587   4.703 6.25e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.285 on 28 degrees of freedom
## Multiple R-squared:  0.4413, Adjusted R-squared:  0.4214 
## F-statistic: 22.12 on 1 and 28 DF,  p-value: 6.245e-05

Perhaps, some random effect structure is concealed under the scatter pattern. One reasonable guess is whether the modernization of a country would influence the relationship between birth rate and mortality rate. The linear regression line is quite flat for no matter whether the countries are modernized or not. This suggests that the strong correlation revealed by all data may be a pseudo trend between birth rate and mortality rate.

br.dta %>% ggplot(aes(mr,br))+theme_bw()+geom_point(shape=1)+
           geom_smooth(method="lm",se=F,color="deepskyblue2")+
           facet_wrap(~modern)
## `geom_smooth()` using formula = 'y ~ x'

Now we fit a mixed effects model to the data, with modern as the grouping variable. It looks like that the two groups differ on the intercept. Thus, we firstly add the random effect of modern to the intercept. This model performs better than the one without a random effect with a smaller AIC.

br.dta$modern<-factor(br.dta$modern)
br.m1lmer<-lmer(br~mr+(1|modern),REML=F,data=br.dta)
AIC(logLik(br.m0lm))
## [1] 208.2187
AIC(logLik(br.m1lmer))
## [1] 203.8681

We can also check whether or not adding a random effect to slope can increase \(R^2\) effectively. The results show that there is no significant. Note that the np (number of parameters) for the first model is 4, including the coefficients for the intercept (\(b_0\)) and slope (\(b_1\)), the residual variance (\(\epsilon\)), and the variance for random intercept (\(\sigma_{b0}\)). However, np is 6 for the second model, which includes the coefficients for the intercept (\(b_0\)) and slope (\(b_1\)), variances for residuals (\(\epsilon\)), random intercept (\(\sigma_{b0}\)) and slope (\(\sigma_{b1}\)), and the covariance between random effects of intercept and slope (\(cov_{b0b1}\)).

br.m2lmer<-lmer(br~mr+(1+mr|modern),REML=F,data=br.dta)
## boundary (singular) fit: see help('isSingular')
anova(br.m1lmer,br.m2lmer)
## Data: br.dta
## Models:
## br.m1lmer: br ~ mr + (1 | modern)
## br.m2lmer: br ~ mr + (1 + mr | modern)
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## br.m1lmer    4 203.87 209.47 -97.934   195.87                     
## br.m2lmer    6 207.84 216.24 -97.918   195.84 0.0313  2     0.9845

Therefore, we choose the model incorporating random intercepts as the mixed effects model for this data. This time, the slope is not significantly different from 0, and the marginal \(R^2\) is pretty small, suggesting that the fixed effect dose not account for the variance in the data too much. Thus, we know after controlling the random effect on the intercept of the grouping variable, the mortality cannot predict the birth rate anymore, hence supporting our hypothesis that the correlation between birth rate and mortality rate may be a fake trend when we combine two groups of countries.

summary(br.m1lmer)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: br ~ mr + (1 | modern)
##    Data: br.dta
## 
##      AIC      BIC   logLik deviance df.resid 
##    203.9    209.5    -97.9    195.9       26 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.66504 -0.48086 -0.03935  0.38288  1.90818 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  modern   (Intercept) 58.15    7.626   
##  Residual             32.27    5.681   
## Number of obs: 30, groups:  modern, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 26.59633    6.66898   3.988
## mr           0.02189    0.06017   0.364
## 
## Correlation of Fixed Effects:
##    (Intr)
## mr -0.564
sjPlot::tab_model(br.m1lmer)
  br
Predictors Estimates CI p
(Intercept) 26.60 12.89 – 40.30 <0.001
mr 0.02 -0.10 – 0.15 0.719
Random Effects
σ2 32.27
τ00 modern 58.15
ICC 0.64
N modern 2
Observations 30
Marginal R2 / Conditional R2 0.005 / 0.645