The between-subjects design introduced in the previous tutorial is also called the completely randomized design (see Kirk, 2013), as subjects are randomly assigned to one of the experimental conditions. In other words, the individual differences among subjects are regarded as pure random error. In contrast to the completely randomized design, the randomized block design (see Kirk, 2013) treats the common characteristics of subjects as a source of variability in the response scores, even though these characteristics are not manipulated by the experimenter as the independent variable is. Subjects can thus be grouped into blocks based on their common characteristics. The extreme case of the randomized block design is the within-subjects design, in which each subject is his or her own block. Let us discuss the within-subjects design first.
In the within-subjects design, the subject variable is also considered a source of the variability of the response scores. Therefore, the effect of the subject variable on the dependent variable should be included in the linear model used to predict each observation. There are two approaches to constructing this linear model.
The first is the additive model, in which the block (or subject) effect is assumed to be independent of the treatment effect. That is, each subject’s score can be expressed as the sum of the grand mean \(\bar{x}_{..}\), the treatment effect \(\tau_j\), the block effect \(\pi_i\), and the sampling error \(\epsilon_{ij}\). Thus, the linear model is \(x_{ij}=\bar{x}_{..}+\tau_j+\pi_i+\epsilon_{ij}\), or equivalently \(x_{ij}-\bar{x}_{..}=\tau_j+\pi_i+\epsilon_{ij}\). Based on this equation, the total sum of squares of the observed scores \(SST\) can be partitioned into three orthogonal parts, \(SS_{treat}\), \(SS_{block}\), and \(SS_{error}\). That is, \(SST=SS_{treat}+SS_{block}+SS_{error}\). The same partitioning holds for the degrees of freedom, namely \(df_{Total}=df_{treat}+df_{block}+df_{error}\), or \(kn-1=(k-1)+(n-1)+(kn-k-n+1)=(k-1)+(n-1)+(k-1)(n-1)\), where \(k\) is the number of treatment levels and \(n\) is the number of blocks.
Following the logic of ANOVA, the mean square of the treatment effect \(MS_{treat}\), as well as the mean square of the block effect \(MS_{block}\), should always be larger than or equal to the mean square of the error \(MS_{error}\), which is the unbiased estimate of the population variance. Thus,
\[\begin{align*} E(MS_{treat})&=\sigma^2_{\epsilon}+n\theta^2_{\tau}\\ E(MS_{block})&=\sigma^2_{\epsilon}+k\theta^2_{\pi}\\ E(MS_{\epsilon})&=\sigma^2_{\epsilon}. \end{align*}\]Therefore, the treatment effect is tested with \(F=\frac{MS_{treat}}{MS_{error}}\), and likewise the block effect with \(F=\frac{MS_{block}}{MS_{error}}\).
The code below creates a set of fictitious data for an experiment testing 8 participants in all 4 conditions; that is, every participant was tested in every condition. We have reason to believe that the individual differences among participants might influence their performance, so the participants are treated as a blocking variable.
dta <- data.frame(s = gl(8, 1, 32, labels = 1:8),
                  a = factor(rep(1:4, each = 8)),
                  y = c(3, 6, 3, 3, 1, 2, 2, 2,
                        4, 5, 4, 3, 2, 3, 4, 3,
                        7, 8, 7, 6, 5, 6, 5, 6,
                        7, 8, 9, 8, 10, 10, 9, 11))
Let us plot the means of these conditions. It looks like the performance is getting better as the level of the independent variable increases.
means<-with(dta,tapply(y,a,mean))
nums<-with(dta,tapply(y,a,length))
ses<-with(dta,tapply(y,a,sd))/sqrt(nums)
library(ggplot2)
exp1.fg <- ggplot(data = data.frame(x = 1:4, y = means, ses = ses),
                  aes(x = x, y = y)) +
  geom_line() + geom_point(shape = 1, col = "red", size = 2) +
  geom_errorbar(aes(ymin = y - ses, ymax = y + ses), width = 0.1)
exp1.fg
Let us assume that the participant variable does not interact with the experimental treatment, so the additive model is used. In the formula declared in aov() below, there are only main effects to test: the treatment effect and the participant effect. According to the earlier derivation of the expected values of \(MS_{error}\), \(MS_{treat}\), and \(MS_{block}\), both the treatment effect and the block effect are tested with an \(F\) test using \(MS_{error}\) as the error term. The results show that the treatment effect is significant, whereas there is no significant difference between the participants. Normally, the subject variable is regarded as pure random error bringing no systematic effect to the response scores, so it is good to know that the participants have no effect on the dependent variable.
exp1_add.aov<-aov(y~a+s,dta)
summary(exp1_add.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## a 3 194.5 64.83 47.772 1.48e-09 ***
## s 7 12.5 1.79 1.316 0.291
## Residuals 21 28.5 1.36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
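As a sanity check on the table above, the sums of squares can be recomputed by hand from the partition \(SST=SS_{treat}+SS_{block}+SS_{error}\). The chunk below is only an illustrative check, not part of the analysis proper; it rebuilds the data so it runs on its own.

```r
# Recompute the SS partition of the additive model by hand
y <- c(3, 6, 3, 3, 1, 2, 2, 2,
       4, 5, 4, 3, 2, 3, 4, 3,
       7, 8, 7, 6, 5, 6, 5, 6,
       7, 8, 9, 8, 10, 10, 9, 11)
a <- rep(1:4, each = 8)    # treatment level of each score
s <- rep(1:8, times = 4)   # subject (block) of each score
gm <- mean(y)
sst      <- sum((y - gm)^2)                        # total SS: 235.5
ss_treat <- sum(8 * (tapply(y, a, mean) - gm)^2)   # 194.5, matches the table
ss_block <- sum(4 * (tapply(y, s, mean) - gm)^2)   # 12.5, matches the table
ss_error <- sst - ss_treat - ss_block              # 28.5, the residual SS
```

The three components indeed add up to \(SST\), with each condition mean weighted by its 8 observations and each subject mean by its 4 observations.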
Following the omnibus \(F\) test, a post hoc test, tests of contrasts, or a trend analysis can be conducted. Which of these analyses to conduct depends on how you want to explain your data. In an experiment, however, we normally have explicit hypotheses to verify, so to me the post hoc test is not that necessary. Suppose in this example the independent variable is continuous. From visual inspection of the figure of the mean scores of the four conditions, a linear or quadratic trend is expected, so I choose to do the trend analysis. The results reveal that only the linear and quadratic trends are significant. Normally, we report the trend of the highest significant degree; therefore, the treatment affects the dependent variable in a quadratic trend.
Since these contrasts are orthogonal to each other, the sums of squares of the corresponding effects add up to the sum of squares of the treatment effect. Of course, the error term in these \(F\) tests is still \(MS_{error}\).
dta$c1<-rep(c(-3,-1,1,3),each=8)
dta$c2<-rep(c(1,-1,-1,1),each=8)
dta$c3<-rep(c(-1,3,-3,1),each=8)
summary(aov(y~c1+c2+c3+s,dta))
## Df Sum Sq Mean Sq F value Pr(>F)
## c1 1 184.9 184.90 136.242 1.21e-10 ***
## c2 1 8.0 8.00 5.895 0.0243 *
## c3 1 1.6 1.60 1.179 0.2899
## s 7 12.5 1.79 1.316 0.2914
## Residuals 21 28.5 1.36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
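To see why the contrast sums of squares add up to \(SS_{treat}\), we can verify the orthogonality of the three contrast vectors and recompute each contrast SS with the standard formula \(SS_{contrast}=n(\sum_j c_j\bar{x}_{.j})^2/\sum_j c_j^2\). The condition means below are taken from the data; this is only an illustrative check.

```r
c1 <- c(-3, -1,  1, 3)   # linear contrast
c2 <- c( 1, -1, -1, 1)   # quadratic contrast
c3 <- c(-1,  3, -3, 1)   # cubic contrast
# Pairwise dot products are zero, so the contrasts are orthogonal
sum(c1 * c2); sum(c1 * c3); sum(c2 * c3)   # all 0
m <- c(2.75, 3.5, 6.25, 9)   # condition means; n = 8 scores per condition
ss <- sapply(list(c1, c2, c3),
             function(cc) 8 * sum(cc * m)^2 / sum(cc^2))
ss        # 184.9, 8.0, 1.6 as in the table
sum(ss)   # 194.5 = SS_treat
```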
When the treatment effect is thought to interact with the subject effect, each observed score can be described by the nonadditive model, \(x_{ij}=\bar{x}_{..}+\tau_j+\pi_i+\tau\pi_{ij}+\epsilon_{ij}\). This identity can be rewritten as
\[\begin{align*} x_{ij}-\bar{x}_{..}&=\tau_j+\pi_i+\tau\pi_{ij}+\epsilon_{ij}\\ &=(\bar{x}_{.j}-\bar{x}_{..})+(\bar{x}_{i.}-\bar{x}_{..})+(\bar{x}_{ij}+\bar{x}_{..}-\bar{x}_{.j}-\bar{x}_{i.})+(x_{ij}-\bar{x}_{ij})\\ &=(\bar{x}_{.j}-\bar{x}_{..})+(\bar{x}_{i.}-\bar{x}_{..})+(\bar{x}_{ij}+\bar{x}_{..}-\bar{x}_{.j}-\bar{x}_{i.}). \end{align*}\]Since in each condition \(j\) there is only one observation per subject, \(x_{ij}=\bar{x}_{ij}\) and the last term vanishes. The first two terms on the right-hand side represent the main effects, and the third term represents the residual. Recall that in a \(p\times q\) factorial design, the deviation between each condition mean and the grand mean is caused by the main effects of the two factors as well as their interaction effect. Thus, the interaction effect \(\alpha\beta_{jk}\) is normally computed as \(\bar{x}_{.jk}-\bar{x}_{...}-\alpha_j-\beta_k\). Now come back to the nonadditive model. Presumably, the interaction effect between the treatment and subject variables should be computed as \(\bar{x}_{ij}-\bar{x}_{..}-\tau_{j}-\pi_{i}\). As
\[\begin{align*} \tau_{j}&=\bar{x}_{.j}-\bar{x}_{..}\\ \pi_{i}&=\bar{x}_{i.}-\bar{x}_{..}, \end{align*}\]the interaction effect becomes \(\bar{x}_{ij}-\bar{x}_{..}-(\bar{x}_{.j}-\bar{x}_{..})-(\bar{x}_{i.}-\bar{x}_{..})=\bar{x}_{ij}+\bar{x}_{..}-\bar{x}_{.j}-\bar{x}_{i.}\). This is exactly the residual in the nonadditive model. Therefore, the interaction effect between the treatment and subject variables is embedded in the error. The same holds for the partitioning of the degrees of freedom. Suppose there are \(n\) subjects and \(p\) conditions. The total degrees of freedom \(pn-1\) can be decomposed as \((p-1)+(n-1)+(p-1)(n-1)\). The first two \(df\)s are for the two main effects, and the last one is for the error term, or equivalently the interaction effect.
In normal experiments, the treatment variable is usually assumed to have a fixed effect, whereas the subject variable is assumed to have a random effect. For a fixed-effect variable, the effects at all levels sum to 0, \(\sum_j\tau_j=0\). For a random-effect variable, the effects are random variables whose variance is a parameter to be estimated. Therefore, the expected mean squares of the sources in the nonadditive model are listed as follows.
\[\begin{align*} E(MS_{treat})&=\sigma_{\epsilon}^2+\theta^2_{\tau\pi}+n\theta^2_{\tau}\\ E(MS_{subject})&=\sigma_{\epsilon}^2+p\theta^2_{\pi}\\ E(MS_{treat\times subject})&=\sigma_{\epsilon}^2+\theta^2_{\tau\pi} \end{align*}\]Apparently, testing the main effect of treatment amounts to comparing \(MS_{treat}\) with \(MS_{treat\times subject}\) in an \(F\) test. However, the subject effect cannot be tested this way, because the available error term is no longer simply \(\sigma^2_{\epsilon}\) but is mixed with the interaction effect.
In R, the default error term in aov() does not include the effect of the subject variable. Thus, we need to define the error term ourselves by adding Error() to the formula in aov(). With the data of Example 1, we can conduct a one-way within-subjects ANOVA with the code below. Note that in this formula there is only one independent variable, a, and the error term is defined by Error(s/a): the subject variable goes before the slash and the treatment variable after it. As shown in the results, the main effect of a is significant. The \(SS\), \(df\), \(MS\), \(F\) value, and \(p\) value are all the same as those in the ANOVA summary table for the additive model. The only difference is that the subject variable is not tested here, because the variable before the slash in Error() is always assumed to be a random-effect variable.
summary(aov(y~a+Error(s/a),dta))
##
## Error: s
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 7 12.5 1.786
##
## Error: s:a
## Df Sum Sq Mean Sq F value Pr(>F)
## a 3 194.5 64.83 47.77 1.48e-09 ***
## Residuals 21 28.5 1.36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
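As a quick check that the two specifications agree, we can fit both models and compare the \(F\) values for a. This is only a demonstration; the chunk rebuilds the data so it runs on its own.

```r
# Fit the additive model and the Error(s/a) model; compare the F values for a
dta <- data.frame(s = gl(8, 1, 32, labels = 1:8),
                  a = factor(rep(1:4, each = 8)),
                  y = c(3, 6, 3, 3, 1, 2, 2, 2,
                        4, 5, 4, 3, 2, 3, 4, 3,
                        7, 8, 7, 6, 5, 6, 5, 6,
                        7, 8, 9, 8, 10, 10, 9, 11))
f_additive <- summary(aov(y ~ a + s, dta))[[1]][1, "F value"]
f_within   <- summary(aov(y ~ a + Error(s/a), dta))[["Error: s:a"]][[1]][1, "F value"]
c(f_additive, f_within)   # both about 47.77
```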
The trend analysis can be conducted with the nonadditive model as well. As shown in the code below, the three orthogonal contrast vectors (c1, c2, and c3) now substitute for the independent variable (a), so the part after the slash in Error() is changed correspondingly. The conclusion of the trend analysis is the same as before, but the \(F\) values have changed. This is because each contrast now has its own error term, which can be viewed as the interaction effect between the subject variable and that specific contrast. Therefore, the error terms for testing different contrasts differ. Since the \(df\) for a contrast is always 1, the \(df\) of the error term for testing a contrast is always \(n-1\) (i.e., 7 in this example).
summary(aov(y~c1+c2+c3+Error(s/(c1+c2+c3)),dta))
##
## Error: s
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 7 12.5 1.786
##
## Error: s:c1
## Df Sum Sq Mean Sq F value Pr(>F)
## c1 1 184.9 184.90 67.06 7.85e-05 ***
## Residuals 7 19.3 2.76
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: s:c2
## Df Sum Sq Mean Sq F value Pr(>F)
## c2 1 8 8.000 11.2 0.0123 *
## Residuals 7 5 0.714
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: s:c3
## Df Sum Sq Mean Sq F value Pr(>F)
## c3 1 1.6 1.6 2.667 0.146
## Residuals 7 4.2 0.6
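Testing a single contrast against its own subject-by-contrast error term is equivalent to a one-sample \(t\) test on each subject's contrast score, with \(F=t^2\). The sketch below demonstrates this for the linear contrast c1; it rebuilds the scores as a subjects-by-conditions matrix so it runs on its own.

```r
# Each subject's linear-contrast score, then a one-sample t test on them
y <- matrix(c(3, 6, 3, 3, 1, 2, 2, 2,
              4, 5, 4, 3, 2, 3, 4, 3,
              7, 8, 7, 6, 5, 6, 5, 6,
              7, 8, 9, 8, 10, 10, 9, 11),
            nrow = 8)                     # rows = subjects, columns = conditions
L  <- as.vector(y %*% c(-3, -1, 1, 3))    # linear contrast score per subject
tt <- t.test(L)                           # H0: mean contrast score is 0
unname(tt$statistic)^2                    # about 67.06, the F for c1 above
```

The \(df\) of this \(t\) test is 7, matching the residual \(df\) in the Error: s:c1 stratum.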
In Kirk's (2013) textbook on experimental design, the within-subjects design is a special case of the generalized randomized block design. In the generalized randomized block design, subjects are grouped into blocks according to their similarities on some subject features, such as age, IQ, place of residence, and so on. Thus, in addition to the experimental treatment, the block of subjects is thought to be another factor influencing the dependent variable. When there is only one subject in each block, this is of course the within-subjects design.
The conditions that the generalized randomized block design must follow are as follows.
One treatment with \(p\geq2\) treatment levels and one nuisance variable with \(w\geq2\) levels.
Formation of \(w\) groups each containing \(np\) homogeneous experimental units (e.g., participants), where \(n\geq2\).
Random assignment of treatment levels to the \(np\) units in each group so that each treatment level is assigned to exactly \(n\) units. The design requires \(w\) sets of \(np\) homogeneous experimental units and a total of \(N=npw\) units.
The linear model of the generalized randomized block design is the same nonadditive model as in the within-subjects design, that is, \(x_{ij}=\bar{x}_{..}+\tau_j+\pi_i+\tau\pi_{ij}+\epsilon_{ij}\). Unlike in the within-subjects design, the interaction effect between treatment and group can now be tested, which means that the interaction effect can be separated from the error. With the treatment and group variables respectively set as fixed and random effects, the variances of the data attributed to the different sources are listed as follows. Let us see how to analyze these effects in the following example.
\[\begin{align*} E(MS_{treat}) & =\sigma^2_{\epsilon}+n\theta^2_{\tau\pi}+nw\theta_\tau^2\\ E(MS_{group}) & =\sigma^2_{\epsilon}+np\theta^2_{\pi}\\ E(MS_{treat\times group}) & =\sigma^2_{\epsilon}+n\theta^2_{\tau\pi}\\ E(MS_{error}) & =\sigma^2_{\epsilon} \end{align*}\]The degrees of freedom for each source are listed as follows.
\[\begin{align*} df_T&=npw-1\\ df_{treat}&=p-1\\ df_{group}&=w-1\\ df_{treat\times group}&=(p-1)(w-1)\\ df_{error}&=pw(n-1) \end{align*}\]Suppose in an experiment we have 8 participants who can be divided into 4 groups (\(n=2\) in each group). Now each cell, defined by one level of the treatment and one level of the grouping variable, contains 2 participants. See the data below.
dta <- data.frame(a = gl(4, 8, 32),
                  g = gl(4, 2, 32, labels = c("g1", "g2", "g3", "g4")),
                  y = c(3, 6, 3, 1, 2, 2, 3, 2,
                        4, 5, 4, 2, 3, 4, 3, 3,
                        7, 8, 7, 5, 6, 5, 6, 6,
                        7, 8, 9, 10, 10, 9, 8, 11))
As usual, we can plot the mean scores across all treatment levels and subject groups. It looks like overall performance increases as the treatment level increases. Also, the subjects of group 1 seem to perform differently from the other groups.
library(reshape)
means<-with(dta,tapply(y,list(g,a),mean))
nums<-with(dta,tapply(y,list(g,a),length))
ses<-with(dta,tapply(y,list(g,a),sd))/sqrt(nums)
ave.dta<-data.frame(cbind(melt(means),melt(ses)[,3]))
names(ave.dta)<-c("group","x","y","ses")
exp2.fg <- ggplot(data = ave.dta, aes(x = x, y = y, fill = group)) +
  geom_line(aes(col = group)) + geom_point(aes(col = group)) +
  geom_errorbar(aes(ymin = y - ses, ymax = y + ses, col = group),
                width = 0.3, position = position_dodge(0.025))
exp2.fg
According to the expected mean squares, the interaction effect is the error term for testing the treatment effect, whereas \(MS_{error}\) is the error term for testing all the other effects. This means we have to define the error terms ourselves. First, let us test the treatment effect. Since the error term is the interaction effect, not the mean square of errors, we use Error(g/a) to define it. The formula in the code below is actually the one used for conducting a one-way within-subjects ANOVA. The result shows that the treatment effect is significant. Note that the \(df\) of the error term is 9, which is actually the \(df\) of the interaction effect.
# Test the treatment effect
summary(aov(y~a+Error(g/a),dta))
##
## Error: g
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 3 4.25 1.417
##
## Error: g:a
## Df Sum Sq Mean Sq F value Pr(>F)
## a 3 194.50 64.83 32.87 3.53e-05 ***
## Residuals 9 17.75 1.97
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 16 19 1.188
Second, we can test the other effects with the default error term, namely \(MS_{error}\). Note that the formula used here is the same one used for conducting a two-way between-subjects ANOVA. In the summary table below, only the group effect and the interaction effect are of concern, and neither is significant. Check the \(df\) of the error term: it is \(16=4\times4\times(2-1)\), exactly the \(df\) for errors in the \(df\) table above. That neither the group effect nor any interaction involving group is significant is good for establishing the external validity of this experiment.
# Test the group effect and the interaction effect
summary(aov(y~a*g,dta))
## Df Sum Sq Mean Sq F value Pr(>F)
## a 3 194.50 64.83 54.596 1.26e-08 ***
## g 3 4.25 1.42 1.193 0.344
## a:g 9 17.75 1.97 1.661 0.180
## Residuals 16 19.00 1.19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
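The two summary tables fit together: the a:g mean square in the second table is exactly the error term that Error(g/a) used for a in the first, while the residual mean square is the error term for the two-way table. A small arithmetic check, using the values printed above:

```r
# Reconstruct the two F tests for the treatment effect from the printed SS and df
ms_a   <- 194.50 / 3    # treatment mean square
ms_axg <- 17.75 / 9     # interaction mean square: error term for a in Error(g/a)
ms_err <- 19.00 / 16    # within-cell mean square: error term in the two-way table
round(ms_a / ms_axg, 2)   # 32.87, the F for a in the Error(g/a) table
round(ms_a / ms_err, 3)   # 54.596, the F for a in the two-way table
```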
The data created with the code below were collected from 15 male and 15 female subjects, each of whom took part in all three experimental conditions. If we do not care about the individual differences among subjects and focus only on the treatment effect (fixed effect) and the gender effect (fixed effect), how would you analyze these data? What are the results? Also, if the data were collected not from subjects of different genders but of different grades (e.g., year 1 and year 2 college students), how would you analyze the treatment effect, the grade effect (random effect), and their interaction effect? What are the results?
dta <- data.frame(g = gl(2, 15, 30),
                  a = gl(3, 5, 30),
                  y = c(13, 11, 8, 10, 12,
                        8, 9, 12, 7, 11,
                        6, 9, 11, 12, 8,
                        10, 8, 11, 14, 13,
                        9, 8, 8, 10, 5,
                        11, 7, 8, 4, 8))