Often psycologists would test participants in an experimental design consisting of at least one between-subjects variable and one within-subjects variable. For example, King (1986) wanted to know whether the acquired tolerance could be explained on the basis of a conditioned tolearance related to the physical context in which the drug was administered. Three groups of rats were tested. The “Control” group was injected with physiological saline. The “Same” group was injected with the drug in the same environment. The “Different” group was injected with the drug in a different environment. Because the drug is known to be metabolized over a period of approximately 1 hour, King recored his data in 5-min intervals. The below shows the data.
dta<-data.frame(s=factor(c(rep(1:8,6),rep(9:16,6),rep(17:24,6))),
Int=gl(6,8,144),
Con=gl(3,48,144,labels=c("Control","Different","Same")),
y=c(150,335,149,159,159,292,297,170,
44,270,52,31,0,125,187,37,
71,156,91,127,35,184,66,42,
59,160,115,212,75,246,96,66,
132,118,43,71,71,225,209,114,
74,230,154,224,34,170,74,81,
346,426,359,272,200,366,371,497,
175,329,238,60,271,291,364,402,
177,236,183,82,263,263,270,294,
192,76,123,85,216,144,308,216,
239,102,183,101,241,220,219,284,
140,232,30,98,227,180,267,255,
282,317,362,338,263,138,329,292,
186,31,104,132,94,38,62,139,
225,85,144,91,141,16,62,104,
134,120,114,77,142,95,6,184,
189,131,115,108,120,39,93,193,
169,205,127,169,195,55,67,122))
dim(dta)
## [1] 144 4
The subject ids are coded as from 1 to 24 with every 8 of them assigned to a same injecting condition. Thus, the variable Con is between subjects. Every rat was measured for 6 times in 5-minute intervals. Therefore, the variable Int is within subjects. The dependent variable y is the degree of acquired tolerance (in ms) to the heat of floor.
As usual, for a quick understanding of the differences between the experimental conditions, a line plot is made for the mean ambulatory behavior in each condition. Obviously, the tolerance to the heat of floor gradually decreases after one injection of the drug. Although the decreasing trend looks the same across three groups, the group in the different condition seems to have a slower trend, namely better tolerance.
library(ggplot2)
library(reshape)
means<-with(dta,tapply(y,list(Con,Int),mean))
nums<-with(dta,tapply(y,list(Con,Int),length))
ses<-with(dta,tapply(y,list(Con,Int),sd))/sqrt(nums)
ab.dta<-data.frame(cbind(melt(means),melt(ses)[,3]))
names(ab.dta)<-c("Con","Int","y","ses")
ab.fg<-ggplot(data=ab.dta,aes(x=Int,y=y,fill=Con))+
geom_line(aes(col=Con))+geom_point(aes(col=Con))+
geom_errorbar(aes(ymin=y-ses,ymax=y+ses,col=Con),width=0,
position=position_dodge(0.05))+
labs(x="Measure Times",y="Tolerance (in ms)")
ab.fg
For a mixed design, the between-subjects factors are tested with the error term as \(MS_{error}\), whereas the the within-subjects factors are tested with the error term as the interaction effect with subject variable. The expected values of the \(MS\) terms in a mixed design with factor A (of \(p\) levels) as between-subjects factor and factor B (of \(q\) levels) as within-subjects factor are as follows.
\[\begin{align*} E(MS_A)&=\sigma^2_\epsilon+nq\theta^2_\alpha\\ E(MS_B)&=\sigma^2_\epsilon+\theta^2_{\alpha\beta\times subject}+np\theta^2_\beta\\ E(MS_{AB})&=\sigma^2\epsilon+\theta^2_{\alpha\beta\times subject}+n\theta^2_{\alpha\beta}\\ E(MS_{AB/s})&=\sigma^2_\epsilon+\theta^2_{\alpha\beta\times subject}\\ E(MS_{error})&=\sigma^2_\epsilon \end{align*}\]The degress of freedom for each \(MS\) term is shown as follows.
\[\begin{align*} df_A&=p-1\\ df_B&=q-1\\ df_{AB}&=(p-1)(q-1)\\ df_{AB/s}&=p(q-1)(n-1)\\ df_{error}&=p(n-1) \end{align*}\]The mixed design can be viewed as a compound of a within-subjects design and a between-subjects design. For the between-subjects factor, it looks like a one-way between-subjects design. Therefore, the error term for the between-subjects factor is computed as mean of the variances of response scores in the \(p\) between-subjects conditions. For the within-subjects factor, it just looks like a one-way within-subjects design. That is, the error term is the interaction effect the between within-subjects factor and the subject variable. For the interaction effect between the within- and between-subjects factors, the thing gets a little bit tricky. Since the interaction effect will always involve the within-subjects factor, it needs to consider the interaction effect with the subject variable as the error term. Therefore, whenever the effect involving a within-subjects factor, the error term is the interaction effect with the subject variable.
Let us see how to use aov() to conduct ANOVA for the mixed design. Check the \(df\)s first to make sure that the formula is correctly set for your purpose. For the between-subjects factor Con, the error term’s \(df\) is \(p(n-1)\), which is \(3\times7=21\), same as the value in the summary table. The \(df\) of the error term \(MS_{AB/subject}\) is \(p(q-1)(n-1)=3\times5\times7=105\). Again this is correct. The results reveal that all effects are significant. Note that in a within-subjects design, the subject variable is always regarded as a random-effect factor. The expected value of the \(MS\) term for its effect is not able to be compared with the interaction effect with the subject variable. Thus, the effect of subject variable cannot be tested.
exp1.aov<-aov(y~Con*Int+Error(s/Int),dta)
summary(exp1.aov)
##
## Error: s
## Df Sum Sq Mean Sq F value Pr(>F)
## Con 2 285815 142908 7.801 0.00293 **
## Residuals 21 384722 18320
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: s:Int
## Df Sum Sq Mean Sq F value Pr(>F)
## Int 5 399737 79947 29.852 < 2e-16 ***
## Con:Int 10 80820 8082 3.018 0.00216 **
## Residuals 105 281199 2678
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to King’s hypothesis, the conditioning of the physcial context is the source of tolerance to the drug. Thus, it is expected that the tolerance drops quickly when the environment is the same and relatively slowly when the environment is different. As shown in the line plot, it looks like the tolerance measure is always larger in the “Different” group than the other two groups. Therefore, we can test this hypothesis by contrasts. Set up two orthogonal contrasts for the variable Con. The first contrast is used to check whether the “Control” group and the “Same” group are different. The second contrast is used to check whether the “Different” group is different from the mean of the other two groups.
dta$Conc1<-ifelse(dta$Con=="Different",0,
ifelse(dta$Con=="Control",1,-1))
dta$Conc2<-ifelse(dta$Con=="Different",-1,0.5)
summary(aov(y~Conc1*Conc2*Int+Error(s/Int),dta))
##
## Error: s
## Df Sum Sq Mean Sq F value Pr(>F)
## Conc1 1 4565 4565 0.249 0.62284
## Conc2 1 281250 281250 15.352 0.00079 ***
## Residuals 21 384722 18320
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: s:Int
## Df Sum Sq Mean Sq F value Pr(>F)
## Int 5 399737 79947 29.852 < 2e-16 ***
## Conc1:Int 5 21198 4240 1.583 0.17121
## Conc2:Int 5 59622 11924 4.453 0.00102 **
## Residuals 105 281199 2678
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Apparently, the “Different” group showed a relatively stronger tolerance than the other two groups. This supports King’s hypothesis. Also, the other two groups are not different from each other. Thus, no matter the first contrast or the interaction effect between the first contrast and the factor Int is not significant.
Let’s see another nuerical example of the mixed design. In the below data set, factor a is a between-subjects factor of 2 levels. Factor b is a within-subjects factor of 4 levels. In each condition, there are 4 subjects.
dta<-data.frame(a=gl(2,16,32,labels=1:2),
b=gl(4,4,32,labels=1:4),
s=as.factor(c(rep(1:4,4),rep(5:8,4))),
y=c(3,6,3,3, 4,5,4,3, 7,8,7,6, 7,8,9,8,
1,2,2,2, 2,3,4,3, 5,6,5,6, 10,10,9,11))
According to the below figure, it seems that the subjects performance gets better along the levels of factor b. This asceding trend seems true for no matter which level of factor a. That is a significant main effect of factor b is expected. Also, at the final level of factor b, a1 is smaller than a2 contradicting to the response pattern at the other b levels. Thus, the interaction effect between a and b is expected.
means<-with(dta,tapply(y,list(a,b),mean))
nums<-with(dta,tapply(y,list(a,b),length))
ses<-with(dta,tapply(y,list(a,b),sd))/sqrt(nums)
exp2.dta<-data.frame(cbind(melt(means),melt(ses)[,3]))
names(exp2.dta)<-c("a","b","y","ses")
exp2.dta$a<-as.factor(exp2.dta$a)
exp2.fg<-ggplot(data=exp2.dta,aes(x=b,y=y,fill=a))+
geom_line(aes(col=a))+geom_point(aes(col=a))+
geom_errorbar(aes(ymin=y-ses,ymax=y+ses,col=a),width=0.1)+
scale_color_manual(values=c("red","deepskyblue3"))
exp2.fg
Test the two main effects and the interaction effect with the following codes. The main effect of factor a is not significant. However, factor b has a significant main effect and the interaction effect between factors a and b is also significant.
exp2.aov<-aov(y~a*b+Error(s/b),dta)
summary(exp2.aov)
##
## Error: s
## Df Sum Sq Mean Sq F value Pr(>F)
## a 1 3.125 3.125 2 0.207
## Residuals 6 9.375 1.562
##
## Error: s:b
## Df Sum Sq Mean Sq F value Pr(>F)
## b 3 194.50 64.83 127.89 2.52e-12 ***
## a:b 3 19.37 6.46 12.74 0.000105 ***
## Residuals 18 9.12 0.51
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For explaining a significant interaction effect, intuitively we can test the simple main effects of the two factors. However, in the current case, the simple main effect of a will always be significant at any level of b. Also, the simple main effect of b will be significant at either level of a. Therefore, it might not be a good strategy to explain this interaction effect in terms of simple main effects. The visual inspection of the above figure suggests that the fourth level of b is critical for the interaction effect, as the two response trends on factor b get crossed at here. In other words, we can say that there is no interaction effect until the third level of b. Thus, we can check this hypothesis with contrasts. The coefficients for the b levels in three orthogonal contrasts are [1 -1 0 0], [1 1 -2 0], and [1 1 1 -3]. The first contrast is used to compare the means of the first two levels of b. In the second contrast, the first two means of b are averaged and compared with the mean of the third level of b. The final contrast is used to compare the average of the first three means of b and the last men of b. If our hypothesis is true, it is expected that the interaction effects between a and these contrasts are not significant, except the last contrast. Check the tests for the interaction effects. The results support our hypothesis that only the last contrast has an interaction effect with factor a.
dta$bc1<-ifelse(dta$b==2,-1,
ifelse(dta$b==1,1,0))
dta$bc2<-ifelse(dta$b==3,-2,
ifelse(dta$b==4,0,1))
dta$bc3<-ifelse(dta$b==4,-3,1)
summary(aov(y~bc1*bc2*bc3*a+Error(s/(bc1*bc2*bc3)),dta))
## Warning in aov(y ~ bc1 * bc2 * bc3 * a + Error(s/(bc1 * bc2 * bc3)), dta):
## Error() model is singular
##
## Error: s
## Df Sum Sq Mean Sq F value Pr(>F)
## a 1 3.125 3.125 2 0.207
## Residuals 6 9.375 1.562
##
## Error: s:bc1
## Df Sum Sq Mean Sq F value Pr(>F)
## bc1 1 2.25 2.2500 7.714 0.0321 *
## bc1:a 1 1.00 1.0000 3.429 0.1135
## Residuals 6 1.75 0.2917
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: s:bc2
## Df Sum Sq Mean Sq F value Pr(>F)
## bc2 1 52.08 52.08 197.4 8.12e-06 ***
## bc2:a 1 0.00 0.00 0.0 1
## Residuals 6 1.58 0.26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: s:bc3
## Df Sum Sq Mean Sq F value Pr(>F)
## bc3 1 140.17 140.17 145.21 1.98e-05 ***
## bc3:a 1 18.38 18.38 19.04 0.00475 **
## Residuals 6 5.79 0.97
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Singley, Hale, and Russell (2012) measured the heart rate of expert and novice skydivers at 5 time points in a skydiving session. Their heart rates at the 5 time points during skydiving are the index of their anxiety degree. The data can be download here. The column names are expGrp (1=Expert and 2=Novice), subjGrp (subject ID), time (i.e., time points of measurement), and heartRt. Please anlyze these data and answer whether the anxiety degree would be different between experts and novices. Also, is there any interaction effect between the time point of measurement and the subject group (expert vs. novice).