When there are two independent variables and all subjects take on all combinations of these conditions, this is called two-way within-subjects design. Since every subject takes on all the conditions of factor A, factor A is a within-subject factor. Likewise, factor B is a within-subjecet factor if every subjects takes on all its conditions. Since now the two independent variables are within-subjects factors, the error term includes the interaction effect between the within-cell error and the within-subjects factors. Note that the alias of within-subjects design is repeated measurement design. Thus, the two-way within-subjects design can also be called the repeated measurement design with two factors. Suppose that we have \(n\) subjects to take on all \(p\times q\) conditions of factors A and B. In the nonadditive model, the expected value of all mean squares for testing the effects in this design are listed as follows.

\[\begin{align*} E(MS_A)&=\sigma^2_{error}+q\theta^2_{\alpha/subject}+nq\theta^2_\alpha\\ E(MS_{A/s})&=\sigma^2_{error}+q\theta^2_{\alpha/subject}\\ E(MS_B)&=\sigma^2_{error}+p\theta^2_{\beta/subject}+np\theta^2_\beta\\ E(MS_{B/s})&=\sigma^2_{error}+p\theta^2_{\beta/subject}\\ E(MS_{AB})&=\sigma^2_{error}+\theta^2_{\alpha\beta/subject}+n\theta^2_{\alpha\beta}\\ E(MS_{AB/s})&=\sigma^2_{error}+\theta^2_{\alpha\beta/subject}\\ \end{align*}\]

The corresponding \(df\)s are listed as follows.

\[\begin{align*} df_A&=p-1\\ df_{A/s}&=(p-1)(n-1)\\ df_B&=q-1\\ df_{B/s}&=(q-1)(n-1)\\ df_{AB}&=(p-1)(q-1)\\ df_{AB/s}&=(p-1)(q-1)(n-1)\\ \end{align*}\]

Thus, we need to use different error terms to test these different effects. Let us see how to conduct the two-way within-subjects ANOVA in the below example.

Example 1

A company has created a new training program for their customer service staff. To test the effectiveness of the program they took a sample of 10 employees and assessed their performance in three areas: Product (knowledge of the company’s products and services), Client (their ability to relate to the customer with politeness and empathy) and Action (their ability to take action to help the customer). The data are listed as below.

dta<-data.frame(s=gl(10,1,60,labels=1:10),
                time=gl(2,30,60,labels=c("pre","post")),
                knowledge=gl(3,10,60,labels=c("product","client","action")),
                score=c(13,12,17,12,19,6,17,18,23,18,
                        12,19,19,25,27,12,18,29,30,12,
                        17,18,24,25,19,6,30,36,24,24,
                        18,6,21,18,18,6,24,22,18,24,
                        30,18,31,39,28,18,36,36,38,25,
                        34,30,32,40,27,23,38,40,32,34))

The subject variable here is called s, which is normally a series of numbers uesd to be the IDs of subjects. In aov(), any variable should be a factor. Therefore, these subject numbers are nominal variables and it does not matter which number is called for a subject, as long as every subject has his/her own number. In the above codes, we know that there are 10 subjects in total, as s is declared as a factor vector with 10 components, each of which repeats for once until all 60 positions are filled in. There are \(2\) (times) \(\times\) \(3\) (knowledge) conditions.

Describe Data

As usual, we first plot the mean scores of all subjects in these conditions. A bar plot will be suitable, as the knowledge variable is nominal. Obviously, the training seems to be working for increasing the knowledge about action and client. However, it seems not working for the knowledge about product.

library(ggplot2)
library(reshape)
means<-with(dta,tapply(score,list(time,knowledge),mean))
nums<-with(dta,tapply(score,list(time,knowledge),length))
ses<-with(dta,tapply(score,list(time,knowledge),sd))/sqrt(nums)
# Create data for plot of means
pr.dta<-cbind(melt(means),melt(ses)[,3])
names(pr.dta)<-c("time","knowledge","score","ses")
pr.fg<-ggplot(data=pr.dta,aes(x=knowledge,y=score,fill=time))+
          geom_bar(stat="identity",position=position_dodge())+
          geom_errorbar(aes(ymin=score-ses,ymax=score+ses),width=0.2,
                        position=position_dodge(0.9))+
          scale_fill_manual(values=c("pink","deepskyblue2"))+
          scale_color_manual(values=c("pink","deepskyblue2"))
pr.fg

ANOVA with Nonadditive Model

Now we can conduct a two-way within-subjects ANOVA. Here we assume that the subject variable will interact with the independent variables. That is the nonadditive model, in which the interaction between subejct variable and the effect to be tested is used as the error term in \(F\) test.

result.aov1<-aov(score~time*knowledge+Error(s/(time*knowledge)),dta)
summary(result.aov1)
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9   1598   177.6               
## 
## Error: s:time
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## time       1  828.8   828.8   33.85 0.000254 ***
## Residuals  9  220.4    24.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: s:knowledge
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## knowledge  2 1365.2   682.6   26.96 3.85e-06 ***
## Residuals 18  455.8    25.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: s:time:knowledge
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## time:knowledge  2  224.4  112.22   12.63 0.000373 ***
## Residuals      18  159.9    8.88                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Before reporting the ANOVA results, we first check whehter the \(df\)s are correct. As \(n=10\) and \(p=2\), the \(df\) for \(MS_{A/s}\) is 9 indeed. Also, the other \(df\)s are correct. Now we can check the testing results. The main effect of time is significant and so is the main effect of knowledge. The interaction effect between time and knowledge is significant also.

Simple Main Effect

To test the simple main effects of factors is a way to figure out the reason for a significant interaction effect. The results show that the scores of the measures of knowledeg in different respects are significantly different before or after the training. The simple main effect of time is significant for testing the knowledge about action and client, but not for testing the knowledge about product. This is consistent with our visual inspection of the bar plot.

# For knowledge
summary(aov(score~knowledge+Error(s/knowledge),subset(dta,time=="pre")))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9    871   96.77               
## 
## Error: s:knowledge
##           Df Sum Sq Mean Sq F value Pr(>F)  
## knowledge  2  244.3  122.13   5.882 0.0108 *
## Residuals 18  373.7   20.76                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(score~knowledge+Error(s/knowledge),subset(dta,time=="post")))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9  947.5   105.3               
## 
## Error: s:knowledge
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## knowledge  2 1345.4   672.7   50.05 4.44e-08 ***
## Residuals 18  241.9    13.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# For time
summary(aov(score~time+Error(s/time),subset(dta,knowledge=="action")))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9    771   85.67               
## 
## Error: s:time
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## time       1  572.4   572.4   59.19 3.02e-05 ***
## Residuals  9   87.0     9.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(score~time+Error(s/time),subset(dta,knowledge=="client")))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9  793.8    88.2               
## 
## Error: s:time
##           Df Sum Sq Mean Sq F value  Pr(>F)   
## time       1  460.8   460.8   21.47 0.00123 **
## Residuals  9  193.2    21.5                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(score~time+Error(s/time),subset(dta,knowledge=="product")))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  9    489   54.33               
## 
## Error: s:time
##           Df Sum Sq Mean Sq F value Pr(>F)
## time       1     20   20.00     1.8  0.213
## Residuals  9    100   11.11

ANOVA with Additive Model

What if the additive model is applied? Since in the additive model, the subject effect is assumed to be independent of other effects, the expected values of the \(MS\) terms for \(F\) tests are changed as

\[\begin{align*} E(MS_A)&=\sigma^2_\epsilon+nq\theta^2_\alpha\\ E(MS_B)&=\sigma^2_\epsilon+np\theta^2_\beta\\ E(MS_{AB})&=\sigma^2_\epsilon+n\theta^2_{\alpha\beta}\\ E(MS_{subject})&=\sigma^2+pq\theta^2_\pi\\ E(MS_{error})&=\sigma^2_\epsilon. \end{align*}\]

The corresponding \(df\) for each of these \(MS\) terms are

\[\begin{align*} df_A&=p-1\\ df_B&=q-1\\ df_{AB}&=(p-1)(q-1)\\ df_{subject}&=n-1\\ df_{error}&=(pq-1)(n-1). \end{align*}\]

Since the subect variable is assumed to be independent of other effects, in aov(), it is added in the full model of factors A and B with the default error as the error term for \(F\) tests. In the formula of aov() or lm(), the operator + is used to link the effects which are independent of each other. In the below summary table, all effects are significant, including the subject effect. That is, some subjects are better on customer service than others anyhow.

result.aov2<-aov(score~s+time*knowledge,dta)
summary(result.aov2)
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## s               9 1598.1   177.6   9.558 5.79e-08 ***
## time            1  828.8   828.8  44.612 3.05e-08 ***
## knowledge       2 1365.2   682.6  36.743 3.47e-10 ***
## time:knowledge  2  224.4   112.2   6.040  0.00475 ** 
## Residuals      45  836.0    18.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How do we test the simple main effects with the additive model? The straightforward idea is simply to call aov() with subset() to filter out the irrelevant data. However, this is not complete. Every time when aov() is conducted, the error term must be determined according to the formula. Therefore, the error term for testing the simple main effect of time at a specific level of knowledge must be the within-cell error in the data with the conditions of the other levels of knowledge. However, in the additive model, any effect is tested with the overall within-cell errors as the error term. Therefore, in order to get the correct \(F\) value, we need to combine the error term of the omnibus \(F\) test and the nominators of the \(F\) tests for the simple main effects. Since we have had the error term \(MS_{error}\) in the result.aov2, we need to have the nominators for the \(F\) tests. The below codes although meant to conduct ANOVA to test simple main effects provide the nominators we want.

simple.time1<-summary(aov(score~s+time,subset(dta,knowledge=="action")))
simple.time2<-summary(aov(score~s+time,subset(dta,knowledge=="client")))
simple.time3<-summary(aov(score~s+time,subset(dta,knowledge=="product")))
simple.knowledge1<-summary(aov(score~s+knowledge,subset(dta,time=="pre")))
simple.knowledge2<-summary(aov(score~s+knowledge,subset(dta,time=="post")))

Now we can compute the \(F\) values for the simple main effects. For instance \(F=\frac{MS_{A.B_k}}{MS_{error}}\) is used to test the simple main effect of factor A at level \(k\) of factor B. Likewise, \(F=\frac{MS_{B.A_j}}{MS_{error}}\) is testing the simple main effect of factor B at level \(j\) of factor A. In order to make the computation more automatic, I declare a function, which will implement the \(F\) test with the summary results made by aov() as well as the name of the factor to be tested as the input arugments. Please see the below codes for the contents of this function. The results of these simple main effect tests show that the simple main effect of time is not significant only when the training is for the knowledge of product. The simple main effect of knowlege is significant no matter before or after the training. This conclusion is the same as that of the nonadditive model.

# Declare a function to compute simple main effects
smef.test<-function(a,b,effect){
      # Nominator
      label1<-rownames(a[[1]])
      label1<-sapply(label1,function(x)gsub(" ","",x))
      index1<-which(label1==effect)
      df1<-a[[1]][["Df"]][index1]
      MS1<-a[[1]][["Mean Sq"]][index1]
      # Denominator
      label2<-rownames(b[[1]])
      label2<-sapply(label2,function(x)gsub(" ","",x))
      index2<-which(label2=="Residuals")
      df2<-b[[1]][["Df"]][index2]
      MS2<-b[[1]][["Mean Sq"]][index2]
      fvalue<-MS1/MS2
      resp<-c(fvalue,1-pf(fvalue,df1,df2))
      names(resp)<-c("f-value","p-value")
      return(resp)
}
# Testing simple main effect of time at action
smef.test(simple.time1,summary(result.aov2),"time")
##      f-value      p-value 
## 3.081308e+01 1.441613e-06
# Testing simple main effect of time at client
smef.test(simple.time2,summary(result.aov2),"time")
##      f-value      p-value 
## 2.480333e+01 9.822661e-06
# Testing simple main effect of time at product
smef.test(simple.time3,summary(result.aov2),"time")
##   f-value   p-value 
## 1.0765336 0.3050176
# Testing simple main effect of knowledge at pre
smef.test(simple.knowledge1,summary(result.aov2),"knowledge")
##     f-value     p-value 
## 6.574031618 0.003127795
# Testing simple main effect of knowledge at post
smef.test(simple.knowledge2,summary(result.aov2),"knowledge")
##      f-value      p-value 
## 3.620921e+01 4.248317e-10

Example 2

An experimenter manipulated two 3-level treatments for testing their effects on the participants’ performance. There are 5 participants in total tested in all conditions. The data can be seen as follow.

dta<-data.frame(s=gl(5,1,45,labels=1:5),
                a=gl(3,15,45,labels=1:3),
                b=gl(3,5,45,labels=1:3),
                y=c(37,42,33,29,24,
                    43,44,36,27,25,
                    48,47,29,38,28,
                    39,30,34,26,21,
                    35,40,31,22,27,
                    46,36,45,27,26,
                    31,21,20,18,10,
                    41,50,39,36,34,
                    64,52,53,42,49))

Describe Data

Obviously, this experiment is a 3 (A) \(\times\) 3 (B) within-subjects design. We firstly show the means of all conditions as a line plot with the errorbar as one standard error. See the below figure. It seems that factor B has no effect at the levels of factor A, except the third level.

means<-with(dta,tapply(y,list(a,b),mean))
nums<-with(dta,tapply(y,list(a,b),length))
ses<-with(dta,tapply(y,list(a,b),sd))/sqrt(nums)
exp2.dta<-data.frame(cbind(melt(means),melt(ses)[,3]))
names(exp2.dta)<-c("A","B","y","ses")
exp2.dta$B<-as.factor(exp2.dta$B)
exp2.fg<-ggplot(data=exp2.dta,aes(x=A,y=y,fill=B))+
           geom_line(aes(col=B))+geom_point(aes(col=B))+
           geom_errorbar(aes(ymin=y-ses,ymax=y+ses,col=B),width=0.2,
                         position=position_dodge(0.01))
exp2.fg

ANOVA

Normally, the subject variable in an experiment is thought to interact with the experimental treatments. Thus, nonadditive model is adopted here. The visual inspection of the above figure suggests that the interaction effect between A and B should be significant, as the effect of B seems to be stronger at the third level of A than the other two levels. The results are shonw in the below summary table. The main effect of A is marginally significant. The main effect of B and the interaction effect between A and B are both significant.

exp2.aov1<-aov(y~a*b+Error(s/(a*b)),dta)
summary(exp2.aov1)
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4   1615   403.8               
## 
## Error: s:a
##           Df Sum Sq Mean Sq F value Pr(>F)  
## a          2  190.0   95.00   4.362 0.0524 .
## Residuals  8  174.2   21.78                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: s:b
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## b          2 1543.3   771.7   35.57 0.000104 ***
## Residuals  8  173.6    21.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: s:a:b
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## a:b        4 1236.7  309.17   17.23 1.19e-05 ***
## Residuals 16  287.1   17.94                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Simple Main Effects

The significant interaction effect can be explained from different perpectives. For example, we can say that the simple main effect of B is significant only at the third level of A. Therefore, we test the simple main effects of these two factors. As expected, the simple main effect of B is significant at the third level of A.

# For simple main effect of A
summary(aov(y~a+Error(s/a),subset(dta,b==1)))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4  525.3   131.3               
## 
## Error: s:a
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## a          2  463.3  231.67    20.9 0.000666 ***
## Residuals  8   88.7   11.08                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(y~a+Error(s/a),subset(dta,b==2)))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4    598   149.5               
## 
## Error: s:a
##           Df Sum Sq Mean Sq F value Pr(>F)   
## a          2  203.3   101.7   13.56 0.0027 **
## Residuals  8   60.0     7.5                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(y~a+Error(s/a),subset(dta,b==3)))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4  665.3   166.3               
## 
## Error: s:a
##           Df Sum Sq Mean Sq F value  Pr(>F)   
## a          2  760.0   380.0   9.723 0.00722 **
## Residuals  8  312.7    39.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# For simple main effect of B
summary(aov(y~b+Error(s/b),subset(dta,a==1)))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4    754   188.5               
## 
## Error: s:b
##           Df Sum Sq Mean Sq F value Pr(>F)
## b          2  63.33   31.67   2.262  0.166
## Residuals  8 112.00   14.00
summary(aov(y~b+Error(s/b),subset(dta,a==2)))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4  597.3   149.3               
## 
## Error: s:b
##           Df Sum Sq Mean Sq F value Pr(>F)
## b          2  103.3   51.67   2.707  0.126
## Residuals  8  152.7   19.08
summary(aov(y~b+Error(s/b),subset(dta,a==3)))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4    438   109.5               
## 
## Error: s:b
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## b          2   2613  1306.7   53.33 2.37e-05 ***
## Residuals  8    196    24.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Contrast Analysis

The interaction effect between A and B can be explained as that B has no effect at the first two levels of A. Therefore, the first two levels of A can be viewed as one level. The results show that the first contrast [1 -1 0] of A is not significant, whereas the second contrast [0.5 0.5 -1] of A is. This means that regardless of the B levels, A1 and A2 basically are not differnt and their mean is significantly different A3. The main effect of B is significant across of all A levels. The interaction between B and the first contrast of A is not significant. That is consistent with our inspection of the line plot. However, the interaction between B and the second contrast of A is significant. The second contrast of A refers to the comparison between the mean of the first two levels of A and the third level of A. This is significant across all B levels. Of most interesting is the interaction effect between this contrast and B, which is significant. This suggests that the overall interaction effect comes from the unusual response pattern for B at the third level of A.

dta$ac1<-ifelse(dta$a==3,0,ifelse(dta$a==2,-1,1))
dta$ac2<-ifelse(dta$a==3,-1,0.5)
summary(aov(y~(ac1+ac2)*b+Error(s/((ac1+ac2)*b)),dta))
## 
## Error: s
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  4   1615   403.8               
## 
## Error: s:ac1
##           Df Sum Sq Mean Sq F value Pr(>F)
## ac1        1   67.5   67.50   1.796  0.251
## Residuals  4  150.3   37.58               
## 
## Error: s:ac2
##           Df Sum Sq Mean Sq F value Pr(>F)  
## ac2        1 122.50  122.50   20.51 0.0106 *
## Residuals  4  23.89    5.97                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: s:b
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## b          2 1543.3   771.7   35.57 0.000104 ***
## Residuals  8  173.6    21.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: s:ac1:b
##           Df Sum Sq Mean Sq F value Pr(>F)
## ac1:b      2    5.0    2.50   0.111  0.897
## Residuals  8  180.7   22.58               
## 
## Error: s:ac2:b
##           Df Sum Sq Mean Sq F value Pr(>F)    
## ac2:b      2 1231.7   615.8   46.28  4e-05 ***
## Residuals  8  106.4    13.3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exercise:

Hu, Lin, and Chen (2016) studied the effects of screen size on rotating 3D contents using compound gestures on a mobile device. Their data can be extracted from rt_3dmobile_game.txt. The column names from left to right are Treatment (6 types of treatment), Display (1=5β€™β€˜and 2=7’’), Difficulty (1=Easy, 2=Difficult, 3=Normal), Subject, and RT (in seconds). Please examine the effects of the screen size and task difficulty on the reaction time with ANOVA.