One of the oldest methods for making post hoc comparisons is Fisher’s Least Significant Difference (LSD) test. When we have only a few means to compare, it is a perfectly legitimate and useful procedure. Fisher’s LSD is essentially a set of individual t tests; the only difference between it and linear contrasts is that Fisher’s LSD requires a significant overall F test first.
The Tukey test is also called Tukey’s HSD (Honestly Significant Difference) test or the WSD (Wholly Significant Difference) test. Tukey’s HSD test uses the Studentized range statistic \(q\) for its comparisons, where \(q=t\sqrt{2}\).
\(q_{r}=\frac{\overline{X}_{l}-\overline{X}_{s}}{\sqrt{\frac{MS_{error}}{n}}}\), where \(\overline{X}_{l}\) and \(\overline{X}_{s}\) are the largest and smallest of the \(r\) means being compared, respectively.
By comparison, the individual t test for two means is \(t=\frac{\overline{X}_{i}-\overline{X}_{j}}{\sqrt{\frac{2MS_{error}}{n}}}\).
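To make these formulas concrete, here is a minimal sketch computing the critical differences the two tests use at \(\alpha=.05\), plugging in the values from the morphine example below (\(MS_{error}=32\), \(n=8\), \(df_{error}=35\), \(k=5\) means):
t.crit<-qt(.975,df=35)              # two-tailed t critical value
LSD<-t.crit*sqrt(2*32/8)            # ~5.74: the half-width of the LSD CIs below
q.crit<-qtukey(.95,nmeans=5,df=35)  # Studentized range critical value
HSD<-q.crit*sqrt(32/8)              # ~8.13: the half-width of the HSD CIs below
c(LSD=LSD,HSD=HSD)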
In R, we can use PostHocTest() in the package DescTools to implement a number of post hoc tests, including the LSD and HSD tests. With the morphine tolerance data as the example, we can see how it works.
morphine.dta<-data.frame(G=gl(5,8,5*8,labels=c("MS","MM","SS","SM","McM")),
Y=c(3,5,1,8,1,1,4,9,
2,12,13,6,10,7,11,19,
14,6,12,4,19,3,9,21,
29,20,36,21,25,18,26,17,
24,26,40,32,20,33,27,30))
summary(morphine.dta)
## G Y
## MS :8 Min. : 1.00
## MM :8 1st Qu.: 6.00
## SS :8 Median :13.50
## SM :8 Mean :15.60
## McM:8 3rd Qu.:24.25
## Max. :40.00
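A quick look at the group means helps orient the comparisons that follow:
with(morphine.dta,tapply(Y,G,mean))  # MS=4, MM=10, SS=11, SM=24, McM=29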
Normally, we run an omnibus F test first; here the null hypothesis is clearly rejected. That is, at least one group mean is significantly different from the others.
morphine.aov<-aov(Y~G,morphine.dta)
summary(morphine.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## G 4 3498 874.4 27.32 2.44e-10 ***
## Residuals 35 1120 32.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Suppose you have no a priori hypothesis about which means should differ, and you want to know which pairwise differences produce the significant overall F test. A post hoc test is suitable for this situation. Let’s do Fisher’s LSD test first.
library(DescTools)
PostHocTest(morphine.aov,method="lsd")
##
## Posthoc multiple comparisons of means : Fisher LSD
## 95% family-wise confidence level
##
## $G
## diff lwr.ci upr.ci pval
## MM-MS 6 0.2579877 11.742012 0.0411 *
## SS-MS 7 1.2579877 12.742012 0.0183 *
## SM-MS 20 14.2579877 25.742012 3.1e-08 ***
## McM-MS 25 19.2579877 30.742012 1.9e-10 ***
## SS-MM 1 -4.7420123 6.742012 0.7258
## SM-MM 14 8.2579877 19.742012 1.9e-05 ***
## McM-MM 19 13.2579877 24.742012 8.9e-08 ***
## SM-SS 13 7.2579877 18.742012 5.4e-05 ***
## McM-SS 18 12.2579877 23.742012 2.6e-07 ***
## McM-SM 5 -0.7420123 10.742012 0.0858 .
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Fisher’s LSD test evaluates all \(\binom{5}{2}=10\) pairwise comparisons, and only the comparisons SS-MM and McM-SM are not significant. Now let’s do Tukey’s HSD test.
PostHocTest(morphine.aov,method="hsd")
##
## Posthoc multiple comparisons of means : Tukey HSD
## 95% family-wise confidence level
##
## $G
## diff lwr.ci upr.ci pval
## MM-MS 6 -2.131899 14.131899 0.23404
## SS-MS 7 -1.131899 15.131899 0.11976
## SM-MS 20 11.868101 28.131899 3.0e-07 ***
## McM-MS 25 16.868101 33.131899 1.9e-09 ***
## SS-MM 1 -7.131899 9.131899 0.99649
## SM-MM 14 5.868101 22.131899 0.00017 ***
## McM-MM 19 10.868101 27.131899 8.6e-07 ***
## SM-SS 13 4.868101 21.131899 0.00049 ***
## McM-SS 18 9.868101 26.131899 2.5e-06 ***
## McM-SM 5 -3.131899 13.131899 0.40782
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that MM-MS and SS-MS are significant under Fisher’s LSD but not under Tukey’s HSD, reflecting the stricter family-wise error control of the HSD test. See here for a more detailed introduction to different kinds of post hoc tests.
When we are concerned with the trend in effectiveness rather than with multiple comparisons among specific means, we can do a trend analysis.
Mathematically, a curve of degree 1 (i.e., \(y=ax+b\)) is linear, a curve of degree 2 (i.e., \(y=ax^2+bx+c\)) is quadratic, and a curve of degree 3 (i.e., \(y=ax^3+bx^2+cx+d\)) is cubic. We can apply linear contrasts to test the trend of the treatment effectiveness across the means.
Giancola and Corman (2007) tested the relationship between alcohol and aggression in the presence of a distracting task. The data are listed as follows.
dta<-data.frame(cond=gl(5,12,12*5,labels=1:5),
score=c(1.28,1.35,3.31,3.06,2.59,3.25,2.98,1.53,-2.68,2.64,1.26,1.06,
-1.18,0.15,1.36,2.61,0.66,1.32,0.73,-1.06,0.24,0.27,0.72,2.28,
-0.41,-1.25,-1.33,-0.47,-0.60,-1.72,-1.74,-0.77,-0.41,-1.2,-0.31,-0.74,
-0.85,0.14,-1.38,1.28,1.85,-0.59,-1.3,0.38,-0.35,2.29,-0.25,0.51,
2.01,0.4,2.34,-1.8,5,2.27,6.47,2.94,0.47,3.22,0.01,-0.66))
dim(dta)
## [1] 60 2
The omnibus F test reveals a significant effect of alcohol consumption on aggressive behavior. The pattern of the means is shown in the figure below.
aa.aov<-aov(score~cond,dta)
summary(aa.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 66.45 16.612 7.341 8.22e-05 ***
## Residuals 55 124.46 2.263
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
means<-with(dta,tapply(score,cond,mean))
ses<-with(dta,tapply(score,cond,sd))/sqrt(table(dta$cond))
library(ggplot2)
ggplot(data.frame(means=means,ses=ses,cond=1:5),aes(cond,means))+
geom_point()+geom_line()+geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.1)
Visual inspection suggests a second-order trend among these means. That is, the frequency of aggressive behavior drops to its minimum at a medium level of alcohol. We can check whether this observation is statistically supported with linear contrasts. For example, if we have three means, testing the linear trend is equivalent to testing the contrast with coefficients [-1 0 1] for these means. Likewise, testing the quadratic trend is equivalent to testing the contrast with coefficients [1 -2 1]. Notice that the coefficient vectors [-1 0 1] and [1 -2 1] are orthogonal to each other. In general, there are k-1 orthogonal trends for k means. This webpage provides the coefficients for up to 6 means.
Let’s test the 4 possible trends among the 5 means in the study on alcohol and aggression. Apparently, the quadratic trend is significant. But if you fit only 3 orthogonal contrasts for 5 means, the sum of squares of the omitted fourth contrast is pooled into \(SS_{error}\) instead of \(SS_{treatment}\), so \(SS_{error}\) changes and hence the F values. Thus, I would suggest sticking to the full set of k-1 orthogonal contrasts.
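Before fitting, two quick checks may be reassuring: the four coefficient vectors used below are mutually orthogonal, and base R’s contr.poly() generates unit-length rescalings of the same polynomial coefficients:
C<-rbind(lin=c(-2,-1,0,1,2),quad=c(2,-1,-2,-1,2),
         cub=c(-1,2,0,-2,1),quart=c(1,-4,6,-4,1))
C%*%t(C)               # off-diagonal entries are all 0
round(contr.poly(5),3) # columns .L, .Q, .C, ^4 are proportional to the rows of C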
dta$C1<-rep(c(-2,-1,0,1,2),each=12)
dta$C2<-rep(c(2,-1,-2,-1,2),each=12)
dta$C3<-rep(c(-1,2,0,-2,1),each=12)
dta$C4<-rep(c(1,-4,6,-4,1),each=12)
aatrend.aov<-aov(score~C1+C2+C3+C4,dta)
summary(aatrend.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## C1 1 0.15 0.15 0.068 0.796
## C2 1 60.32 60.32 26.658 3.45e-06 ***
## C3 1 1.58 1.58 0.699 0.407
## C4 1 4.39 4.39 1.940 0.169
## Residuals 55 124.46 2.26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
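We can also verify the quadratic SS by hand with the standard contrast formula \(SS_{\psi}=\frac{n(\sum_{j}c_{j}\overline{X}_{j})^2}{\sum_{j}c_{j}^2}\):
m<-with(dta,tapply(score,cond,mean))
cq<-c(2,-1,-2,-1,2)
12*sum(cq*m)^2/sum(cq^2)  # matches the C2 row: 60.32
Now drop the quartic contrast and refit: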
summary(aov(score~C1+C2+C3,dta))
## Df Sum Sq Mean Sq F value Pr(>F)
## C1 1 0.15 0.15 0.067 0.797
## C2 1 60.32 60.32 26.218 3.89e-06 ***
## C3 1 1.58 1.58 0.688 0.410
## Residuals 56 128.85 2.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The independent variables of an experiment are referred to as factors. An experiment with two factors is an instance of what is called a two-way factorial design.
Recall the linear model for the experimental design with one independent variable, \(X_{ij}=\mu+\alpha_{j}+\epsilon_{ij}\), where \(\alpha_{j}=\mu_{j}-\mu\) is the treatment effect. The total variability of the scores can be decomposed into the variability accounted for by the treatment and the variability due to random error, namely \(SS_{Total}=SS_{treat}+SS_{error}\).
Similarly, when there are two independent variables, an observed score can be modeled by the grand mean, the effect of treatment A, the effect of treatment B, the interaction effect between A and B, and random error. That is, \(X_{ijk}=\mu+\alpha_{j}+\beta_{k}+\alpha\beta_{jk}+\epsilon_{ijk}\). Accordingly, the total sum of squares can be decomposed into the parts accounted for by treatment A, treatment B, the interaction between A and B, and error, namely \(SS_{Total}=SS_{A}+SS_{B}+SS_{AB}+SS_{error}\).
The total variability of the scores is still \(SS_{Total}=\sum\sum\sum(X_{ijk}-\mu)^2\). The variability accounted for by treatment A can be expressed as \(SS_{A}=\sum\sum\sum(\mu_{.j.}-\mu)^2\). Likewise, the variability accounted for by treatment B can be expressed as \(SS_{B}=\sum\sum\sum(\mu_{..k}-\mu)^2\). The within-cell error variability is \(SS_{error}=\sum\sum\sum(X_{ijk}-\mu_{.jk})^2\). Following the same rule as in the one-way design, the variability accounted for by the interaction between A and B is \(SS_{AB}=SS_{Total}-SS_{A}-SS_{B}-SS_{error}\).
The effect of treatment A is estimated by \(\alpha_{j}=\mu_{.j.}-\mu\). The effect of treatment B is estimated by \(\beta_{k}=\mu_{..k}-\mu\). The interaction effect between A and B is estimated by \(\alpha\beta_{jk}=(\mu_{.jk}-\mu)-(\mu_{.j.}-\mu)-(\mu_{..k}-\mu)\), that is, the part of the cell-mean deviation not explained by the two main effects.
Therefore, \(X_{ijk}=\mu+(\mu_{.j.}-\mu)+(\mu_{..k}-\mu)+[(\mu_{.jk}-\mu)-(\mu_{.j.}-\mu)-(\mu_{..k}-\mu)]+(X_{ijk}-\mu_{.jk})=\mu+\alpha_{j}+\beta_{k}+\alpha\beta_{jk}+\epsilon_{ijk}\).
The degrees of freedom of the components in this linear model follow the same partition as the \(SS\),
\(df_{total}=df_{A}+df_{B}+df_{AB}+df_{error}\), namely
\(njk-1=(j-1)+(k-1)+(j-1)(k-1)+jk(n-1)\), where \(j\) and \(k\) here denote the numbers of levels of A and B, and \(n\) is the number of observations per cell.
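For the 2 (age) × 5 (condition) recall experiment analyzed below, with n = 10 per cell, the partition can be verified with a line of arithmetic:
a<-2; b<-5; n<-10
c(total=n*a*b-1,A=a-1,B=b-1,AB=(a-1)*(b-1),error=a*b*(n-1))  # 99 = 1+4+4+90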
Suppose the recall task under 5 different levels of processing is now tested with two groups of participants: old and young. The data are as follows.
dta<-data.frame(age=factor(rep(c(1,2),each=50),labels=c("young","old")),
cond=factor(rep(1:5,each=10),labels=c("count","rhym","adj",
"imagery","intentation")),
y=c(8,6,4,6,7,6,5,7,9,7,
10,7,8,10,4,7,10,6,7,7,
14,11,18,14,13,22,17,16,12,11,
20,16,16,15,18,16,20,22,14,19,
21,19,17,15,22,16,22,22,18,21,
9,8,6,8,10,4,6,5,7,7,
7,9,6,6,6,11,6,3,8,7,
11,13,8,6,14,11,13,13,10,11,
12,11,16,11,9,23,12,10,19,11,
10,19,14,5,10,11,14,15,11,11))
dim(dta)
## [1] 100 3
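It is also worth confirming that the design is balanced:
with(dta,table(age,cond))  # 10 observations in each of the 10 cells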
As usual, let’s check out the pattern of the group means.
means<-with(dta,tapply(y,list(age,cond),mean))
ses<-with(dta,tapply(y,list(age,cond),sd))/sqrt(10)
# Barplot
ggplot(data.frame(means=c(means),ses=c(ses),cond=rep(unique(dta$cond),each=2),
age=rep(rep(unique(dta$age),2),5)),aes(cond,means,fill=age))+
geom_bar(stat="identity",position=position_dodge(0.9))+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.1,position=position_dodge(0.9))
In R, the same function aov() is used to conduct a two-way ANOVA, except that the model is now written as y~A*B, shorthand for y~A+B+A:B: the full model consisting of the main effect of A, the main effect of B, and the interaction effect between A and B.
lop.aov<-aov(y~age*cond,dta)
summary(lop.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 240.3 240.3 29.936 3.98e-07 ***
## cond 4 1514.9 378.7 47.191 < 2e-16 ***
## age:cond 4 190.3 47.6 5.928 0.000279 ***
## Residuals 90 722.3 8.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
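As a quick check of the decomposition introduced above, the four sums of squares in this table add up to the total sum of squares computed directly from the data:
sum(summary(lop.aov)[[1]][["Sum Sq"]])  # SS_A + SS_B + SS_AB + SS_error
with(dta,sum((y-mean(y))^2))            # SS_total: the same value (2667.8)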
One of the major benefits of factorial designs is that they allow us to examine the interaction of variables. When the interaction effect is significant, the effect of factor A depends on the level of factor B, and vice versa. An easy way to spot an interaction is to check whether the lines connecting the condition means at different levels of the other factor cross, or would cross if extended.
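Base R’s interaction.plot() offers a quick way to draw these lines:
# Non-parallel (converging or crossing) lines are the visual signature of an interaction
with(dta,interaction.plot(cond,age,y,type="b",ylab="Mean recall"))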
According to the definition of the interaction effect, when it is significant, one factor’s effect is expected to differ across the levels of the other factor. The effect of a factor at a fixed level of another factor is called a simple effect.
For the above example, there are two ways to test the simple effects of Age and Condition. The first is to treat the two-way design as a set of one-way designs.
s.young.aov<-aov(y~cond,subset(dta,age=="young"))
s.old.aov<-aov(y~cond,subset(dta,age=="old"))
summary(s.young.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 1354 338.4 53.06 <2e-16 ***
## Residuals 45 287 6.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(s.old.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 351.5 87.88 9.085 1.82e-05 ***
## Residuals 45 435.3 9.67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
s.c.aov<-aov(y~age,subset(dta,cond=="count"))
s.r.aov<-aov(y~age,subset(dta,cond=="rhym"))
s.a.aov<-aov(y~age,subset(dta,cond=="adj"))
s.im.aov<-aov(y~age,subset(dta,cond=="imagery"))
s.in.aov<-aov(y~age,subset(dta,cond=="intentation"))
summary(s.c.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 1.25 1.250 0.464 0.504
## Residuals 18 48.50 2.694
summary(s.r.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 2.45 2.450 0.586 0.454
## Residuals 18 75.30 4.183
summary(s.a.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 72.2 72.2 7.848 0.0118 *
## Residuals 18 165.6 9.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(s.im.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 88.2 88.20 6.539 0.0198 *
## Residuals 18 242.8 13.49
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(s.in.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 266.4 266.45 25.23 8.84e-05 ***
## Residuals 18 190.1 10.56
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The second is to use the \(MS_{error}\) of the omnibus two-way model as the error term in the F tests for all simple effects; the F values then differ from the subset analyses above because the pooled error term (here 8.0 on 90 df) replaces each subset’s own. However, there is no built-in R function for this, so I created a function that computes the F value for testing a simple main effect; it applies only to between-subjects designs.
simpleEffect<-function(aovF,aovS){
  # MS_error and df_error from the omnibus (full) model;
  # the residual term is the last row of the ANOVA table
  sum_statsF<-summary(aovF)[[1]]
  r<-nrow(sum_statsF)
  MSr<-sum_statsF[["Mean Sq"]][r]
  Dfr<-sum_statsF[["Df"]][r]
  # MS_treat and df_treat from the simple-effect (subset) model
  sum_statsS<-summary(aovS)[[1]]
  MSm<-sum_statsS[["Mean Sq"]][1]
  Dfm<-sum_statsS[["Df"]][1]
  # F ratio for the simple effect, tested against the pooled error df
  f<-MSm/MSr
  p<-1-pf(f,Dfm,Dfr)
  return(list(Fv=f,df=c(Dfm,Dfr),p=p))
}
simpleEffect(lop.aov,s.young.aov)
## $Fv
## [1] 42.16904
##
## $df
## [1] 4 90
##
## $p
## [1] 0
simpleEffect(lop.aov,s.old.aov)
## $Fv
## [1] 10.95002
##
## $df
## [1] 4 90
##
## $p
## [1] 2.799825e-07
simpleEffect(lop.aov,s.c.aov)
## $Fv
## [1] 0.1557525
##
## $df
## [1] 1 90
##
## $p
## [1] 0.6940313
simpleEffect(lop.aov,s.r.aov)
## $Fv
## [1] 0.3052748
##
## $df
## [1] 1 90
##
## $p
## [1] 0.581964
simpleEffect(lop.aov,s.a.aov)
## $Fv
## [1] 8.996262
##
## $df
## [1] 1 90
##
## $p
## [1] 0.003498405
simpleEffect(lop.aov,s.im.aov)
## $Fv
## [1] 10.98989
##
## $df
## [1] 1 90
##
## $p
## [1] 0.001321624
simpleEffect(lop.aov,s.in.aov)
## $Fv
## [1] 33.20019
##
## $df
## [1] 1 90
##
## $p
## [1] 1.147527e-07
Two brands of laundry powder are tested for their capacity to remove dirt in water of three different temperatures. The data are as follows.
dta<-data.frame(Brand=rep(c("Super","Best"),each=3*4),
Temp=rep(rep(c("Cold","Warm","Hot"),each=4),2),
D=c(4,5,6,5, 7,9,8,12, 10,12,11,9,
6,6,4,4, 13,15,12,12, 12,13,10,13))
The omnibus F test shows significant main effects of Brand and Temp as well as a significant interaction effect.
result.aov<-aov(D~Brand*Temp,dta)
summary(result.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 20.17 20.17 9.811 0.00576 **
## Temp 2 200.33 100.17 48.730 5.44e-08 ***
## Brand:Temp 2 16.33 8.17 3.973 0.03722 *
## Residuals 18 37.00 2.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The barplot below shows the means of all conditions across the two factors.
means<-with(dta,tapply(D,list(Brand,Temp),mean))
ses<-with(dta,tapply(D,list(Brand,Temp),sd))/2 # SE = sd/sqrt(n), with n = 4 per cell
# tapply orders Temp levels alphabetically (Cold, Hot, Warm), hence x-positions 1, 3, 2
ggplot(data.frame(means=c(means),ses=c(ses),Temp=rep(c(1,3,2),each=2),
Brand=rep(c("Best","Super"),3)),aes(Temp,means,fill=Brand))+
geom_bar(stat="identity",position=position_dodge(0.9))+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),position=position_dodge(0.9),width=0.1)+
scale_x_continuous(name="Temperature",breaks=c(1,2,3),labels=c("Cold","Warm","Hot"))+
ylab("Score")
We can test the simple effects of Brand and Temp. Apparently, the interaction between Brand and Temp comes from the fact that the Best laundry powder outperforms the Super laundry powder only when the water is warm.
# Simple effect of "Temp" at "Best"
sTemp.Best.aov<-aov(D~Temp,subset(dta,Brand=="Best"))
summary(sTemp.Best.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Temp 2 152 76.00 42.75 2.54e-05 ***
## Residuals 9 16 1.78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Simple effect of "Temp" at "Super"
sTemp.Super.aov<-aov(D~Temp,subset(dta,Brand=="Super"))
summary(sTemp.Super.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Temp 2 64.67 32.33 13.86 0.00179 **
## Residuals 9 21.00 2.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
simpleEffect(result.aov,sTemp.Best.aov)
## $Fv
## [1] 36.97297
##
## $df
## [1] 2 18
##
## $p
## [1] 4.223371e-07
simpleEffect(result.aov,sTemp.Super.aov)
## $Fv
## [1] 15.72973
##
## $df
## [1] 2 18
##
## $p
## [1] 0.0001119975
# Simple effect of "Brand" at "Cold"
sBrand.Cold.aov<-aov(D~Brand,subset(dta,Temp=="Cold"))
summary(sBrand.Cold.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 0 0 0 1
## Residuals 6 6 1
sBrand.Hot.aov<-aov(D~Brand,subset(dta,Temp=="Hot"))
summary(sBrand.Hot.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 4.5 4.500 2.455 0.168
## Residuals 6 11.0 1.833
sBrand.Warm.aov<-aov(D~Brand,subset(dta,Temp=="Warm"))
summary(sBrand.Warm.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 32 32.00 9.6 0.0212 *
## Residuals 6 20 3.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
simpleEffect(result.aov,sBrand.Cold.aov)
## $Fv
## [1] 4.852342e-32
##
## $df
## [1] 1 18
##
## $p
## [1] 1
simpleEffect(result.aov,sBrand.Hot.aov)
## $Fv
## [1] 2.189189
##
## $df
## [1] 1 18
##
## $p
## [1] 0.1562717
simpleEffect(result.aov,sBrand.Warm.aov)
## $Fv
## [1] 15.56757
##
## $df
## [1] 1 18
##
## $p
## [1] 0.0009480774
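As a closing sanity check, in a balanced design the simple-effect sums of squares of Brand at the three temperatures add up to \(SS_{Brand}+SS_{Brand:Temp}\) from the omnibus table:
ss1<-function(m) summary(m)[[1]][["Sum Sq"]][1]
ss1(sBrand.Cold.aov)+ss1(sBrand.Warm.aov)+ss1(sBrand.Hot.aov)  # 0 + 32 + 4.5 = 36.5
20.17+16.33                                                    # 36.5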