One of the oldest methods for making post hoc comparisons is Fisher’s Least Significant Difference (LSD) test. When we have only a few means to compare, it is a perfectly legitimate and useful procedure. Fisher’s LSD is essentially a set of individual t tests; the only difference between it and linear contrasts is that Fisher’s LSD requires a significant overall F test first.
The Tukey test is also called Tukey’s HSD (Honestly Significant Difference) test or the WSD (Wholly Significant Difference) test. Tukey’s HSD test uses the Studentized range statistic \(q\) for its comparisons, where \(q=t\sqrt{2}\).
\(q_{r}=\frac{\overline{X}_{l}-\overline{X}_{s}}{\sqrt{\frac{MS_{error}}{n}}}\), where \(\overline{X}_{l}\) and \(\overline{X}_{s}\) are the largest and smallest of the \(r\) means being compared, respectively.
By comparison, the individual t test for two means is \(t=\frac{\overline{X}_{i}-\overline{X}_{j}}{\sqrt{\frac{2MS_{error}}{n}}}\).
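To make these formulas concrete, here is a minimal sketch computing the critical differences the two tests use at \(\alpha=.05\), plugging in the values from the morphine example below (\(MS_{error}=32\), \(n=8\), \(df_{error}=35\), \(k=5\) means):
t.crit<-qt(.975,df=35)              # two-tailed t critical value
LSD<-t.crit*sqrt(2*32/8)            # ~5.74: the half-width of the LSD CIs below
q.crit<-qtukey(.95,nmeans=5,df=35)  # Studentized range critical value
HSD<-q.crit*sqrt(32/8)              # ~8.13: the half-width of the HSD CIs below
c(LSD=LSD,HSD=HSD)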
In R, we can use PostHocTest() in the package DescTools to implement a number of post hoc tests, including the LSD and HSD tests. With the morphine tolerance data as the example, we can see how it works.
morphine.dta<-data.frame(G=gl(5,8,5*8,labels=c("MS","MM","SS","SM","McM")),
Y=c(3,5,1,8,1,1,4,9,
2,12,13,6,10,7,11,19,
14,6,12,4,19,3,9,21,
29,20,36,21,25,18,26,17,
24,26,40,32,20,33,27,30))
summary(morphine.dta)
## G Y
## MS :8 Min. : 1.00
## MM :8 1st Qu.: 6.00
## SS :8 Median :13.50
## SM :8 Mean :15.60
## McM:8 3rd Qu.:24.25
## Max. :40.00
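A quick look at the group means helps orient the comparisons that follow:
with(morphine.dta,tapply(Y,G,mean))  # MS=4, MM=10, SS=11, SM=24, McM=29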
Normally, we run an omnibus F test first; here the null hypothesis is clearly rejected. That is, at least one group mean is significantly different from the others.
morphine.aov<-aov(Y~G,morphine.dta)
summary(morphine.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## G 4 3498 874.4 27.32 2.44e-10 ***
## Residuals 35 1120 32.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Suppose you have no a priori hypothesis about which means should differ, and you want to know which pairwise differences produce the significant overall F test. A post hoc test is suitable for this situation. Let’s do Fisher’s LSD test first.
library(DescTools)
PostHocTest(morphine.aov,method="lsd")
##
## Posthoc multiple comparisons of means : Fisher LSD
## 95% family-wise confidence level
##
## $G
## diff lwr.ci upr.ci pval
## MM-MS 6 0.2579877 11.742012 0.0411 *
## SS-MS 7 1.2579877 12.742012 0.0183 *
## SM-MS 20 14.2579877 25.742012 3.1e-08 ***
## McM-MS 25 19.2579877 30.742012 1.9e-10 ***
## SS-MM 1 -4.7420123 6.742012 0.7258
## SM-MM 14 8.2579877 19.742012 1.9e-05 ***
## McM-MM 19 13.2579877 24.742012 8.9e-08 ***
## SM-SS 13 7.2579877 18.742012 5.4e-05 ***
## McM-SS 18 12.2579877 23.742012 2.6e-07 ***
## McM-SM 5 -0.7420123 10.742012 0.0858 .
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Fisher’s LSD test evaluates all \(\binom{5}{2}=10\) pairwise comparisons, and only the comparisons SS-MM and McM-SM are not significant. Now let’s do Tukey’s HSD test.
PostHocTest(morphine.aov,method="hsd")
##
## Posthoc multiple comparisons of means : Tukey HSD
## 95% family-wise confidence level
##
## $G
## diff lwr.ci upr.ci pval
## MM-MS 6 -2.131899 14.131899 0.23404
## SS-MS 7 -1.131899 15.131899 0.11976
## SM-MS 20 11.868101 28.131899 3.0e-07 ***
## McM-MS 25 16.868101 33.131899 1.9e-09 ***
## SS-MM 1 -7.131899 9.131899 0.99649
## SM-MM 14 5.868101 22.131899 0.00017 ***
## McM-MM 19 10.868101 27.131899 8.6e-07 ***
## SM-SS 13 4.868101 21.131899 0.00049 ***
## McM-SS 18 9.868101 26.131899 2.5e-06 ***
## McM-SM 5 -3.131899 13.131899 0.40782
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that MM-MS and SS-MS are significant under Fisher’s LSD but not under Tukey’s HSD, reflecting the stricter family-wise error control of the HSD test. See here for a more detailed introduction to different kinds of post hoc tests.
When we are concerned with the trend in effectiveness rather than with multiple comparisons among specific means, we can do a trend analysis.
Mathematically, a curve of degree 1 (i.e., \(y=ax+b\)) is linear, a curve of degree 2 (i.e., \(y=ax^2+bx+c\)) is quadratic, and a curve of degree 3 (i.e., \(y=ax^3+bx^2+cx+d\)) is cubic. We can apply linear contrasts to test the trend of the treatment effectiveness across the means.
Giancola and Corman (2007) tested the relationship between alcohol and aggression in the presence of a distracting task. The data are listed as follows.
dta<-data.frame(cond=gl(5,12,12*5,labels=1:5),
score=c(1.28,1.35,3.31,3.06,2.59,3.25,2.98,1.53,-2.68,2.64,1.26,1.06,
-1.18,0.15,1.36,2.61,0.66,1.32,0.73,-1.06,0.24,0.27,0.72,2.28,
-0.41,-1.25,-1.33,-0.47,-0.60,-1.72,-1.74,-0.77,-0.41,-1.2,-0.31,-0.74,
-0.85,0.14,-1.38,1.28,1.85,-0.59,-1.3,0.38,-0.35,2.29,-0.25,0.51,
2.01,0.4,2.34,-1.8,5,2.27,6.47,2.94,0.47,3.22,0.01,-0.66))
dim(dta)
## [1] 60 2
The omnibus F test reveals a significant effect of alcohol consumption on aggressive behavior. The pattern of the means is shown in the figure below.
aa.aov<-aov(score~cond,dta)
summary(aa.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 66.45 16.612 7.341 8.22e-05 ***
## Residuals 55 124.46 2.263
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
means<-with(dta,tapply(score,cond,mean))
ses<-with(dta,tapply(score,cond,sd))/sqrt(table(dta$cond))
library(ggplot2)
ggplot(data.frame(means=means,ses=ses,cond=1:5),aes(cond,means))+
geom_point()+geom_line()+geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.1)
Visual inspection suggests a second-order trend among these means. That is, the frequency of aggressive behavior drops to its minimum at a medium level of alcohol. We can check whether this observation is statistically supported with linear contrasts. For example, if we have three means, testing the linear trend is equivalent to testing the contrast with coefficients [-1 0 1] for these means. Likewise, testing the quadratic trend is equivalent to testing the contrast with coefficients [1 -2 1]. Notice that the coefficient vectors [-1 0 1] and [1 -2 1] are orthogonal to each other. In general, there are k-1 orthogonal trends for k means. This webpage provides the coefficients for up to 6 means.
Let’s test the 4 possible trends among the 5 means in the study on alcohol and aggression. Apparently, the quadratic trend is significant. But if you fit only 3 orthogonal contrasts for 5 means, the sum of squares of the omitted fourth contrast is pooled into \(SS_{error}\) instead of \(SS_{treatment}\), so \(SS_{error}\) changes and hence the F values. Thus, I would suggest sticking to the full set of k-1 orthogonal contrasts.
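Before fitting, two quick checks may be reassuring: the four coefficient vectors used below are mutually orthogonal, and base R’s contr.poly() generates unit-length rescalings of the same polynomial coefficients:
C<-rbind(lin=c(-2,-1,0,1,2),quad=c(2,-1,-2,-1,2),
         cub=c(-1,2,0,-2,1),quart=c(1,-4,6,-4,1))
C%*%t(C)               # off-diagonal entries are all 0
round(contr.poly(5),3) # columns .L, .Q, .C, ^4 are proportional to the rows of C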
dta$C1<-rep(c(-2,-1,0,1,2),each=12)
dta$C2<-rep(c(2,-1,-2,-1,2),each=12)
dta$C3<-rep(c(-1,2,0,-2,1),each=12)
dta$C4<-rep(c(1,-4,6,-4,1),each=12)
aatrend.aov<-aov(score~C1+C2+C3+C4,dta)
summary(aatrend.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## C1 1 0.15 0.15 0.068 0.796
## C2 1 60.32 60.32 26.658 3.45e-06 ***
## C3 1 1.58 1.58 0.699 0.407
## C4 1 4.39 4.39 1.940 0.169
## Residuals 55 124.46 2.26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
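We can also verify the quadratic SS by hand with the standard contrast formula \(SS_{\psi}=\frac{n(\sum_{j}c_{j}\overline{X}_{j})^2}{\sum_{j}c_{j}^2}\):
m<-with(dta,tapply(score,cond,mean))
cq<-c(2,-1,-2,-1,2)
12*sum(cq*m)^2/sum(cq^2)  # matches the C2 row: 60.32
Now drop the quartic contrast and refit: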
summary(aov(score~C1+C2+C3,dta))
## Df Sum Sq Mean Sq F value Pr(>F)
## C1 1 0.15 0.15 0.067 0.797
## C2 1 60.32 60.32 26.218 3.89e-06 ***
## C3 1 1.58 1.58 0.688 0.410
## Residuals 56 128.85 2.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The independent variables of an experiment are referred to as factors. An experiment with two factors is an instance of what is called a two-way factorial design.
Recall the linear model for the experimental design with one independent variable, \(X_{ij}=\mu+\alpha_{j}+\epsilon_{ij}\), where \(\alpha_{j}=\mu_{j}-\mu\) is the treatment effect. The total variability of the scores can be decomposed into the variability accounted for by the treatment and the variability due to random error, namely \(SS_{Total}=SS_{treat}+SS_{error}\).
Similarly, when there are two independent variables, an observed score can be modeled by the grand mean, the effect of treatment A, the effect of treatment B, the interaction effect between A and B, and random error. That is, \(X_{ijk}=\mu+\alpha_{j}+\beta_{k}+\alpha\beta_{jk}+\epsilon_{ijk}\). Accordingly, the total sum of squares can be decomposed into the parts accounted for by treatment A, treatment B, the interaction between A and B, and error, namely \(SS_{Total}=SS_{A}+SS_{B}+SS_{AB}+SS_{error}\).
The total variability of the scores is still \(SS_{Total}=\sum\sum\sum(X_{ijk}-\mu)^2\). The variability accounted for by treatment A can be expressed as \(SS_{A}=\sum\sum\sum(\mu_{.j.}-\mu)^2\). Likewise, the variability accounted for by treatment B can be expressed as \(SS_{B}=\sum\sum\sum(\mu_{..k}-\mu)^2\). The within-cell error variability is \(SS_{error}=\sum\sum\sum(X_{ijk}-\mu_{.jk})^2\). Following the same rule as in the one-way design, the variability accounted for by the interaction between A and B is \(SS_{AB}=SS_{Total}-SS_{A}-SS_{B}-SS_{error}\).
The effect of treatment A is estimated by \(\alpha_{j}=\mu_{.j.}-\mu\). The effect of treatment B is estimated by \(\beta_{k}=\mu_{..k}-\mu\). The interaction effect between A and B is estimated by \(\alpha\beta_{jk}=(\mu_{.jk}-\mu)-(\mu_{.j.}-\mu)-(\mu_{..k}-\mu)\), that is, the part of the cell-mean deviation not explained by the two main effects.
Therefore, \(X_{ijk}=\mu+(\mu_{.j.}-\mu)+(\mu_{..k}-\mu)+[(\mu_{.jk}-\mu)-(\mu_{.j.}-\mu)-(\mu_{..k}-\mu)]+(X_{ijk}-\mu_{.jk})=\mu+\alpha_{j}+\beta_{k}+\alpha\beta_{jk}+\epsilon_{ijk}\).
The degrees of freedom of the components in this linear model follow the same partition as the \(SS\),
\(df_{total}=df_{A}+df_{B}+df_{AB}+df_{error}\), namely
\(njk-1=(j-1)+(k-1)+(j-1)(k-1)+jk(n-1)\), where \(j\) and \(k\) here denote the numbers of levels of A and B, and \(n\) is the number of observations per cell.
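For the 2 (age) × 5 (condition) recall experiment analyzed below, with n = 10 per cell, the partition can be verified with a line of arithmetic:
a<-2; b<-5; n<-10
c(total=n*a*b-1,A=a-1,B=b-1,AB=(a-1)*(b-1),error=a*b*(n-1))  # 99 = 1+4+4+90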
Suppose the recall task under 5 different levels of processing is now tested with two groups of participants: old and young. The data are as follows.
dta<-data.frame(age=factor(rep(c(1,2),each=50),labels=c("young","old")),
cond=factor(rep(1:5,each=10),labels=c("count","rhym","adj",
"imagery","intentation")),
y=c(8,6,4,6,7,6,5,7,9,7,
10,7,8,10,4,7,10,6,7,7,
14,11,18,14,13,22,17,16,12,11,
20,16,16,15,18,16,20,22,14,19,
21,19,17,15,22,16,22,22,18,21,
9,8,6,8,10,4,6,5,7,7,
7,9,6,6,6,11,6,3,8,7,
11,13,8,6,14,11,13,13,10,11,
12,11,16,11,9,23,12,10,19,11,
10,19,14,5,10,11,14,15,11,11))
dim(dta)
## [1] 100 3
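It is also worth confirming that the design is balanced:
with(dta,table(age,cond))  # 10 observations in each of the 10 cells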
As usual, let’s check out the pattern of the group means.
means<-with(dta,tapply(y,list(age,cond),mean))
ses<-with(dta,tapply(y,list(age,cond),sd))/sqrt(10)
# Barplot
ggplot(data.frame(means=c(means),ses=c(ses),cond=rep(unique(dta$cond),each=2),
age=rep(rep(unique(dta$age),2),5)),aes(cond,means,fill=age))+
geom_bar(stat="identity",position=position_dodge(0.9))+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.1,position=position_dodge(0.9))
In R, the same function aov() is used to conduct a two-way ANOVA, except that the model is now written as y~A*B, shorthand for y~A+B+A:B: the full model consisting of the main effect of A, the main effect of B, and the interaction effect between A and B.
lop.aov<-aov(y~age*cond,dta)
summary(lop.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 240.3 240.3 29.936 3.98e-07 ***
## cond 4 1514.9 378.7 47.191 < 2e-16 ***
## age:cond 4 190.3 47.6 5.928 0.000279 ***
## Residuals 90 722.3 8.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
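As a quick check of the decomposition introduced above, the four sums of squares in this table add up to the total sum of squares computed directly from the data:
sum(summary(lop.aov)[[1]][["Sum Sq"]])  # SS_A + SS_B + SS_AB + SS_error
with(dta,sum((y-mean(y))^2))            # SS_total: the same value (2667.8)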
One of the major benefits of factorial designs is that they allow us to examine the interaction of variables. When the interaction effect is significant, the effect of factor A depends on the level of factor B, and vice versa. An easy way to spot an interaction is to check whether the lines connecting the condition means at different levels of the other factor cross, or would cross if extended.
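Base R’s interaction.plot() offers a quick way to draw these lines:
# Non-parallel (converging or crossing) lines are the visual signature of an interaction
with(dta,interaction.plot(cond,age,y,type="b",ylab="Mean recall"))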
According to the definition of the interaction effect, when it is significant, one factor’s effect is expected to differ across the levels of the other factor. The effect of a factor at a fixed level of another factor is called a simple effect.
For the above example, there are two ways to test the simple effects of Age and Condition. The first is to treat the two-way design as a set of one-way designs.
s.young.aov<-aov(y~cond,subset(dta,age=="young"))
s.old.aov<-aov(y~cond,subset(dta,age=="old"))
summary(s.young.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 1354 338.4 53.06 <2e-16 ***
## Residuals 45 287 6.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(s.old.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 351.5 87.88 9.085 1.82e-05 ***
## Residuals 45 435.3 9.67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
s.c.aov<-aov(y~age,subset(dta,cond=="count"))
s.r.aov<-aov(y~age,subset(dta,cond=="rhym"))
s.a.aov<-aov(y~age,subset(dta,cond=="adj"))
s.im.aov<-aov(y~age,subset(dta,cond=="imagery"))
s.in.aov<-aov(y~age,subset(dta,cond=="intentation"))
summary(s.c.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 1.25 1.250 0.464 0.504
## Residuals 18 48.50 2.694
summary(s.r.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 2.45 2.450 0.586 0.454
## Residuals 18 75.30 4.183
summary(s.a.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 72.2 72.2 7.848 0.0118 *
## Residuals 18 165.6 9.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(s.im.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 88.2 88.20 6.539 0.0198 *
## Residuals 18 242.8 13.49
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(s.in.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 1 266.4 266.45 25.23 8.84e-05 ***
## Residuals 18 190.1 10.56
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The second is to use the \(MS_{error}\) of the omnibus two-way model as the error term in the F tests for all simple effects; the F values then differ from the subset analyses above because the pooled error term (here 8.0 on 90 df) replaces each subset’s own. However, there is no built-in R function for this, so I created a function that computes the F value for testing a simple main effect; it applies only to between-subjects designs.
simpleEffect<-function(aovF,aovS){
  # MS_error and df_error from the omnibus (full) model;
  # the residual term is the last row of the ANOVA table
  sum_statsF<-summary(aovF)[[1]]
  r<-nrow(sum_statsF)
  MSr<-sum_statsF[["Mean Sq"]][r]
  Dfr<-sum_statsF[["Df"]][r]
  # MS_treat and df_treat from the simple-effect (subset) model
  sum_statsS<-summary(aovS)[[1]]
  MSm<-sum_statsS[["Mean Sq"]][1]
  Dfm<-sum_statsS[["Df"]][1]
  # F ratio for the simple effect, tested against the pooled error df
  f<-MSm/MSr
  p<-1-pf(f,Dfm,Dfr)
  return(list(Fv=f,df=c(Dfm,Dfr),p=p))
}
simpleEffect(lop.aov,s.young.aov)
## $Fv
## [1] 42.16904
##
## $df
## [1] 4 90
##
## $p
## [1] 0
simpleEffect(lop.aov,s.old.aov)
## $Fv
## [1] 10.95002
##
## $df
## [1] 4 90
##
## $p
## [1] 2.799825e-07
simpleEffect(lop.aov,s.c.aov)
## $Fv
## [1] 0.1557525
##
## $df
## [1] 1 90
##
## $p
## [1] 0.6940313
simpleEffect(lop.aov,s.r.aov)
## $Fv
## [1] 0.3052748
##
## $df
## [1] 1 90
##
## $p
## [1] 0.581964
simpleEffect(lop.aov,s.a.aov)
## $Fv
## [1] 8.996262
##
## $df
## [1] 1 90
##
## $p
## [1] 0.003498405
simpleEffect(lop.aov,s.im.aov)
## $Fv
## [1] 10.98989
##
## $df
## [1] 1 90
##
## $p
## [1] 0.001321624
simpleEffect(lop.aov,s.in.aov)
## $Fv
## [1] 33.20019
##
## $df
## [1] 1 90
##
## $p
## [1] 1.147527e-07
Two brands of laundry powder are tested for their capacity to remove dirt in water of three different temperatures. The data are as follows.
dta<-data.frame(Brand=rep(c("Super","Best"),each=3*4),
Temp=rep(rep(c("Cold","Warm","Hot"),each=4),2),
D=c(4,5,6,5, 7,9,8,12, 10,12,11,9,
6,6,4,4, 13,15,12,12, 12,13,10,13))
The omnibus F test shows significant main effects of Brand and Temp as well as a significant interaction effect.
result.aov<-aov(D~Brand*Temp,dta)
summary(result.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 20.17 20.17 9.811 0.00576 **
## Temp 2 200.33 100.17 48.730 5.44e-08 ***
## Brand:Temp 2 16.33 8.17 3.973 0.03722 *
## Residuals 18 37.00 2.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The barplot below shows the means of all conditions across the two factors.
means<-with(dta,tapply(D,list(Brand,Temp),mean))
ses<-with(dta,tapply(D,list(Brand,Temp),sd))/2 # SE = sd/sqrt(n), with n = 4 per cell
# tapply orders Temp levels alphabetically (Cold, Hot, Warm), hence x-positions 1, 3, 2
ggplot(data.frame(means=c(means),ses=c(ses),Temp=rep(c(1,3,2),each=2),
Brand=rep(c("Best","Super"),3)),aes(Temp,means,fill=Brand))+
geom_bar(stat="identity",position=position_dodge(0.9))+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),position=position_dodge(0.9),width=0.1)+
scale_x_continuous(name="Temperature",breaks=c(1,2,3),labels=c("Cold","Warm","Hot"))+
ylab("Score")
We can test the simple effects of Brand and Temp. Apparently, the interaction between Brand and Temp comes from the fact that the Best laundry powder outperforms the Super laundry powder only when the water is warm.
# Simple effect of "Temp" at "Best"
sTemp.Best.aov<-aov(D~Temp,subset(dta,Brand=="Best"))
summary(sTemp.Best.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Temp 2 152 76.00 42.75 2.54e-05 ***
## Residuals 9 16 1.78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Simple effect of "Temp" at "Super"
sTemp.Super.aov<-aov(D~Temp,subset(dta,Brand=="Super"))
summary(sTemp.Super.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Temp 2 64.67 32.33 13.86 0.00179 **
## Residuals 9 21.00 2.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
simpleEffect(result.aov,sTemp.Best.aov)
## $Fv
## [1] 36.97297
##
## $df
## [1] 2 18
##
## $p
## [1] 4.223371e-07
simpleEffect(result.aov,sTemp.Super.aov)
## $Fv
## [1] 15.72973
##
## $df
## [1] 2 18
##
## $p
## [1] 0.0001119975
# Simple effect of "Brand" at "Cold"
sBrand.Cold.aov<-aov(D~Brand,subset(dta,Temp=="Cold"))
summary(sBrand.Cold.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 0 0 0 1
## Residuals 6 6 1
sBrand.Hot.aov<-aov(D~Brand,subset(dta,Temp=="Hot"))
summary(sBrand.Hot.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 4.5 4.500 2.455 0.168
## Residuals 6 11.0 1.833
sBrand.Warm.aov<-aov(D~Brand,subset(dta,Temp=="Warm"))
summary(sBrand.Warm.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 1 32 32.00 9.6 0.0212 *
## Residuals 6 20 3.33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
simpleEffect(result.aov,sBrand.Cold.aov)
## $Fv
## [1] 4.852342e-32
##
## $df
## [1] 1 18
##
## $p
## [1] 1
simpleEffect(result.aov,sBrand.Hot.aov)
## $Fv
## [1] 2.189189
##
## $df
## [1] 1 18
##
## $p
## [1] 0.1562717
simpleEffect(result.aov,sBrand.Warm.aov)
## $Fv
## [1] 15.56757
##
## $df
## [1] 1 18
##
## $p
## [1] 0.0009480774
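As a closing sanity check, in a balanced design the simple-effect sums of squares of Brand at the three temperatures add up to \(SS_{Brand}+SS_{Brand:Temp}\) from the omnibus table:
ss1<-function(m) summary(m)[[1]][["Sum Sq"]][1]
ss1(sBrand.Cold.aov)+ss1(sBrand.Warm.aov)+ss1(sBrand.Hot.aov)  # 0 + 32 + 4.5 = 36.5
20.17+16.33                                                    # 36.5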