When at least one condition in an experiment has a different number of subjects from the others, the experiment is said to have unequal sample sizes. Normally, the experimenter samples the same number of subjects for each condition. However, in studies with special constraints on the subjects (e.g., patients with a particular disease), or when some data are lost for whatever reason (e.g., experimenter error, a computer crash) and there is no way to collect more, the conditions may end up with unequal sample sizes.
Don’t worry: this situation is easy to handle in a one-way ANOVA with a between-subjects design. The data below were collected to compare the effects of different treatments for anorexia. There were three conditions: control, cognitive-behavior therapy, and family therapy. See the data below.
anorexia.dta<-data.frame(cnd=as.factor(c(rep(1,26),rep(2,29),rep(3,17))),
gain=c(-0.5,-9.3,-5.4,12.3,-2,-10.2,-12.2,11.6,
-7.1,6.2,-0.2,-9.2,8.3,3.3,11.3,0,-1,-10.6,
-4.6,-6.7,2.8,0.3,1.8,3.7,15.9,-10.2,
1.7,0.7,-0.1,-0.7,-3.5,14.9,3.5,17.1,-7.6,
1.6,11.7,6.1,1.1,-4,20.9,-9.1,2.1,-1.4,1.4,
-0.3,-3.7,-0.8,2.4,12.6,1.9,3.9,0.1,15.4,-0.7,
11.4,11,5.5,9.4,13.6,-2.9,-0.1,7.4,21.5,-5.3,
-3.8,13.4,13.1,9,3.9,5.7,10.7))
summary(anorexia.dta)
## cnd gain
## 1:26 Min. :-12.200
## 2:29 1st Qu.: -2.225
## 3:17 Median : 1.650
## Mean : 2.764
## 3rd Qu.: 9.100
## Max. : 21.500
Let’s plot the group means first. In our textbook, a box plot was presented instead; you can try making one in R with this data set (a sketch is given after the bar plot below).
means<-with(anorexia.dta,tapply(gain,cnd,mean))
ns<-with(anorexia.dta,table(cnd))
ses<-with(anorexia.dta,tapply(gain,cnd,sd))/sqrt(ns)
fg.anorexia.dta<-data.frame(means=means,ses=ses,cond=as.factor(1:3))
library(ggplot2)
ggplot(fg.anorexia.dta,aes(x=cond,y=means))+
geom_bar(stat="identity",alpha=0.7,fill="deepskyblue3",position=position_dodge(0.9))+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.2)+
scale_x_discrete("Group",labels=c("Control","Cognitive-Behavior","Family"))+
ylab("Gained Body Weight")
Visual inspection of these plots suggests that the weight gains differ among the three groups. Since the participants were randomly assigned to the three groups, this is a between-subjects experimental design. With the type of therapy as the independent variable and the weight gain as the dependent variable, a one-way between-subjects ANOVA is conducted.
anorexia.aov<-aov(gain~cnd,anorexia.dta)
summary(anorexia.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cnd 2 615 307.32 5.422 0.0065 **
## Residuals 69 3911 56.68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the sample sizes are not equal across conditions, the within-cell degrees of freedom are no longer \(k(n-1)\), where \(k\) is the number of conditions and \(n\) the number of subjects per condition. When conducting a one-way ANOVA with unequal sample sizes, the degrees of freedom of \(MS_{Error}\) are \(N-k\), where \(N\) is the total number of subjects and \(k\) is still the number of conditions.
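We can quickly verify this with the anorexia data and the fitted model above.
# N - k should equal the Residuals df in the ANOVA table (72 - 3 = 69)
N<-nrow(anorexia.dta) # total number of subjects
k<-nlevels(anorexia.dta$cnd) # number of conditions
N-k
df.residual(anorexia.aov) # also 69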
Let’s see another example. Suppose you collect students’ grades in economics, medicine, and history and you want to know whether the mean grades differ. Suppose also that each student provides a grade from only one of these courses. There are three conditions and one independent variable (i.e., course topic), so a one-way between-subjects ANOVA is suitable. The R code below (1) creates the data set, (2) makes a bar plot of the group means, and (3) conducts the one-way ANOVA.
dta<-data.frame(score=c(42,53,49,53,43,44,45,52,54,
69,54,58,64,64,55,56,
35,40,53,42,50,39,55,39,40),
topic=as.factor(c(rep(1,9),rep(2,7),rep(3,9))))
# For topic, 1:"Economics", 2:"Medicine", and 3:"History"
means<-with(dta,tapply(score,topic,mean))
ses<-with(dta,tapply(score,topic,sd))/sqrt(table(dta$topic))
ggplot(data.frame(means=means,ses=ses,topic=as.factor(1:3)),aes(x=topic,y=means))+
geom_bar(stat="identity",alpha=0.7,fill="tomato",position=position_dodge(0.9))+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.2)+
scale_x_discrete("Topic",labels=c("Economics","Medicine","History"))+
ylab("Grade")
scores.aov<-aov(score~topic,dta)
summary(scores.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## topic 2 1086 542.9 15.2 7.16e-05 ***
## Residuals 22 786 35.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With \(H_o:\) the mean grades of all topics are equal and \(H_1:\) at least two topic means differ, a one-way between-subjects ANOVA is conducted. Note that the degrees of freedom for the \(SS_{e}\) term are \(22=25-3\), namely \(N-k\), and the degrees of freedom of the independent variable (i.e., topic) are \(2=3-1\), namely \(k-1\). All these degrees of freedom are correct. Checking the results in the summary table, the omnibus F test is clearly significant. This means that \(H_o\) is rejected and at least two topic means differ significantly.
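As an optional aside not covered in our textbook, a post hoc procedure such as Tukey’s HSD can show which pairs of topics actually differ; a minimal sketch with the fitted scores.aov object:
# Pairwise comparisons of the topic means (optional post hoc check)
TukeyHSD(scores.aov)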
Sometimes the original data do not conform to the assumption(s) of ANOVA. One common situation is that the data are not normally distributed. Although it is not encouraged, a transformation that makes the data approximately normal is acceptable when necessary. The data below, collected by Conti and Musty (1984), show the activity units of each animal over the 10-minute post-injection period in five conditions, defined by the amount of drug injected: 0 \(\mu\)g, 0.1 \(\mu\)g, 0.5 \(\mu\)g, 1 \(\mu\)g, and 2 \(\mu\)g. These data can be found in Table 11.6 in Chapter 11.
# Create the data set
dta<-data.frame(G=as.factor(c(rep(1:2,each=10),rep(3,9),rep(4,8),rep(5,10))),
Y=c(130,94,225,105,92,190,32,64,69,93,
93,444,403,192,67,170,77,353,365,422,
510,416,154,636,396,451,376,192,384,
229,475,348,276,167,151,107,235,
144,111,217,200,84,99,44,84,284,293))
dim(dta)
## [1] 47 2
The mean activity of the subjects in each condition is plotted below. Clearly, a moderate dose of the drug makes the animals most active, while the animals are less active at doses below or above that moderate level.
# Make the plot for the activity of animals in different conditions
means<-with(dta,tapply(Y,G,mean))
ses<-with(dta,tapply(Y,G,sd))/sqrt(table(dta$G))
ggplot(data.frame(means=means,ses=ses,G=1:5),aes(x=G,y=means))+
geom_point(shape=1)+geom_line()+
geom_errorbar(aes(ymin=means-ses,ymax=means+ses),width=0.1)+
scale_x_continuous(name="Level",labels=c("0 µg","0.1 µg","0.5 µg","1 µg","2 µg"))+
ylab("Activity")
One thing worth checking is the distribution of the collected data in each condition. The plot below is called a violin plot. Violin plots are similar to box plots, except that they also show the kernel probability density of the data at different values. I also add the means and the bounds one standard deviation above and below each mean. In order to add the means and the bounds, a simple function called data_summary is declared first.
# Declare a function to compute the mean and the two bounds around the mean
data_summary<-function(x){
return(c(y=mean(x),ymin=mean(x)-sd(x),ymax=mean(x)+sd(x)))
}
# Violin plot
ggplot(dta,aes(G,Y))+
geom_violin(trim=F)+
stat_summary(fun.data=data_summary,size=0.3,color="red")+
scale_x_discrete(name="Level",labels=c("0 µg","0.1 µg","0.5 µg","1 µg","2 µg"))+
ylab("Activity")
As shown in this plot, each vertical distribution is the probability density of the data in one condition. Many of them are not normal, which violates the normality assumption of ANOVA. In most circumstances, researchers still run the ANOVA despite this violation, because the F test is robust and tolerates moderate departures from normality. However, we can still apply a reasonable transformation to bring the data distribution closer to normal.
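If you want a numerical check to complement the visual inspection, one option (not in the text) is a Shapiro-Wilk test within each condition; small p values suggest departures from normality.
# Shapiro-Wilk normality test for the raw activity scores in each condition
with(dta,tapply(Y,G,shapiro.test))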
There are many ways to transform data. For instance, we can transform the raw data to their inverses or their square roots. One useful hint is that when the standard deviation is proportional to the mean, a logarithmic transformation is suitable. Thus, we first check the correlation between the means and the standard deviations across conditions. As shown in the following scatter plot, the means and standard deviations are correlated. Although the \(p\) value is only marginally significant, the correlation is fairly strong.
sds<-as.vector(ses*sqrt(table(dta$G)))
ggplot(data.frame(means=means,sds=sds),aes(means,sds))+
geom_point()
cor.test(means,sds)
##
## Pearson's product-moment correlation
##
## data: means and sds
## t = 3.1508, df = 3, p-value = 0.05123
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02619871 0.99178812
## sample estimates:
## cor
## 0.8763233
Therefore, we can transform the data \(Y\) to \(log_{10}(Y)\). The violin plot of the transformed data shows much improvement in the distributions. The correlation between the means and standard deviations of the transformed data is no longer significant.
dta$y<-log10(dta$Y)
ggplot(dta,aes(G,y))+
geom_violin(trim=F)+stat_summary(fun.data=data_summary,size=0.3,color="red")+
scale_x_discrete(name="Level",labels=c("0 µg","0.1 µg","0.5 µg","1 µg","2 µg"))+
ylab("Log10(Activity)")
Tmeans<-with(dta,tapply(y,G,mean))
Tsds<-with(dta,tapply(y,G,sd))
Tses<-Tsds/sqrt(table(dta$G))
cor.test(Tmeans,Tsds)
##
## Pearson's product-moment correlation
##
## data: Tmeans and Tsds
## t = -0.59424, df = 3, p-value = 0.5942
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9381745 0.7815009
## sample estimates:
## cor
## -0.3245182
Now we can conduct the ANOVA with the original data as well as the transformed data as the dependent variable. The results are the same: the level of the drug affects the activity of the animals. Also, we can use lm( ) to check the differences in mean activity between the first condition and each of the other conditions. All but the highest dose condition differ significantly from the control group, which received no drug at all; generally speaking, the drug increases the animals’ activity.
result1.aov<-aov(Y~G,dta)
result2.aov<-aov(y~G,dta)
summary(result1.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## G 4 443028 110757 7.956 7.21e-05 ***
## Residuals 42 584661 13921
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(result2.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## G 4 1.850 0.4625 7.119 0.00018 ***
## Residuals 42 2.729 0.0650
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm(Y~G,dta))
##
## Call:
## lm(formula = Y ~ G, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -236.6 -72.0 -13.5 87.5 245.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 109.40 37.31 2.932 0.00543 **
## G2 149.20 52.76 2.828 0.00715 **
## G3 281.16 54.21 5.186 5.8e-06 ***
## G4 139.10 55.97 2.485 0.01700 *
## G5 46.60 52.76 0.883 0.38217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 118 on 42 degrees of freedom
## Multiple R-squared: 0.4311, Adjusted R-squared: 0.3769
## F-statistic: 7.956 on 4 and 42 DF, p-value: 7.214e-05
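To make clear what lm( ) reports here, the intercept is the mean of the control group and each G coefficient is the difference between that group’s mean and the control mean. A quick check with the means object computed above:
# The lm( ) coefficients are the group means expressed relative to G1
means-means[1] # differences from the control mean (0 for G1)
coef(lm(Y~G,dta)) # intercept = control mean; the rest = differences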
When the null hypothesis test is significant, the next step is to ask how strong the effect of the experimental treatment is. This is the question of effect size. There are a number of effect-size indices for an F test. The first is \(\eta^2\), which is simply the ratio of \(SS_{treat}\) to \(SS_{total}\), namely \(\eta^2=\frac{SS_{treat}}{SS_{total}}\).
Two researchers tested \(12\times5=60\) subjects’ aggressive behavior under five conditions with different amounts of distraction during a task. Create the data as follows.
dta<-data.frame(cond=gl(5,12,12*5,labels=1:5),
score=c(1.28,1.35,3.31,3.06,2.59,3.25,2.98,1.53,-2.68,2.64,1.26,1.06,
-1.18,0.15,1.36,2.61,0.66,1.32,0.73,-1.06,0.24,0.27,0.72,2.28,
-0.41,-1.25,-1.33,-0.47,-0.60,-1.72,-1.74,-0.77,-0.41,-1.2,-0.31,-0.74,
-0.45,0.54,-0.98,1.68,2.25,-0.19,-0.9,0.78,0.05,2.69,0.15,0.91,
2.01,0.4,2.34,-1.8,5,2.27,6.47,2.94,0.47,3.22,0.01,-0.66))
dim(dta)
## [1] 60 2
result.aov<-aov(score~cond,dta)
summary(result.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## cond 4 62.46 15.615 6.901 0.000142 ***
## Residuals 55 124.46 2.263
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this case, the \(\eta^2\) is 62.46/(62.46+124.46)=0.334. Alternatively, you can use lm() to see the \(R^2\).
summary(lm(score~cond,dta))
##
## Call:
## lm(formula = score ~ cond, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4825 -0.5904 0.0500 0.7106 4.5808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.80250 0.43425 4.151 0.000116 ***
## cond2 -1.12750 0.61412 -1.836 0.071772 .
## cond3 -2.71500 0.61412 -4.421 4.67e-05 ***
## cond4 -1.25833 0.61412 -2.049 0.045245 *
## cond5 0.08667 0.61412 0.141 0.888288
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.504 on 55 degrees of freedom
## Multiple R-squared: 0.3342, Adjusted R-squared: 0.2857
## F-statistic: 6.901 on 4 and 55 DF, p-value: 0.0001415
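If you prefer to extract \(\eta^2\) from the aov object instead of copying the sums of squares by hand, a small helper in the same spirit as the omega_sq function below can be used (a sketch):
# Compute eta squared directly from a one-way aov object
eta_sq<-function(aovm){
sum_stats<-summary(aovm)[[1]]
SSm<-sum_stats[["Sum Sq"]][1]
SSr<-sum_stats[["Sum Sq"]][2]
return(SSm/(SSm+SSr))
}
eta_sq(result.aov) # about 0.334, matching the hand calculation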
The second is omega-squared (\(\omega^2\)).
\(\omega^2=\frac{SS_{treat}-(k-1)MS_{error}}{SS_{total}+MS_{error}}\)
You can calculate \(\omega^2\) with your own R code or with the user-defined function below.
omega2<-(62.46-4*2.263)/(62.46+124.46+2.263)
# Or you can use the function below to compute omega squared
omega_sq <- function(aovm){
sum_stats <- summary(aovm)[[1]]
SSm <- sum_stats[["Sum Sq"]][1]
SSr <- sum_stats[["Sum Sq"]][2]
DFm <- sum_stats[["Df"]][1]
MSr <- sum_stats[["Mean Sq"]][2]
W2 <- (SSm-DFm*MSr)/(SSm+SSr+MSr)
return(W2)
}
omega_sq(result.aov)
## [1] 0.2823153
Normally, \(\omega^2\) is a stricter index than \(\eta^2\). For the same data, \(\omega^2\) is smaller than \(\eta^2\).
The third effect-size index is \(d\). For the \(j\)th group, \(d_{j}=\frac{\mu_{j}-\mu}{\sigma}\). For the overall design, \(d\) is the square root of the sum of the squared \(d_{j}\) divided by the treatment degrees of freedom, \(k-1\).
\(d=\sqrt{(\frac{1}{k-1})\sum(\frac{\mu_{j}-\mu}{\sigma})^2}\)
Thus, the effect size \(d\) of the overall F test in the previous example can be computed as follows, using \(MS_{error}\approx2.26\) as the estimate of \(\sigma^2\) and \(k-1=4\) as the degrees of freedom.
means<-with(dta,tapply(score,cond,mean))
d<-sqrt(sum((means-mean(dta$score))^2/(4*2.26)))
d
## [1] 0.7587995
See this webpage for the R functions to compute power. Before using these functions, remember to load the pwr package first. Note that pwr.anova.test( ) takes Cohen’s \(f\) as its effect-size argument, where \(f=\sqrt{\frac{\sum(\mu_{j}-\mu)^2/k}{\sigma^2}}\); it is closely related to \(d\) above, with \(f=d\sqrt{(k-1)/k}\).
library(pwr)
f<-sqrt(sum((means-mean(dta$score))^2/(5*2.26)))
f
## [1] 0.6786909
pwr.anova.test(k=5,n=12,f)
##
## Balanced one-way analysis of variance power calculation
##
## k = 5
## n = 12
## f = 0.6786909
## sig.level = 0.05
## power = 0.990223
##
## NOTE: n is number in each group
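As an extra usage example (not from the text), the same function can solve for the per-group sample size needed to reach a target power by leaving n unspecified:
# How many subjects per group are needed for 80% power at this effect size?
pwr.anova.test(k=5,f=0.6786909,sig.level=0.05,power=0.8)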