Small Normative Sample

In the previous sections, it was always assumed that the norm is composed of a large number of cases. However, the normative sample for a particular population might be quite small. For instance, the normative sample for a rare disease is unlikely to be large. Further, a study with only one subject, which we call a single-case study, also has a small sample (i.e., only the subject him/herself). How do we deal with a small normative sample? The answer is to substitute the t distribution for z.

For example, suppose that we have a normative sample of 11 cases on a test of spatial memory, with mean \(\bar{x}=45\) and standard deviation \(s=9\). We have a patient with a score of 31. Then \(t=\frac{x-\bar{x}}{s\sqrt{\frac{n+1}{n}}}=\frac{31-45}{9\sqrt{\frac{12}{11}}}=-1.49\). We will consider a case to be extreme if its score falls below 95% of the scores in the normative population. For this case, we want a one-tailed test on \(n-1=10\) \(df\). As the critical value is \(-t_{0.05(10)}=-1.81\) and \(-1.49>-1.81\), our case would not be considered extreme.
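
We can check this quickly in R. The following is a minimal sketch in which every number is taken from the example above; the variable names are our own.

xbar<-45; s<-9; n<-11 # normative sample statistics
x<-31                 # the patient's score
tval<-(x-xbar)/(s*sqrt((n+1)/n))
tval         # -1.49
qt(0.05,n-1) # one-tailed critical value, about -1.81
pt(tval,n-1) # one-tailed p value, about .08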

One More Example of a Confidence Interval

A therapeutic touch experiment involved 28 testing sessions in which each respondent made 10 decisions about which of his or her hands the experimenter's hand was hovering over. The actual data are as follows. If respondents were performing at chance, we would expect 5 correct trials out of 10 in each session. Thus, \(H_1:\mu\neq5\) and \(H_o:\mu=5\). We can use t.test( ) with mu set to 5 to run this test. The result shows \(p=.064\), which is not smaller than .05. Thus, we cannot reject \(H_o\): we have no evidence that respondents are performing at other than chance levels.

# Number of correct decisions (out of 10) in each of the 28 sessions
dta<-data.frame(x=c(1,2,rep(3,8),rep(4,5),rep(5,7),6,6,7,7,7,8))
t.test(dta,mu=5) # one-sample t test against mu = 5
## 
##  One Sample t-test
## 
## data:  dta
## t = -1.9318, df = 27, p-value = 0.06395
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  3.747978 5.037737
## sample estimates:
## mean of x 
##  4.392857

Of course, we would also be interested in the confidence interval for the population mean of correct trials, which is \(\bar{x}\pm t_{.025}s_{\bar{x}}\). The 95% confidence interval for the population mean is reported in the summary of the t test. You can also compute it directly:

# 95% CI limits: sample mean plus/minus the critical t times the standard error
uplim<-mean(dta$x)+qt(0.975,27)*sd(dta$x)/sqrt(28)
lowerlim<-mean(dta$x)+qt(0.025,27)*sd(dta$x)/sqrt(28)
c(lowerlim,uplim)
## [1] 3.747978 5.037737
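
The interval is also stored in the conf.int component of the object returned by t.test( ), so you can extract it directly:

t.test(dta$x,mu=5)$conf.int # the 95% CI (with a conf.level attribute)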

Hypothesis Tests Applied to Means - Two Matched Samples

Everitt (1994) reported on family therapy as a treatment for anorexia. There were 17 girls in this experiment, and they were weighed before and after treatment. The data are shown in Table 7.3, and we create them with the code below.

# Weights of the 17 girls before and after family therapy (Table 7.3)
bef<-c(83.8,83.3,86,82.5,86.7,79.6,76.9,94.2,73.4,80.5,
       81.6,82.1,77.6,83.5,89.9,86,87.3)
aft<-c(95.2,94.3,91.5,91.9,100.3,76.7,76.8,101.6,94.9,75.2,
       77.8,95.5,90.7,92.5,93.8,91.7,98)
anorexia.dta<-data.frame(before=bef,after=aft)
# Difference (gain) scores: weight after minus weight before treatment
anorexia.dta$diff<-anorexia.dta$after-anorexia.dta$before

Before we test the hypothesis that family therapy improves anorexia, we should check the pattern of the weights before and after treatment. It looks as if the girls who originally had lower weights gained more weight after the treatment.

library(ggplot2)
# Scatterplot of weight gain against pre-treatment weight, with a linear fit
ggplot(data=anorexia.dta,aes(x=before,y=diff))+
  geom_point(color="red",shape=1)+
  labs(y="Weight Gain",x="Weight Before Treatment")+
  stat_smooth(method="lm",se=F)
## `geom_smooth()` using formula 'y ~ x'

Now we would like to test the hypothesis that family therapy is effective in treating anorexia. Although a one-tailed test might seem to be the natural choice given our understanding of anorexia, a two-tailed test is used because the observed scores could fall in the direction opposite to the prediction. Thus, our \(H_1: \mu_{before}\neq\mu_{after}\) and \(H_o: \mu_{before}=\mu_{after}\). This is equivalent to saying that \(H_1: \Delta\mu\neq0\) and \(H_o:\Delta\mu=0\), where \(\Delta\mu=\mu_{after}-\mu_{before}\). Since the weights before and after treatment were measured on the same girls, the two weight samples are matched samples (or paired samples). The matched-sample t test is just a one-sample t test with the difference scores as the data and a mean of 0 when \(H_o\) is assumed to be true. Thus, the t test is easy to run. The computed \(p\) value is far smaller than .05. Therefore, we have evidence to reject \(H_o\), and the hypothesis that family therapy can help treat anorexia is supported. Again, the 95% confidence interval is also reported in the summary of the t test.

t.test(anorexia.dta$diff)
## 
##  One Sample t-test
## 
## data:  anorexia.dta$diff
## t = 4.1849, df = 16, p-value = 0.0007003
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   3.58470 10.94471
## sample estimates:
## mean of x 
##  7.264706
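
Equivalently, t.test( ) accepts the two columns directly with paired=TRUE; this yields exactly the same t, df, p value, and confidence interval as the one-sample test on the difference scores.

with(anorexia.dta,t.test(after,before,paired=TRUE)) # paired t test on the raw columns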

Another Example of a Matched-Sample t Test

In the moon illusion experiment, 10 participants were asked to estimate the size of the "moon" with a particular scale, once with eyes elevated and once with eyes kept horizontal. The estimated ratio of the perceived size to the real size is the dependent variable. See Table 7.4. If the cause of the moon illusion is the elevation of the eyes when viewing the moon, then it is hypothesized that the mean ratio would differ between the two conditions. That is, \(H_1: \Delta\mu\neq0\) and \(H_o:\Delta\mu=0\). Although the mean ratio in the eyes-elevated condition is smaller than that in the other condition, the difference is not large enough for us to reject \(H_o\), \(p=.67\).

# Perceived-to-real size ratios for the 10 participants in the two conditions
moon.dta<-data.frame(c1=c(1.65,1,2.03,1.25,1.05,1.02,1.67,1.86,1.56,1.73),
                     c2=c(1.73,1.06,2.03,1.4,0.95,1.13,1.41,1.73,1.63,1.56))
moon.dta$diff<-moon.dta$c2-moon.dta$c1 # difference scores
t.test(moon.dta$diff) # one-sample t test on the differences
## 
##  One Sample t-test
## 
## data:  moon.dta$diff
## t = -0.43809, df = 9, p-value = 0.6717
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.11711088  0.07911088
## sample estimates:
## mean of x 
##    -0.019

Hypothesis Tests Applied to Means - Two Independent Samples

One of the most common uses of the t test involves testing the difference between the means of two independent groups. In conducting any experiment with two independent groups, we would most likely find that the two sample means differed by some amount. The important question is whether this difference is sufficiently large to justify the conclusion that the two samples were drawn from different populations.

In this circumstance, the research hypothesis is that the two means differ from each other, \(H_1: \mu_1-\mu_2\neq0\), whereas the null hypothesis is of course that they do not, \(H_o: \mu_1-\mu_2=0\). As usual, \(H_o\) is assumed to be true, and the difference between the two sample means is referred to the sampling distribution under \(H_o\) to test whether it is extreme enough for us to reject \(H_o\). The question hence becomes what the sampling distribution of differences between means is.

This question is equivalent to asking what the two parameters of this sampling distribution (i.e., its mean and variance) would be. First, let us check the mean. Suppose we draw \(k\) samples (each of size \(n\)) from each of two populations. The sample means from the first population are \(\bar{x}_{11},\bar{x}_{12},\cdots,\bar{x}_{1k}\). Likewise, the sample means from the second population are \(\bar{x}_{21},\bar{x}_{22},\cdots,\bar{x}_{2k}\). Thus, the differences between the means from these two populations are \(\bar{x}_{11}-\bar{x}_{21},\cdots,\bar{x}_{1k}-\bar{x}_{2k}\), or \(\Delta\bar{x}_1,\cdots,\Delta\bar{x}_k\). The expected value of the mean of the sampling distribution of the mean difference then is \(E(\Delta\bar{x})=E(\bar{x}_1-\bar{x}_2)=E(\bar{x}_1)-E(\bar{x}_2)\). As \(E(\bar{x}_1)=\mu_1\) and \(E(\bar{x}_2)=\mu_2\), \(E(\Delta\bar{x})=\mu_1-\mu_2\). That is, the mean of the sampling distribution of differences between means is the difference between the two population means.

The variance of this sampling distribution is \(\sigma^2_{\Delta\bar{x}}=Var(\Delta\bar{x})\). Since the two groups are independent of each other, according to the variance sum law, \(Var(\Delta\bar{x})=Var(\bar{x}_1-\bar{x}_2)=Var(\bar{x}_1)+Var(\bar{x}_2)\). As the variance of the sampling distribution of the mean is the population variance divided by the sample size, \(Var(\bar{x}_1)=\frac{\sigma^2_1}{n_1}\) and \(Var(\bar{x}_2)=\frac{\sigma^2_2}{n_2}\). Therefore, \(Var(\Delta\bar{x})=\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}\).

The final point to be made about this distribution concerns its shape. An important theorem in statistics states that the sum or difference of two independent normally distributed variables is itself normally distributed. Thus, we know that the sampling distribution of differences between two means is a normal distribution with the mean as \(\mu_1-\mu_2\) and variance as \(\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}\).
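
A quick simulation can illustrate all three results. The sketch below uses arbitrary population parameters, sample sizes, and a number of replications chosen only for illustration.

set.seed(1)
mu1<-100; mu2<-94; sig1<-15; sig2<-10; n1<-25; n2<-20
# Draw 10000 pairs of independent samples; record the difference between means
d<-replicate(10000,mean(rnorm(n1,mu1,sig1))-mean(rnorm(n2,mu2,sig2)))
mean(d) # close to mu1-mu2 = 6
var(d)  # close to sig1^2/n1+sig2^2/n2 = 9+5 = 14
# hist(d) or qqnorm(d) shows the distribution is approximately normal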

When the population variances are known, we use \(z\) test with \(z=\frac{\Delta\bar{x}-\Delta\mu}{\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}}\). When the population variances are unknown, we use \(t\) test with \(t=\frac{\Delta\bar{x}-\Delta\mu}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}\). When \(H_o\) is true, \(\Delta\mu=0\).
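
For instance, if the population variances were known, the z test could be computed as below; all numbers here are hypothetical and chosen only for illustration.

xbar1<-104; xbar2<-100; sig1<-15; sig2<-15; n1<-50; n2<-50
z<-(xbar1-xbar2)/sqrt(sig1^2/n1+sig2^2/n2)
z                   # 1.33
2*(1-pnorm(abs(z))) # two-tailed p value, about .18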

Pooling Variances

One of the assumptions required in the use of \(t\) for two independent samples is that \(\sigma^2_1=\sigma^2_2\), regardless of the truth or falsity of \(H_o\). Since the sample variance is an unbiased estimate of the population variance, \(E(s^2_1)=\sigma^2_1\) and \(E(s^2_2)=\sigma^2_2\), and since the population variances are assumed to be equal, \(\sigma^2_1=\sigma^2_2=\sigma^2\), we can use the weighted average of the sample variances (or pooled variance) as the unbiased estimate of the population variance. That is, \(\sigma^2=E(s^2_p)\), where \(s^2_p=\frac{(n_1-1)s^2_1+(n_2-1)s^2_2}{n_1+n_2-2}\). Now, substituting \(s^2_p\) for \(s^2_1\) and \(s^2_2\), \(t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s^2_p}{n_1}+\frac{s^2_p}{n_2}}}\).

When the sample sizes are equal, \(n_1=n_2=n\), \(s^2_p=\frac{(n-1)s^2_1+(n-1)s^2_2}{2(n-1)}=\frac{s^2_1+s^2_2}{2}\). Then, \(t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{s^2_p(\frac{1}{n}+\frac{1}{n})}}=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s^2_1+s^2_2}{n}}}\). In this circumstance, pooling does not change the result. However, when the sample sizes are unequal, pooling can make quite a difference.
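
The sketch below illustrates this point with simulated data; the group sizes, means, and standard deviations are arbitrary choices.

set.seed(2)
g1<-rnorm(8,50,5)   # small sample from a low-variance population
g2<-rnorm(30,45,15) # larger sample from a high-variance population
n1<-length(g1); n2<-length(g2)
sp2<-((n1-1)*var(g1)+(n2-1)*var(g2))/(n1+n2-2) # pooled variance
t.pooled<-(mean(g1)-mean(g2))/sqrt(sp2*(1/n1+1/n2))
t.sep<-(mean(g1)-mean(g2))/sqrt(var(g1)/n1+var(g2)/n2)
c(t.pooled,t.sep) # the two statistics differ noticeably when n1 != n2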

Degrees of Freedom

The degrees of freedom for each sample are the sample size minus 1. Therefore, across the two samples, we have \((n_1-1)+(n_2-1)=n_1+n_2-2\) \(df\).

Numeric Example

Adams, Wright, & Lohr (1996) were interested in the psychoanalytic theory that homophobia may be unconsciously related to anxiety about being or becoming homosexual. These researchers administered the Index of Homophobia to 64 heterosexual males and classified them as homophobic or non-homophobic on the basis of their scores. They then exposed the homophobic and non-homophobic heterosexual men to videotapes of sexually explicit erotic stimuli portraying heterosexual and homosexual behavior, and recorded their level of sexual arousal. Adams et al. reasoned that if homophobia were unconsciously related to anxiety about one's own sexuality, homophobic individuals would show greater arousal to the homosexual videos than would non-homophobic individuals. Table 7.5 shows the levels of sexual arousal of the subjects. Let us create a data frame to encode these data.

# H: arousal scores of the homophobic group; N: the non-homophobic group
H<-c(39.1,11,33.4,19.5,35.7,8.7,38,20.7,13.7,11.4,41.5,23,14.9,26.4,46.1,24.1,18.4,14.3,
     20.7,35.7,13.7,17.2,36.8,5.3,19.5,26.4,23,38,54.1,6.3,32.2,28.8,20.7,10.3,11.4)
N<-c(24,10.1,20,30.9,26.9,17,16.1,14.1,22,5.2,35.8,-0.7,-1.7,6.2,13.1,18,14.1,19,27.9,
     19,-1.7,25.9,20,14.1,-15.5,11.1,23,30.9,33.8)
# Stack the scores in one column y, with group labels in g
HN.dta<-data.frame(y=c(H,N),g=c(rep("H",length(H)),rep("N",length(N))))

Before you conduct any statistical test, make a simple bar plot to get a quick look at the pattern of the data.

library(ggplot2)
# Group means and standard errors for plotting
HNfg.dta<-data.frame(y=c(mean(H),mean(N)),
                     ses=c(sd(H)/sqrt(length(H)),sd(N)/sqrt(length(N))),
                     group=c("H","N"))
# Bar plot of group means with error bars of +/- 1 standard error
ggplot(data=HNfg.dta,aes(x=group,y=y,fill=group))+
  geom_bar(stat="identity")+
  geom_errorbar(aes(ymin=y-ses,ymax=y+ses),width=0.2)

Visual inspection of this bar plot suggests that the homophobic group shows higher sexual arousal than does the non-homophobic group. Under the assumption that the population variances are equal, a \(t\) test is conducted to test the hypothesis of the Adams et al. study. We compute \(s^2_p\) first. The \(t\) score for the difference between the sample means is 2.48. With \(df=35+29-2=62\), the upper critical value is 2.00 and the lower critical value is -2.00. Obviously, the \(t\) score falls in the rejection region. Thus, \(H_o\) is rejected, and the difference in the level of sexual arousal between the homophobic and non-homophobic groups is significant. Alternatively, you can check the \(p\) value, which is \(.02 < .05\). Again, the difference between these two groups is significant.

# Pooled standard deviation across the two groups
sp<-sqrt(((length(H)-1)*sd(H)^2+(length(N)-1)*sd(N)^2)/(length(H)+length(N)-2))
# t score for the difference between means, using the pooled variance
tscore<-(mean(H)-mean(N))/(sp*sqrt((1/length(H)+1/length(N))))
tscore
## [1] 2.483669
qt(0.975,length(H)+length(N)-2) # upper critical value
## [1] 1.998972
qt(0.025,length(H)+length(N)-2) # lower critical value
## [1] -1.998972
2*(1-pt(tscore,length(H)+length(N)-2)) # two-tailed p value
## [1] 0.01572085

In R, you can simply run t.test( ) to conduct a \(t\) test. With the assumption that the population variances are equal (var.equal=T), the result of the \(t\) test is shown as follows.

with(HN.dta,t.test(y~g,var.equal=T)) # pooled-variance (Student's) t test
## 
##  Two Sample t-test
## 
## data:  y by g
## t = 2.4837, df = 62, p-value = 0.01572
## alternative hypothesis: true difference in means between group H and group N is not equal to 0
## 95 percent confidence interval:
##   1.46298 13.53012
## sample estimates:
## mean in group H mean in group N 
##        24.00000        16.50345

The confidence limits for the mean difference can be seen in the result of the \(t\) test above. Alternatively, you can compute the 95% confidence interval for the mean difference with the following code.

ut<-qt(0.975,length(H)+length(N)-2) # upper critical t
lt<-qt(0.025,length(H)+length(N)-2) # lower critical t
# CI limits: mean difference plus the critical t values times the standard error
ll<-sp*sqrt(1/length(H)+1/length(N))*lt+mean(H)-mean(N)
ul<-sp*sqrt(1/length(H)+1/length(N))*ut+mean(H)-mean(N)
c(ll,ul)
## [1]  1.46298 13.53012