The Chi-Square Distribution

The Chi-Square (\(\chi^2\)) distribution is the distribution defined by the density function \(f(\chi^2)=\frac{1}{2^{k/2}\Gamma(k/2)}(\chi^2)^{k/2-1}e^{-\chi^2/2}\).

This distribution has only one parameter, \(k\). When we move from the mathematical world to the statistical world, \(k\) becomes the degrees of freedom. Thus, \(\chi^2_3\) (or \(\chi^2(3)\)) is read “chi-square with three degrees of freedom”. The plots below show that the shape of the \(\chi^2\) distribution is a function of its degrees of freedom.

library(ggplot2)
library(gridExtra)
x<-seq(0.1,15,0.1)
# Plot the chi-square density curve for df = 1, 2, 4, and 8
figs<-lapply(c(1,2,4,8),function(k){
  df<-paste0("df=",k)
  temp<-ggplot(data=data.frame(x=x,y=dchisq(x,k)),aes(x,y))+
    geom_line(color="red")+ylim(0,1.5)+annotate("text",x=10,y=1,label=df)
  return(temp)
})
grid.arrange(figs[[1]],figs[[2]],
             figs[[3]],figs[[4]],
             ncol=2)

Two statistical features of the chi-square distribution are that its mean equals \(k\) and its variance equals \(2k\).
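
As a quick check of these two features, we can draw a large random sample and compare its sample mean and variance with \(k\) and \(2k\). This is only a minimal simulation sketch; the seed and the sample size of one million are arbitrary choices.

# Sample moments of a chi-square sample should approximate the theoretical mean k and variance 2k
set.seed(1)
k<-5
y<-rchisq(1e6,k)
c(mean=mean(y),variance=var(y))  # expected to be close to 5 and 10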

The Chi-Square Goodness-of-Fit Test

The statistical test based on the chi-square distribution is the chi-square test. There are at least two types of chi-square tests. One is called the goodness-of-fit test, which is often used to check how close the observed data are to a theoretical expectation. For example, if we have a sample distribution and we would like to check whether it is a uniform distribution, a \(\chi^2\) goodness-of-fit test can be a suitable choice. See the demonstration below.

First, we create a random sample of size 100 from a \(\chi^2\) distribution with \(df=5\) as our observed data. Then, we count the data frequency in each of the bins, which can be determined arbitrarily by ourselves. For example, we can divide the value range into 10 equal bins. With the function hist( ), we can count the data frequency in each bin, although this function is more often used to make a histogram. Under a uniform distribution, each of these bins should contain an equal number of observations, that is, 100/10 = 10. If the observed distribution follows a uniform distribution, the count in each bin should be close to this expected value. Visual inspection of the table below does not seem to support this expectation.

set.seed(1234)
# Observed data: 100 random values from a chi-square distribution with df = 5
sample<-rchisq(100,5)
# Count the observations falling into each of the 10 bins of width 3 over [0, 30]
ObsCounts<-hist(sample,breaks=seq(0,30,3))$counts

# Under a uniform distribution, each bin is expected to contain 100/10 = 10 observations
ExpCounts<-rep(10,10)
rbind(ObsCounts,ExpCounts)
##           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## ObsCounts   31   43   17    4    2    2    0    0    1     0
## ExpCounts   10   10   10   10   10   10   10   10   10    10

We can conduct a \(\chi^2\) test to verify our hypothesis. Here \(H_o\) states that the observed counts follow the expected uniform distribution (i.e., \(O=E\) in every cell) and \(H_1\) states that they do not, where \(O\) means the observed count in each cell and \(E\) means the expected count in each cell. The \(\chi^2\) value is computed as \(\chi^2=\sum\frac{(O-E)^2}{E}\). The \(df\) for the goodness-of-fit \(\chi^2\) test is \(N-1\), where \(N\) is the number of cells. Let’s compute the \(\chi^2\) value first. Apparently, the result rejects \(H_o\). Thus, it is inferred that this sample distribution is not a uniform distribution. Alternatively, you can use the function chisq.test( ) to conduct this goodness-of-fit test.

ct<-rbind(ObsCounts,ExpCounts)
# Chi-square statistic: sum of (O - E)^2 / E over the 10 bins, with df = 10 - 1 = 9
chi2<-sum((ct[1,]-ct[2,])^2/ct[2,])
1-pchisq(chi2,9)
## [1] 0
# Alternatively
chisq.test(ObsCounts,p=rep(0.1,10),correct=F)
## 
##  Chi-squared test for given probabilities
## 
## data:  ObsCounts
## X-squared = 212.4, df = 9, p-value < 2.2e-16
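
Note that the manual computation prints a \(p\) value of 0 only because 1 - pchisq( ) rounds away an extremely small upper-tail probability. If the actual tiny \(p\) value is wanted, asking pchisq( ) for the upper tail directly avoids that loss of precision.

pchisq(chi2,9,lower.tail=FALSE)  # upper-tail probability without the 1 - ... rounding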

Therapeutic Touch Experiment

In the Rosa, Rosa, Sarner, and Barrett (1998) study of therapeutic touch (TT), the practitioners made 123 correct detections of therapeutic touch out of 280 trials. If detection is purely by guessing, the accuracy should be 50%. Therefore, we can test whether the observed counts are consistent with this 50:50 split, that is, a uniform distribution over the two outcomes. The estimated \(p\) value is less than .05. However, this result should be considered with caution.

Frequency Correct Incorrect Total
Observed 123 157 280
Expected 140 140 280
chisq.test(c(123,157),p=rep(0.5,2))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(123, 157)
## X-squared = 4.1286, df = 1, p-value = 0.04216
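
As a hand check, plugging the observed and expected counts into the goodness-of-fit formula reproduces the statistic reported above: \(\chi^2=\frac{(123-140)^2}{140}+\frac{(157-140)^2}{140}=\frac{289+289}{140}\approx 4.13\), with \(df=2-1=1\).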

Some Thoughts about the Goodness-of-Fit Test

Although the null hypothesis was rejected according to the result of the \(\chi^2\) goodness-of-fit test, the deviation is actually in the wrong direction: the practitioners were correct on only 123 of 280 trials (about 44%), below rather than above chance, so the significant result cannot support therapeutic touch. Some further caveats follow.

  1. There is no real alternative hypothesis that this result supports.

  2. The \(\chi^2\) distribution is continuous, but the possible values of the \(\chi^2\) statistic computed from count data are discrete.

  3. In fact, the author of this textbook suggests always doing a two-tailed test when using the \(\chi^2\) test.

Assumptions of the Chi-Square Test

  1. The observations should be independent of each other.
  2. The expected count in each cell should be at least 5; a quick way to check this is sketched below.
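
One convenient way to check the second assumption is to inspect the expected counts that chisq.test( ) stores in its returned object. This is only a minimal sketch, applied here to the earlier goodness-of-fit example; the object name fit is just for illustration.

fit<-chisq.test(ObsCounts,p=rep(0.1,10))
fit$expected         # expected count in each cell
any(fit$expected<5)  # TRUE would flag a violation of the rule of thumb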

An Example with More Than Two Categories

In total, 75 children played a rock-paper-scissors (RPS) game, and only one response from each child was collected as the data in Table 6.3.

Symbol Rock Paper Scissors
Observed 30 21 24
Expected (25) (25) (25)

Is the choice of Rock/Paper/Scissors random? Note that being random here normally means being uniformly distributed over the three symbols. This question can be addressed by a goodness-of-fit test. The result is not significant. That is, \(H_o\) is not rejected, and the distribution of symbols thrown by the children is consistent with a uniform distribution.
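
With 75 children and three equally likely symbols, the expected count in each cell is \(75/3=25\), so the statistic can be worked out by hand before running the test: \(\chi^2=\frac{(30-25)^2}{25}+\frac{(21-25)^2}{25}+\frac{(24-25)^2}{25}=\frac{25+16+1}{25}=1.68\).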

chisq.test(c(30,21,24),correct=F)
## 
##  Chi-squared test for given probabilities
## 
## data:  c(30, 21, 24)
## X-squared = 1.68, df = 2, p-value = 0.4317

Two Classification Variables: Contingency Table Analysis

See Table 6.4 for an example contingency table. In this table, we count the number of cases in each of the four combinations of whether a death sentence was handed down and whether the defendant was white. The question is whether receiving a death sentence is contingent on the defendant’s race.

Defendant’s Race Death Sentence: Yes Death Sentence: No Total
Nonwhite 33 251 284
White 33 508 541
Total 66 759 825

The expected frequency in each cell is \(E_{ij}=\frac{R_i\times C_j}{N}\), where \(R_i\) is the total of row \(i\), \(C_j\) is the total of column \(j\), and \(N\) is the grand total. For Nonwhite-Yes, the expected frequency is \(\frac{284\times 66}{825}=22.72\). The equation for \(\chi^2\) is still \(\chi^2=\sum_j\sum_i\frac{(O_{ij}-E_{ij})^2}{E_{ij}}\). The degrees of freedom are \(df=(r-1)\times(c-1)\), where \(r\) and \(c\) are the numbers of rows and columns, which gives 1 in this case.

dta<-matrix(c(33,33,251,508),2,2)  # rows: Nonwhite, White; columns: Yes, No
cs<-colSums(dta)
rs<-rowSums(dta)
# Expected count for each cell: row total x column total / grand total
e.dta<-matrix(0,2,2)
for(j in 1:2){
  for(i in 1:2){
    e.dta[i,j]<-rs[i]*cs[j]/sum(dta)
  }
}
chi2<-sum((dta-e.dta)^2/e.dta)
1-pchisq(chi2,1)
## [1] 0.005491987
chisq.test(dta,correct=F)
## 
##  Pearson's Chi-squared test
## 
## data:  dta
## X-squared = 7.7099, df = 1, p-value = 0.005492
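
As a side note, the double loop above can be replaced by a single vectorized call to outer( ), which builds the same matrix of expected counts. This is just an alternative sketch, not part of the original demonstration.

e.dta2<-outer(rs,cs)/sum(dta)  # outer(rs, cs)[i, j] equals rs[i] * cs[j]
all.equal(e.dta,e.dta2)        # should be TRUE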

Another example is offered in a study by Walsh et al. (2006) on the use of an antidepressant in the treatment of anorexia. See the data in the table below.

Treatment Success Relapse
Drug 13 36
Placebo 14 30

We can evaluate the \(\chi^2\) value computed from this contingency table to see whether the treatment effect is significant. That is, if the drug is effective, whether or not a participant took the drug should matter for whether the treatment succeeds or the anorexia relapses. There is no supportive evidence for an effect of this drug on anorexia in this study.

dta<-matrix(c(13,14,36,30),2,2)  # rows: Drug, Placebo; columns: Success, Relapse
rs<-rowSums(dta)
cs<-colSums(dta)
e<-matrix(0,2,2)
for(i in 1:2){
  for(j in 1:2){
    e[i,j]<-rs[i]*cs[j]/sum(dta)
  }
}
chi2<-sum((dta-e)^2/e)  # divide by the expected counts e, as in the chi-square formula
1-pchisq(chi2,1)
## [1] 0.574881
chisq.test(dta,correct=F)
## 
##  Pearson's Chi-squared test
## 
## data:  dta
## X-squared = 0.31458, df = 1, p-value = 0.5749

Correcting for Continuity

Many researchers suggest that for 2 x 2 tables, Yates’ correction for continuity should be employed, especially when the expected frequencies are small. The correction merely involves reducing the absolute value of each numerator, \(|O-E|\), by 0.5 before squaring. However, the common availability of Fisher’s Exact Test makes Yates’ correction superfluous.

# Yates' correction: subtract 0.5 from |O - E| before squaring, then divide by the expected counts
chi2.2<-sum((abs(e-dta)-0.5)^2/e)
chi2.2
## [1] 0.1102895
1-pchisq(chi2.2,1)
## [1] 0.7398148
chisq.test(dta,correct=T)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  dta
## X-squared = 0.11029, df = 1, p-value = 0.7398
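
Because the text above points to Fisher’s Exact Test as the preferred alternative for small 2 x 2 tables, here is a minimal sketch of running it on the same anorexia table with fisher.test( ) from base R. Like the corrected \(\chi^2\) test, it should indicate no significant treatment effect for these data.

fisher.test(dta)  # exact test on the 2 x 2 table; no continuity correction is needed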

Inclusion of Nonoccurrences

Suppose we investigate the proportion of students who are in favor of having daylight saving time (DST) in urban and rural areas. There are 17 out of 20 rural students who are in favor of having DST all year, whereas 11 out of 20 urban students are. The question is whether the living area influences a student’s preference for DST. However, it is not appropriate to address this question with a goodness-of-fit test on the in-favor counts alone, as in the table below, because those who are not in favor of having DST are not taken into consideration.

Num Rural Urban Total
Observed 17 11 28
Expected 14 14 28
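
To see what goes wrong, the inappropriate goodness-of-fit computation on the in-favor counts alone would be \(\chi^2=\frac{(17-14)^2}{14}+\frac{(11-14)^2}{14}=\frac{9+9}{14}\approx 1.29\) with \(df=1\), which is not significant, whereas the full table analyzed below yields a significant result.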

Thus, the correct way to check the relation between living area and preference for DST is to conduct an independence test on the full table below, which includes the nonoccurrences. The \(\chi^2\) test result is significant, suggesting that living area and preference for DST are not independent.

Num Rural Urban Total
Yes 17 11 28
No 3 9 12
Total 20 20 40
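
As a hand check, every expected count here is the row total times the column total divided by 40, giving 14, 14, 6, and 6, so \(\chi^2=\frac{(17-14)^2}{14}+\frac{(11-14)^2}{14}+\frac{(3-6)^2}{6}+\frac{(9-6)^2}{6}=\frac{18}{14}+\frac{18}{6}\approx 4.29\), which matches the output below.
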
chisq.test(matrix(c(17,3,11,9),2,2),correct=F)
## 
##  Pearson's Chi-squared test
## 
## data:  matrix(c(17, 3, 11, 9), 2, 2)
## X-squared = 4.2857, df = 1, p-value = 0.03843

Dependent or Repeated Measures

Although it was stated previously that the standard chi-square test of a contingency table assumes that the observations are independent, dependent (repeated) measures are often not a problem if they are analyzed appropriately. In Chapter 6.7, Dr. Freedenthal was interested in the effect of an intervention designed to increase help-seeking behavior. She recorded the help-seeking behavior of 70 children before and after the intervention. The author of this textbook simulated Dr. Freedenthal’s study and listed the data in Table 6.7. However, a contingency-table analysis of those counts is not an appropriate way to answer Dr. Freedenthal’s question, because such a table cannot distinguish the case in which help-seeking behavior and the intervention are totally independent from the case in which they are totally dependent on each other. We should turn to another way of testing.

In this study, what interests us the most is the change in help-seeking behavior, that is, Yes \(\rightarrow\) No and No \(\rightarrow\) Yes. The table below lists the changes in help-seeking behavior. Thus, we can simply use a goodness-of-fit test on these changes to examine the effect of the intervention. If the intervention has no effect, the number of children changing from Yes to No is expected to equal the number changing from No to Yes. The test result is significant.

Num No \(\rightarrow\) Yes Yes \(\rightarrow\) No
Observed 12 4
Expected 8 8
chisq.test(c(12,4),correct=F)
## 
##  Chi-squared test for given probabilities
## 
## data:  c(12, 4)
## X-squared = 4, df = 1, p-value = 0.0455
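
As a quick hand check of the output above, 16 children changed their behavior, so under no intervention effect the expected count in each direction is 8, and \(\chi^2=\frac{(12-8)^2}{8}+\frac{(4-8)^2}{8}=\frac{16+16}{8}=4\), which matches the reported statistic with \(df=1\).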