Bayesian Data Analysis

Example 2: Use Beta Distribution as Prior Distribution

In the previous example, we estimate the probability of Head for a coin with a uniform distribution to represent the prior belief. In fact, Beta distribution is more compatible with JAGS. In this example, we demonstrate how to use a Beta prior to do parameter estimation with the binomial probability mass function as the likelihood function. Suppose we tossed a coin for ten times and observed 3 Heads. As usual, we firstly create a list for storing the data. Then we compose a JAGS model in R and save it as a text file. Note Beta distribution has two parameters a and b. When a = b = 1, the Beta distribution becomes the uniform distribution. You can check for the relation between the parameters and the shape of the Beta distribution. By the way, when a = b = an extremely large number, say 500, the Beta distribution has a peak at 0.5. Thus, it is often used to represent that you strongly believe the to-be-estimated parameter is 0.5 in advance.

# Create data list
datalist<-list(N=10,k=3)
# Create JAGS model
modelString<-"
model{
   # Likelihood function
   k~dbin(theta,N)
   # Prior
   theta~dbeta(1,1)
}"
# Export model string to a text file
writeLines(modelString,con="e2_model.txt")
# Initialize JAGS model
library(rjags)

## Loading required package: coda

## Linked to JAGS 4.3.0

## Loaded modules: basemod,bugs

jagsModel.e2<-jags.model(file="e2_model.txt",data=datalist,n.chains=3,n.adapt=1000)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 1
##    Total graph size: 4
## 
## Initializing model

# Start modeling
update(jagsModel.e2,n.iter=1000)
# Get posterior samples
post_e2<-coda.samples(jagsModel.e2,variable.names="theta",
                      n.iter=1000)
# Sort out the posterior samples as a tibble
library(ggmcmc)

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Loading required package: tidyr

## Loading required package: ggplot2

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

e2.samples<-ggs(post_e2)

The posterior distribution of theta can be plot as follow. We can trim out the estiamtes in the first 200 iterations, as they are regarded as burn-in trials. The mean of posterior distribution of theta is about 0.33 and the MAP of it is about 0.28. These two measures are clost to 0.3=3/10.

library(ggplot2)
library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

dd<-data.frame(theta=c(subset(e2.samples,Chain==1)$value[-(1:200)],
                   subset(e2.samples,Chain==2)$value[-(1:200)],
                   subset(e2.samples,Chain==3)$value[-(1:200)]),
           iteration=rep(201:1000,3),
           chain=as.factor(rep(1:3,each=800)))
fg1<-ggplot(dd,aes(iteration,theta,color=chain))+
   geom_line()
fg2<-ggplot(dd,aes(theta))+
   geom_density()
grid.arrange(fg1,fg2,ncol=2)

# Get the mean of posterior distribution
mean(dd$theta)

## [1] 0.3407185

# Get the MAP of posterior distribution
den.theta<-density(dd$theta)
den.theta$x[which.max(den.theta$y)]

## [1] 0.3126911

Example 3: Difference between Two Rates

Suppose we have two coins produced by two different mints. We flip each of these two coins for n=10 times and respectively observed k1=7 and k2=5 heads. What is the difference between the head rates of the two coins?

In NHST, the null hypothesis is that the difference on the probability of heads between these two coins is equal to 0. Then, the probability of heads for sample 1 is 0.7 and that for sample 2 is 0.5. Although not that appropriate here (i.e., small sample size), the sampling distribution for the difference between these two samples often is assumed to be a normal distribution with the mean as the sample probability difference \(0.7-0.5=0.2\) and the standard error as \(\sqrt{pq(1/n_1+1/n_2)}\), where \(p=(7+5)/20=0.6\) and \(q=1-p=0.4\). Then, Z score is computed as \(0.2/\sqrt{0.6\times0.4\times2/10}=0.91\). Since this value is inside the interval \(\pm1.96\) for two-tailed testing, we cannot reject \(H_o\).

Now we use Bayesian analysis to answer this question. First, we need to prepare the data and establish the JAGS model. Suppose we have no particular prior preference for which value \(\theta\) should be for each coin.

# Prepare data
N<-10
k1<-7
k2<-5
datalist<-list(N=N,k1=k1,k2=k2)
# Set up JAGS model
modelString<-"
model{
   k1~dbin(theta1,N)
   k2~dbin(theta2,N)
   # Prior
   theta1~dbeta(1,1)
   theta2~dbeta(1,1)
   # Difference between two thetas
   delta<-theta1-theta2
}"
writeLines(modelString,con="e3_model.txt")

Then, we compile the JAGS model.

library(rjags)
e3_model<-jags.model(file="e3_model.txt",data=datalist,
                     n.chains=2,n.adapt=1000)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 2
##    Unobserved stochastic nodes: 2
##    Total graph size: 7
## 
## Initializing model

Ask R to update this model for 1,000 iterations. Then, ask JAGS to generate the posterior samples. Finally, transfer the posterior samples to an R object, which can be used latter.

update(e3_model,n.iter=1000)
jags.samples(e3_model,"delta",n.iter=1000)

## $delta
## mcarray:
## [1] 0.1633108
## 
## Marginalizing over: iteration(1000),chain(2)

post_e3<-coda.samples(e3_model,"delta",n.iter=1000)

Now we can transfer the posterior samples to a data frame. Check the first 6 rows of this data structure.

library(ggmcmc)
sample_e3<-ggs(post_e3)
head(sample_e3)

## # A tibble: 6 × 4
##   Iteration Chain Parameter    value
##       <int> <int> <fct>        <dbl>
## 1         1     1 delta      0.431  
## 2         2     1 delta      0.501  
## 3         3     1 delta      0.0107 
## 4         4     1 delta     -0.00423
## 5         5     1 delta      0.0844 
## 6         6     1 delta      0.0168

Since our purpose is to check whether these two \(\theta\)s are different from each other, we can check the estimates of this difference and plot the posterior distribution for the estimated difference between them, namely delta. These relevant estimates can be obtained by the following codes. In fact, the 95% HDI does cover 0 which is exactly predicted in \(H_o\). Thus, we have no sufficient evidence to judge whether we can reject \(H_o\).

library(HDInterval)
library(ggplot2)
dd<-density(sample_e3$value)
# MAP estimate
dd$x[which.max(dd$y)]

## [1] 0.1770039

# mean estimate
mean(sample_e3$value)

## [1] 0.1680485

# HDI
hdi_delta<-hdi(dd)
ggplot(sample_e3,aes(x=value))+
  geom_density()+
  geom_segment(aes(x=hdi_delta[1],y=attr(hdi_delta,"height"),xend=hdi_delta[1],yend=0))+
  geom_segment(aes(x=hdi_delta[2],y=attr(hdi_delta,"height"),xend=hdi_delta[2],yend=0))+
  geom_segment(aes(x=hdi_delta[1],y=attr(hdi_delta,"height"),xend=hdi_delta[2],
                   yend=attr(hdi_delta,"height")),lty=2)+
  annotate("text",x=0.15,y=0.5,label="95%HDI")

What if we got 7 heads in 10 flips for the first coin and 2 heads in 10 flips for the second? We only need to change the data and rerun the model. The 95% HDI clearly does not cover 0. Therefore, we have evidence to say that the probabilities of heads for these two coins are actually different.

N<-10
k1<-7
k2<-2
datalist<-list(N=N,k1=k1,k2=k2)
e3_model2<-jags.model(file="e3_model.txt",data=datalist,
                      n.chains=2,n.adapt=1000)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 2
##    Unobserved stochastic nodes: 2
##    Total graph size: 7
## 
## Initializing model

update(e3_model2,n.iter=1000)
jags.samples(e3_model2,"delta",n.iter=1000)

## $delta
## mcarray:
## [1] 0.4167028
## 
## Marginalizing over: iteration(1000),chain(2)

post_e3_2<-coda.samples(e3_model2,"delta",n.iter=1000)
sample_e3_2<-ggs(post_e3_2)
hdi(density(sample_e3_2$value))

##      lower      upper 
## 0.06477277 0.76101787 
## attr(,"credMass")
## [1] 0.95
## attr(,"height")
## [1] 0.3047339

Prior and Posterior Predictive Distributions

Exercise 3.4.4 in the textbook “Bayesian Cognitive Modeling” is an intriguing case. An undergraduate student in Nedland surveyed 120 elder adults in nursing homes. Twenty-four out of them indicated that they had been bullied by their fellow residents. This student rejected the suggestion that her study may have been too small to draw reliable conclusions: “If I had talked to more people, the result would have changed by one or two percent at the most”. Is this student correct?

The first thing we need to think is what parameter we are going to estimate. Here, the probability of being bullied \(\theta\). Since the answer to the survey is either yes or no, the binomial distribution can be a suitable likelihood function. We have the data of 24 yes answers out of 120 answers. We can estimate the parameter \(\theta\), the rate of being bullied.

N<-120
k<-24
datalist<-list(N=N,k=k)
modelString<-"
model{
   # Likelihood function
   k~dbin(theta,N)
   
   # Prior
   theta~dbeta(1,1)
   thetaprior~dbeta(1,1)
   
   # Prior predictive distribution
   priork~dbin(thetaprior,N)
   
   # Posterior predictive districution
   postk~dbin(theta,N)
}"
writeLines(modelString,con="e4_model.txt")
e4_model<-jags.model(file="e4_model.txt",data=datalist,
                     n.chains=2,n.adapt=1000)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 4
##    Total graph size: 7
## 
## Initializing model

update(e4_model,n.iter=1000)
post_e4<-coda.samples(e4_model,c("theta","priork","postk"),n.iter=1000)
sample_e4<-ggs(post_e4)
head(sample_e4)

## # A tibble: 6 × 4
##   Iteration Chain Parameter value
##       <int> <int> <fct>     <dbl>
## 1         1     1 postk        22
## 2         2     1 postk        22
## 3         3     1 postk        39
## 4         4     1 postk        29
## 5         5     1 postk        24
## 6         6     1 postk        30

In this model, two new variables are introduced. One is used to represent the prior distribution of the yes answers according to the prior distribution of \(\theta\), priork~dbin(thetaprior,N). This distribution is the predictive for observations (i.e., the count of yes answers). Thus, it is called prior predictive distribution. Similarly, the posterior predictive distribution is deined by postk~dbin(theta,N). Note the variable theta is updated by MCMC, whereas the variable thetaprior is not. This is how we can distinguish the prior and posterior predictive distributions.

We first check the mean estimate of \(\theta\). Then, we can compare the prior and posterior predictive distributions. Apparently, there is no particular peek on the prior predictive distribution. This is reason, as it is assumed to have zero knowledge about the rate of being bullied. However, there is a clear peek on the posterior predictive distribution, which is near 24.

theta<-filter(sample_e4,Parameter=="theta")$value
mean(theta)

## [1] 0.2048847

ggplot(subset(sample_e4,Parameter!="theta"),aes(x=value))+
  geom_density(aes(color=Parameter))

Now we turn back to the second thing we need to consider: how to judge the student’s argument. Since she claimed that even increasing the sample size, the result won’t get too much different. We can ask JAGS to generate the posterior sample with N>120. To this end, we adjust a bit the current model. Note here we are going to ask JAGS to generate the posterior predictive samples with four different sample sizes.

N<-120
N1<-c(120,200,500,1000)
k<-24
datalist<-list(N=N,N1=N1,k=k,m=length(N1))
modelString<-"
model{
   k~dbin(theta,N)
   for(i in 1:m){
      postk[i]~dbin(theta,N1[i])
   }
   # Prior
   theta~dbeta(1,1)
}"
writeLines(modelString,con="e4_model2.txt")

Start modeling with this new model.

e4_model2<-jags.model(file="e4_model2.txt",data=datalist,
                      n.chains=2,n.adapt=100)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 5
##    Total graph size: 13
## 
## Initializing model

update(e4_model2,n.iter=1000)
post_e4<-coda.samples(e4_model2,c("theta","postk"),n.iter=1000)
sample_e4<-ggs(post_e4)

Let’s check the mean estimate of \(\theta\). Of mose interest is how many yes answers will be generated with the posterior mean of \(\theta\). The percentage of bullied cases is pretty much the same in all these situations. The mean difference of these percentages is not larger than 2%.

theta<-filter(sample_e4,Parameter=="theta")$value
mean(theta)

## [1] 0.2022089

postk<-sapply(1:4,function(x){
  temp<-paste0("postk[",x,"]")
  v<-filter(sample_e4,Parameter==temp)$value
  return(v)
})
dim(postk)

## [1] 2000    4

colMeans(postk)

## [1]  24.2500  40.3260 101.2110 202.2445

rates<-colMeans(postk)/N1
rates

## [1] 0.2020833 0.2016300 0.2024220 0.2022445

ratesm<-abs(matrix(rates,4,4)-matrix(rates,4,4,byrow=T))
ratesm

##              [,1]         [,2]         [,3]         [,4]
## [1,] 0.0000000000 0.0004533333 0.0003386667 0.0001611667
## [2,] 0.0004533333 0.0000000000 0.0007920000 0.0006145000
## [3,] 0.0003386667 0.0007920000 0.0000000000 0.0001775000
## [4,] 0.0001611667 0.0006145000 0.0001775000 0.0000000000

mean(ratesm[lower.tri(ratesm)])

## [1] 0.0004228611

Bayesian Data Analysis

Lee-Xieng Yang

Example 2: Use Beta Distribution as Prior Distribution

Example 3: Difference between Two Rates

Prior and Posterior Predictive Distributions