MCMC

For a long time, researchers could only proceed easily with Bayesian inference when the posterior distribution was available in closed form or as an analytic expression. This situation changed dramatically with the advent of computer-driven sampling methods generally known as Markov chain Monte Carlo (MCMC; e.g., Gamerman & Lopes, 2006; Gilks, Richardson, & Spiegelhalter, 1996). Using MCMC techniques such as Gibbs sampling or the Metropolis-Hastings algorithm, researchers can directly sample sequences of values from the posterior distribution of interest. For a simple illustration of Bayesian inference using MCMC, consider the binomial example of 8 correct responses out of 10 questions, and the associated inference problem for \(\theta\), the rate of answering questions correctly.
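
Before turning to MCMC, note that this example also has a closed-form answer that is useful as a reference point. A minimal sketch, assuming the uniform prior used below (equivalent to a Beta(1,1) prior), under which the posterior is \(\text{Beta}(k+1, N-k+1)\):

# Analytic reference (not part of the JAGS workflow): with a Beta(1,1) prior,
# the posterior for theta is Beta(k+1, N-k+1) = Beta(9, 3)
k<-8; N<-10
a<-k+1; b<-N-k+1
c(mode=(a-1)/(a+b-2),mean=a/(a+b))   # 0.80 and 0.75
qbeta(c(.025,.975),a,b)              # central 95% credible interval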

The R code below illustrates how to run Bayesian inference for the model's parameter, which in this case is the correct-response rate \(\theta\). Since each question has only two possible outcomes, correct or wrong, the binomial distribution is a suitable likelihood function. The procedure consists of a few steps. First, set up your data. To meet the requirements of JAGS, the data must be passed in as a list.

# Set up the data: N = 10 questions, k = 8 correct responses
N<-10
k<-8
datalist<-list(N=N,k=k)

Second, we use R to compose a JAGS model, in which the prior distribution for the parameter(s) of interest and the likelihood function are specified. The model is written as a long string, which is then exported as a text file for JAGS to read. In this model, k ~ dbin(theta,N) is JAGS code stating that k, the number of correct responses out of N questions, is generated from a binomial distribution with probability of success \(\theta\). The JAGS distribution dbin( ) is the binomial distribution, with \(\theta\) and N as its arguments. The statement theta ~ dunif(0,1) defines the prior distribution as a uniform distribution over values between 0 and 1. Obviously, this means that we have no a priori knowledge about \(\theta\).

# Set up your JAGS model 
modelString<-"
model{
   k~dbin(theta,N)
   # Prior
   theta~dunif(0,1)
}"
# Save your model as a file
writeLines(modelString,con="e1_model.txt")

Third, set up the model in JAGS. Here we use jags.model( ) to compile the model and store the result in a variable. The first argument of this function gives the name of the model file that JAGS should import. The second argument, data, passes the collected data. The third argument, n.chains, sets how many chains will be run during MCMC sampling. The argument n.adapt gives the number of iterations used for adaptation, during which the sampler's tuning parameters are adjusted; this adaptation phase is not Markovian, so these iterations are not part of the posterior sample. A more detailed introduction can be found in the rjags documentation (?jags.model).

library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
jagsModel.e1<-jags.model(file="e1_model.txt",
              data=datalist,n.chains=3,n.adapt=1000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 1
##    Total graph size: 5
## 
## Initializing model

Fourth, start updating. We use the function update( ) to run the sampler for an initial block of iterations (effectively a burn-in period) without recording samples. We then use coda.samples( ) to draw 1000 posterior samples of the parameter from the JAGS model in mcmc.list format; alternatively, jags.samples( ) draws samples as an mcarray object, as shown below and in the second example.

# Run 1000 burn-in iterations without recording samples
update(jagsModel.e1,1000)
# Draw 1000 posterior samples of theta in mcmc.list format
post_e1<-coda.samples(jagsModel.e1,variable.names="theta",
            n.iter=1000)
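
For comparison, a minimal sketch of the jags.samples( ) alternative mentioned above (running it advances the chains by a further 1000 iterations):

# Alternative: jags.samples() returns the draws as an mcarray object
# rather than an mcmc.list
post_e1_mc<-jags.samples(jagsModel.e1,variable.names="theta",n.iter=1000)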

When you finish updating, you can use ggs( ) to turn the object returned by coda.samples( ) into a tibble. A tibble works just like a data frame. You can check the numbers of its rows and columns, which are 3000 and 4: as we used 3 chains with 1000 iterations each, in total there are 3000 values sampled from the posterior distribution (see the quick check after the code below).

library(ggmcmc)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: tidyr
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
e1.samples<-ggs(post_e1)
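
A quick check of the structure described above:

dim(e1.samples)    # 3000 rows (3 chains x 1000 iterations) and 4 columns
head(e1.samples)   # columns: Iteration, Chain, Parameter, value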

Now we can plot the traces of the three chains across iterations. Also, we can portray the mean track by connecting the means of the three chains all the way to the final iteration. To make the track easier to read, I connect the means only every 15 iterations. Often the first n iterations are treated as burn-in samples, which should be discarded. Here I simply treat the first 200 iterations as burn-in. As shown in this figure, the estimate of \(\theta\) settles close to 0.8.

library(ggplot2)
# Trace plot of the three chains
fg1<-ggplot(e1.samples,aes(x=Iteration,y=value))+
       geom_line(aes(color=as.factor(Chain)))
# Average the three chains at each iteration
dd<-cbind(e1.samples$value[1:1000],e1.samples$value[1001:2000],e1.samples$value[2001:3000])
meanTheta.dta<-data.frame(Iteration=1:1000,value=rowMeans(dd))
# Overlay the mean track every 15 iterations, after the 200 burn-in iterations
fg1+geom_line(data=meanTheta.dta[seq(201,1000,15),],aes(x=Iteration,y=value),color="black")+
  annotate("text",x=110,y=0.76,label="Burn-in")

We can of course also plot the distribution of these sampled \(\theta\) values (here as a density plot). Because the probability of any value appearing in the sequence of posterior samples is determined by its relative posterior probability, this plot is an approximation to the posterior distribution of \(\theta\). As you can see in this figure, the mode is around 0.8; that is, the MAP estimate of \(\theta\) is 0.8. The mean estimate of \(\theta\) is around 0.75.

ggplot(e1.samples,aes(x=value))+
  geom_density()

mean(e1.samples$value)
## [1] 0.7466938
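
To read the MAP estimate off the samples rather than the figure, we can take the mode of the estimated posterior density (the same approach is used in the common-rate example below):

# MAP estimate: the value of theta at which the estimated density peaks
de1<-density(e1.samples$value)
de1$x[which.max(de1$y)]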

Exercise in class: Try other numbers of correct responses (e.g., 3). Or change the prior to a Beta distribution with both parameters set to 1, which is equivalent to the uniform distribution. A sketch of both variations follows.
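
A minimal sketch of the suggested variations; only the data list and the prior line of the model string need to change:

# Variation 1: a different number of correct responses
datalist<-list(N=10,k=3)
# Variation 2: replace the prior line in the model string with
#   theta~dbeta(1,1)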

Highest Density Interval

In fact, we do not really need to compute a confidence interval in Bayesian statistics. Instead, we can report the 95% HDI (Highest Density Interval) of the posterior distribution. We can use the package {HDInterval} (by Mike Meredith and John Kruschke) to compute this interval.

# Estimate the posterior density of theta
allvalues<-density(e1.samples$value)
library(HDInterval)
# Compute the 95% HDI from the estimated density
hdiD1<-hdi(allvalues)
# Mark the 95% HDI on the density plot
ggplot(e1.samples,aes(x=value))+
  geom_density()+
  geom_segment(aes(x=hdiD1[1],y=attr(hdiD1,"height"),
              xend=hdiD1[2],yend=attr(hdiD1,"height")),lty=2,color="tomato")
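
The numeric bounds of the interval, and the density height at which it is drawn, can be inspected directly:

hdiD1                   # lower and upper bounds of the 95% HDI
attr(hdiD1,"height")    # density height at which the interval is cut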

Inferring a common rate

Suppose two coins were made in the same mint. You tossed each of them 10 times and observed 5 heads for one coin and 7 heads for the other. What is your estimate of the probability of heads for coins made by that mint?

First, we create the data for this problem.

k<-c(5,7)
N<-c(10,10)
datalist<-list(k=k,N=N,m=2)

Second, we need to write a suitable JAGS model. Suppose we have no prior knowledge about the probability of heads, and assume the two coins share a common rate \(\theta\).

modelString<-"
model{
    for(i in 1:m){
       k[i]~dbin(theta,N[i])
    }
    # Prior
    theta~dbeta(1,1)
}"
writeLines(modelString,con="e2_model.txt")

Third, set up the model with two MCMC chains and run 1000 iterations of updating before drawing the posterior samples.

e2.model<-jags.model(file="e2_model.txt",
                     data=datalist,n.chains=2,n.adapt=1000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 2
##    Unobserved stochastic nodes: 1
##    Total graph size: 7
## 
## Initializing model
update(e2.model,n.iter=1000)
jags.samples(e2.model,"theta",n.iter=1000)
## $theta
## mcarray:
## [1] 0.5943917
## 
## Marginalizing over: iteration(1000),chain(2)
post_e2<-coda.samples(e2.model,variable.names="theta",
            n.iter=1000)

Now we can get the posterior samples for \(\theta\). The MAP estimate for \(\theta\) is about 0.60 and the mean of these posterior samples is about 0.59. The lower and upper bounds of the 95% HDI are about 0.40 and 0.79.

e2.samples<-ggs(post_e2)
de2<-density(e2.samples$value)
MAP.theta.e2<-de2$x[which.max(de2$y)]
MAP.theta.e2
## [1] 0.5978566
mean(e2.samples$value)
## [1] 0.593884
hdiD2<-hdi(de2)
hdiD2
##     lower     upper 
## 0.3967771 0.7888821 
## attr(,"credMass")
## [1] 0.95
## attr(,"height")
## [1] 0.6335845
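
As with the first example, there is an analytic reference point. A minimal sketch, assuming the Beta(1,1) prior above, under which pooling the two binomial observations gives a \(\text{Beta}(1+\sum k,\ 1+\sum(N-k))\) posterior:

# Analytic check: with a Beta(1,1) prior and two binomial observations,
# the posterior is Beta(1 + sum(k), 1 + sum(N - k)) = Beta(13, 9)
a<-1+sum(k); b<-1+sum(N-k)
c(mode=(a-1)/(a+b-2),mean=a/(a+b))   # 0.60 and about 0.59, close to the MCMC estimates
qbeta(c(.025,.975),a,b)              # central 95% credible interval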