Repeated Measurement

Suppose you measured 3 participants’ IQs with 3 parallel IQ tests. Their test scores were (90, 95, 100), (105, 110, 115), and (150, 155, 160), respectively. Since these are parallel tests, they can be treated as equivalent measurement instruments. Thus, the measurement noise (\(\sigma\)) should be the same across tests, and each participant should obtain similar scores on them - the reliability issue. What are the estimates of these 3 participants’ IQs? What is the estimate of the test precision?

As usual, we start by establishing a model with a suitable likelihood function for this problem. Since scores on an IQ test are typically assumed to be normally distributed, the likelihood function can be the normal distribution. Because these are parallel tests, the problem can be regarded as measuring each participant’s IQ three times in order to estimate that participant’s true IQ score. Let’s start by preparing the data.
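In symbols, the measurement model just described is

\[
K_{ij} \sim \text{Normal}(\mu_i, \sigma^2),\quad i = 1,\ldots,3 \text{ (participants)},\ j = 1,\ldots,3 \text{ (tests)},
\]

where \(\mu_i\) is the true IQ of participant \(i\) and the same \(\sigma\) is shared by all three tests.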

Prepare Data

K<-matrix(c(90,95,100,
            105,110,115,
            150,155,160),3,3,byrow=T)
datalist<-list(K=K,nsubj=dim(K)[1],ntest=dim(K)[2])

Set up model

We set up the model as follows. The prior distribution of each participant’s mean IQ is assumed to be a uniform distribution between 0 and 300. The prior distribution of the precision is assumed to be a Gamma distribution with both parameters set to 0.001; the standard deviation is then derived from the precision as \(\sigma = 1/\sqrt{\lambda}\).

modelString<-"
model{
   for(j in 1:ntest){
      for(i in 1:nsubj){
         K[i,j]~dnorm(mu[i],lambda)
      }
   }
   # Prior
   # mu
   for(i in 1:nsubj){
      mu[i]~dunif(0,300)
   }
   # lambda
   lambda~dgamma(0.001,0.001)
   sigma<-pow(lambda,-0.5)
}"
writeLines(modelString,con="e7_model.txt")

Compile the model.

library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
e7_model<-jags.model(file="e7_model.txt",data=datalist,
                     n.chains=2,n.adapt=100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 9
##    Unobserved stochastic nodes: 4
##    Total graph size: 21
## 
## Initializing model
update(e7_model,n.iter=10000,thin=10)
post_e7<-coda.samples(e7_model,c("mu","sigma"),n.iter=10000,thin=10)
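Since two chains were run, we can optionally do a quick convergence check with coda’s gelman.diag( ) before summarizing the samples. Potential scale reduction factors close to 1 suggest the chains have converged.

# Optional convergence check (potential scale reduction factors)
gelman.diag(post_e7)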

Transfer the posterior samples to an object in tibble format.

library(ggmcmc)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: tidyr
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
sample_e7<-ggs(post_e7)
sample_e7
## # A tibble: 8,000 × 4
##    Iteration Chain Parameter value
##        <int> <int> <fct>     <dbl>
##  1         1     1 mu[1]      98.9
##  2         2     1 mu[1]      92.7
##  3         3     1 mu[1]      97.6
##  4         4     1 mu[1]      94.9
##  5         5     1 mu[1]      93.6
##  6         6     1 mu[1]      92.2
##  7         7     1 mu[1]      99.0
##  8         8     1 mu[1]      93.5
##  9         9     1 mu[1]      97.4
## 10        10     1 mu[1]      93.2
## # … with 7,990 more rows

Now we can plot the posterior distributions of the estimated IQs for the three participants.

library(ggplot2)
# Get the posterior estimates of each participant's IQ
IQ<-sapply(1:3,function(x){
     temp<-paste0("mu[",x,"]")
     mu<-filter(sample_e7,Parameter==temp)$value
     return(mu)
})
library(reshape)
## 
## Attaching package: 'reshape'
## The following objects are masked from 'package:tidyr':
## 
##     expand, smiths
## The following object is masked from 'package:dplyr':
## 
##     rename
IQ.dta<-melt(IQ)
names(IQ.dta)<-c("Iteration","ID","IQ")
ggplot(IQ.dta,aes(x=IQ))+
  geom_density(aes(color=as.factor(ID)))+
  xlim(60,200)

The MAP estimates of these participants’ IQs can be computed with the following code. The posterior mean estimates of the IQs can also be computed.

MAP.IQs<-apply(IQ,2,function(x){
     temp<-density(x)
     return(temp$x[which.max(temp$y)])
})
MAP.IQs
## [1]  95.03181 109.85240 154.05901
colMeans(IQ)
## [1]  95.02432 110.04101 154.94446

We can also examine the estimated precision. If you check the posterior distribution of the estimated \(\sigma\), you will see that it does not look like a normal distribution. This is not surprising, as the sampling distribution of \(\sigma\) is not normal.

sigma<-filter(sample_e7,Parameter=="sigma")$value
sigma.den<-density(sigma)
sigma.den$x[which.max(sigma.den$y)]
## [1] 4.877391
mean(sigma)
## [1] 5.747861
ggplot(data.frame(sigma=sigma),aes(sigma))+
  geom_density()

Autocorrelation for IQs

The lag-1 autocorrelation of the posterior \(\mu\) samples is quite low in each chain for each participant.

r.IQ<-sapply(1:3,function(i){
    temp<-subset(IQ.dta,ID==i)
    # Lag-1 autocorrelation within chain 1 (iterations 1-1000)
    r1<-with(temp,cor(IQ[1:999],IQ[2:1000]))
    # Lag-1 autocorrelation within chain 2 (iterations 1001-2000)
    r2<-with(temp,cor(IQ[1001:1999],IQ[1002:2000]))
    return(c(r1,r2))
})
r.IQ
##              [,1]        [,2]         [,3]
## [1,]  0.026977249  0.04685903 -0.039696415
## [2,] -0.005210466 -0.02309280  0.005447951
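Equivalently, coda’s autocorr.diag( ) reports the autocorrelation of every monitored parameter, averaged across chains. Note that because the samples were thinned by 10, lag 1 here refers to successive stored samples (10 raw iterations apart).

# Lag-1 autocorrelation of each monitored parameter, averaged across chains
autocorr.diag(post_e7,lags=1)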

Autocorrelation for Sigma

The lag-1 autocorrelation of the posterior \(\sigma\) samples is also low, no matter which chain we examine.

# Lag-1 autocorrelation of sigma within chain 1 and chain 2
cor.test(sigma[1:999],sigma[2:1000])
cor.test(sigma[1001:1999],sigma[1002:2000])

Estimation of Mean and Standard Deviation of Normal Distribution

In Chapter 16 (Kruschke, 2016), Kruschke suggested using a normal prior on \(\mu\) and a uniform prior on \(\sigma\) when estimating the mean and standard deviation of a normal distribution.

He also suggested letting the data themselves tell us what the typical scale of the data is, and using that scale to set the priors. Let’s see how this can be done.
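Concretely, writing \(\bar{K}\) and \(s_K\) for the sample mean and standard deviation, the priors used in the JAGS model below are

\[
\mu \sim \text{Normal}\left(\bar{K},\ (100\,s_K)^2\right),\qquad
\sigma \sim \text{Uniform}\left(s_K/1000,\ 1000\,s_K\right),
\]

so the prior on \(\mu\) is extremely wide relative to the data and the prior on \(\sigma\) spans several orders of magnitude around the sample sd.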

Prepare Data

Create a sample of 100 data points randomly drawn from a normal distribution with \(\mu=0\) and \(\sigma=1\).

set.seed(123)
n<-100
K<-rnorm(n,0,1)
datalist<-list(n=n,K=K,mean=mean(K),sd=sd(K))

Set up Model

The model can be set up as follows.

modelString<-"
model{
  for(i in 1:n){
    K[i]~dnorm(mu,1/sigma^2)
  }
  # Priors
  mu~dnorm(mean,1/(100*sd)^2)
  sigma~dunif(sd/1000,sd*1000)
}"
writeLines(modelString,con="e8_model.txt")

Now we fit the model and draw the posterior samples.

e8_model<-jags.model(file="e8_model.txt",data=datalist,
                     n.chains=2,n.adapt=100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 100
##    Unobserved stochastic nodes: 2
##    Total graph size: 116
## 
## Initializing model
update(e8_model,n.iter=1000)
post_e8<-coda.samples(e8_model,c("mu","sigma"),n.iter=1000)
sample_e8<-ggs(post_e8)

Let’s check the estimated \(\mu\) and \(\sigma\). The estimates are actually quite close to the mean and sd of the sample, which is not surprising.

mu<-filter(sample_e8,Parameter=="mu")$value
sigma<-filter(sample_e8,Parameter=="sigma")$value
mu.den<-density(mu)
mu.den$x[which.max(mu.den$y)]
## [1] 0.07451739
mean(mu)
## [1] 0.08989352
sigma.den<-density(sigma)
sigma.den$x[which.max(sigma.den$y)]
## [1] 0.9220473
mean(sigma)
## [1] 0.9243503

Let’s try another example. First, we create a population of N = 10,000 values from a standard normal distribution. Second, we approximate the sampling distribution of the mean for samples of size 100 by repeatedly drawing such samples from this population. As you can see, the mean of this sampling distribution is close to the population mean, and its sd is about 1/10 of the population sd. Of course, this is because the sd of a sampling distribution is the standard error, \(\frac{\sigma}{\sqrt{n}}\).

set.seed(123)
p<-rnorm(10000,0,1)
# Sampling distribution for sample size = 100
set.seed(123)
means<-sapply(1:1000,function(x){
  temp<-sample(p,100,replace=T)
  return(mean(temp))
})
mean(means)
## [1] 0.000464011
sd(means)
## [1] 0.0991933
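As a quick check of the \(\frac{\sigma}{\sqrt{n}}\) relation, we can compute the standard error implied by the population sd.

# Standard error implied by the population sd and n = 100
sd(p)/sqrt(100)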

Third, suppose we randomly draw a sample of 100 values from the population. How do we check whether the underlying mean is about 0?

n<-100
set.seed(1234)
K<-sample(p,100,replace=F)
datalist<-list(n=n,K=K,mean=mean(K),sd=sd(K))
e8_model1<-jags.model(file="e8_model.txt",data=datalist,
                      n.chains=2,n.adapt=100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 100
##    Unobserved stochastic nodes: 2
##    Total graph size: 116
## 
## Initializing model
update(e8_model1,n.iter=1000)
post1_e8<-coda.samples(e8_model1,c("mu","sigma"),n.iter=1000)

The posterior estimates of these two parameters can be checked with the following code. Note that the mean of the posterior samples of \(\mu\) is close to the data mean, and the mean of the posterior samples of \(\sigma\) is close to the data sd. Moreover, the sd of the posterior distribution of the means is about 1/10 of the sd of the population, matching the equation that the standard deviation of the sampling distribution is \(\frac{\sigma}{\sqrt{n}}\). Thus, the posterior distribution of the estimated means behaves like the sampling distribution of means. In this distribution, we have 2,000 means, generated by our model so as to maximize the likelihood of the 100 observed data points.

sample1_e8<-ggs(post1_e8)
mu<-filter(sample1_e8,Parameter=="mu")$value
sigma<-filter(sample1_e8,Parameter=="sigma")$value
# Check the mean of posterior samples for mean
mean(mu)
# The mean of data
datalist$mean
# Check the mean of posterior samples for sd
mean(sigma)
# The sd of data
datalist$sd
# The sd of the posterior distribution of means
sd(mu)
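For comparison, we can also compute the standard error implied by the sample itself, which should be close to sd(mu) above.

# Sample-based standard error, sd/sqrt(n)
datalist$sd/sqrt(datalist$n)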

Again, how about estimating the mean from a small sample?

set.seed(1234)
samples<-sapply(1:1000,function(x){
   temp<-sample(p,10,replace=T)
   return(mean(temp))
})
mean(samples)
## [1] 0.0001896108
sd(samples)
## [1] 0.3127555
n<-10
K<-sample(p,10,replace=F)
datalist<-list(n=n,K=K,mean=mean(K),sd=sd(K))

Let’s run modeling again.

e8_model2<-jags.model(file="e8_model.txt",data=datalist,
                      n.chains=2,n.adapt=100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 10
##    Unobserved stochastic nodes: 2
##    Total graph size: 26
## 
## Initializing model
update(e8_model2,n.iter=1000)
post2_e8<-coda.samples(e8_model2,c("mu","sigma"),n.iter=1000)

Now we can check the posterior distributions of these two estimates. The estimate of the mean is still pretty good, but the estimate of the sd is not as good. As you can see, the sd of the sampling distribution is about \(\frac{1}{\sqrt{10}}\) of the sd of the population. The posterior distribution of the means again behaves like the sampling distribution, but its sd is smaller than the true one (i.e., 0.313). Recall from statistics textbooks that the Z distribution is not suitable for statistical testing when the sample size is small.

sample2_e8<-ggs(post2_e8)
mu<-filter(sample2_e8,Parameter=="mu")$value
sigma<-filter(sample2_e8,Parameter=="sigma")$value
c(mean(mu),datalist$mean)
## [1] -0.3107163 -0.3183128
c(mean(sigma),datalist$sd)
## [1] 0.8011664 0.6819605
fg1<-ggplot(data.frame(mu),aes(mu))+geom_density()
fg2<-ggplot(data.frame(sigma),aes(sigma))+geom_density()
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
grid.arrange(fg1,fg2,ncol=2)

# SD of population
sd(p)
## [1] 0.9986366
# SD of sampling distribution
sd(samples)
## [1] 0.3127555
# SD of population divided by squared root of sample size = 10
sd(p)/sqrt(10)
## [1] 0.3157966
# SD of posterior distribution of means
sd(mu)
## [1] 0.26791

Outliers and Robust Estimation: The t Distribution

When data appear to have outliers beyond what would be accommodated by a normal distribution, it would be useful to be able to describe the data with a more appropriate distribution that has taller or heavier tails than the normal.

The t distribution was originally introduced by Gosset (1908). It has three parameters: \(\mu\), \(\sigma\), and \(\nu\) (the degrees of freedom, also called the normality parameter).
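To see what heavier tails look like, here is a quick sketch comparing a standard normal density with a t density that has only 2 degrees of freedom; dens.dta is just a helper data frame for this illustration.

x<-seq(-6,6,length.out=201)
dens.dta<-data.frame(x=rep(x,2),
                     density=c(dnorm(x),dt(x,df=2)),
                     dist=rep(c("Normal(0,1)","t, df = 2"),each=201))
# The t curve places more probability mass in the tails than the normal curve
ggplot(dens.dta,aes(x,density,color=dist))+geom_line()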

Two Groups

The data file “TwoGroupIQ.csv” contains the IQ scores of two groups of participants, who took either a smart drug or a placebo. The question of interest is whether the smart drug can effectively increase IQ.

Import Data

First, we import the data from “TwoGroupIQ.csv”. In this data set, there are 57 participants in the placebo group and 63 participants in the smart-drug group. As shown in the histogram, there appears to be little difference between the two groups in IQ scores.

dta<-read.csv("TwoGroupIQ.csv",header=T)
ggplot(data=dta,aes(Score,fill=Group))+
  geom_histogram(col="white",alpha=0.6,bins=30,position="identity")

Prepare Data

We need to turn the imported data into a matrix that JAGS can read. Note that JAGS does not accept a data frame.

dta$g<-ifelse(dta$Group=="Placebo",1,2)
datalist<-list(K=as.matrix(dta[,c(1,3)]),N=dim(dta)[1],
               mean=with(dta,tapply(Score,g,mean)),
               sd=with(dta,tapply(Score,g,sd)))

Set up JAGS model

The code below shows how to set up a JAGS model for this case. The likelihood function is now a t distribution, so three parameters need to be estimated. Note that mu[K[i,2]] is either mu[1] or mu[2], because the second column of K is either 1 or 2. Following Kruschke’s suggestion, the mean IQ of each group has a normal prior centered on the sample mean with a very large standard deviation. Likewise, the standard deviation of each group has a uniform prior ranging from an extremely small value to an extremely large value. A common choice for the prior on the degrees of freedom of the t distribution is an exponential distribution. Since the degrees of freedom should be at least 1 and the minimum of the exponential distribution is 0, the prior for \(\tau\) is obtained by adding 1 to a value sampled from the exponential distribution.

modelString<-"
model{
   for(i in 1:N){
      K[i,1]~dt(mu[K[i,2]],1/sigma[K[i,2]]^2,tau)
   }
   # Priors
   for(j in 1:2){
     mu[j]~dnorm(mean[j],1/(100*sd[j])^2)
     sigma[j]~dunif(sd[j]/1000,1000*sd[j])
   }
   tau<-nutau+1
   nutau~dexp(1/29)
}"
writeLines(modelString,con="e9_model.txt")
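To get a feel for this shifted-exponential prior on the degrees of freedom, we can draw from it directly; tau.prior is just a helper vector for this sketch.

# Draw from the prior tau = 1 + Exponential(rate = 1/29) and summarize it
set.seed(1)
tau.prior<-rexp(5000,1/29)+1
quantile(tau.prior,c(0.05,0.5,0.95))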

Start Modeling with JAGS

Now we can start modeling. We also transfer the output of coda.samples( ) to a tibble structure.

e9_model<-jags.model(file="e9_model.txt",
                      data=datalist,n.chains=2,
                      n.adapt=100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 120
##    Unobserved stochastic nodes: 5
##    Total graph size: 271
## 
## Initializing model
update(e9_model,1000)
post_e9<-coda.samples(e9_model,
              variable.names=c("mu","sigma","tau"),
              n.iter=1000)
sample_e9<-ggs(post_e9)

Let’s check out the posterior samples of these three parameters. The estimated means are quite close to the sample means. The estimated sds are not as close to the sample sds; they are smaller, because the heavy-tailed t likelihood downweights the outliers.

# Get the estimated means of these two groups
mus<-sapply(1:2,function(x){
      temp<-paste0("mu[",x,"]")
      return(filter(sample_e9,Parameter==temp)$value)
})
colMeans(mus)
## [1]  99.25828 107.18556
sigmas<-sapply(1:2,function(x){
  temp<-paste0("sigma[",x,"]")
      return(filter(sample_e9,Parameter==temp)$value)
})
colMeans(sigmas)
## [1] 11.27413 17.89749
tau<-filter(sample_e9,Parameter=="tau")$value
mean(tau)
## [1] 3.831099
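For reference, the sample sds stored when preparing the data can be printed for comparison with the posterior means of sigma.

# Sample sds of the two groups
datalist$sd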

We can examine the posterior distributions of the two group means as well as those of the two sds. You will probably find that the two distributions of the means are now more distant from each other.

para.dta<-data.frame(mu=c(mus),g=rep(c("Placebo","Smart Drug"),each=2000),
                   sd=c(sigmas))
fg1<-ggplot(para.dta,aes(mu,fill=g))+
  geom_histogram(color="white",alpha=0.6,bins=30,position="identity")
fg2<-ggplot(para.dta,aes(sd,fill=g))+
  geom_histogram(color="white",alpha=0.6,bins=30,position="identity")
grid.arrange(fg1,fg2,ncol=2)

We can also examine the posterior distribution of the difference between the two group means. The 95% HDI does not cover 0; that is, the two group means are credibly different from each other. However, if we do NHST with \(H_0:\Delta\mu=0\) and \(H_1:\Delta\mu\neq0\), the result is not significant. Why? This is because NHST assumes normality, and for a sample with outliers the variance estimate becomes inflated, making it harder to obtain a significant result.

diffmu.dta<-data.frame(delta=mus[,2]-mus[,1])
library(HDInterval)
x<-hdi(density(diffmu.dta$delta))
fg3<-ggplot(diffmu.dta,aes(delta))+
  geom_density()
fg3+geom_segment(x=x[1],y=attr(x,"height"),xend=x[2],yend=attr(x,"height"))+
  annotate("text",x=7.5,y=0.03,label="95% HDI")

with(dta,t.test(Score~Group,var.equal=T))
## 
##  Two Sample t-test
## 
## data:  Score by Group
## t = -1.9249, df = 118, p-value = 0.05665
## alternative hypothesis: true difference in means between group Placebo and group Smart Drug is not equal to 0
## 95 percent confidence interval:
##  -15.8369851   0.2246208
## sample estimates:
##    mean in group Placebo mean in group Smart Drug 
##                 100.0351                 107.8413