- A brief and practical introduction to fitting Bayesian multilevel models in R and Stan
- Using **brms** (**B**ayesian **R**egression **M**odels using **S**tan)
- Quick intro to Bayesian inference
- Mostly practical skills

- Know what a mixed-effects or multilevel model is
- A little experience with stats and/or data science in R
- Vague knowledge of what Bayesian stats are

- Knowing about the **lme4** package will help
- Knowing about **tidyverse** and **ggplot2** will help

- Slides and text version of lessons are online
- Fill in code in the worksheet (replace `...` with code)
- You can always copy and paste code from the text version of the lesson if you fall behind

At the end of this course, you will understand …

- The basics of Bayesian inference
- What a prior, likelihood, and posterior are
- The basics of how Markov Chain Monte Carlo works
- What a credible interval is

At the end of this course, you will be able to …

- Write **brms** code to fit a multilevel model with random intercepts and random slopes
- Diagnose and deal with convergence problems
- Interpret **brms** output
- Compare models with LOO information criteria
- Use Bayes factors to assess strength of evidence for effects
- Make plots of model parameters and predictions with credible intervals

A method of statistical inference that allows you to use information you already know to assign a prior probability to a hypothesis, then update the probability of that hypothesis as you get more information

- Used in many disciplines and fields
- We’re going to look at how to use it to estimate parameters of statistical models to analyze scientific data
- Powerful, user-friendly, open-source software is making it easier for everyone to go Bayesian

- Thomas Bayes, 1763
- Pierre-Simon Laplace, 1774

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

- How likely an event is to happen based on our prior knowledge about conditions related to that event
- The *conditional* probability of an event *A* occurring, given that another event *B* has occurred

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

The **probability of A being true given that B is true** (\(P(A|B)\))

is equal to the **probability that B is true given that A is true** (\(P(B|A)\))

times the **ratio of probabilities that A and B are true** (\(\frac{P(A)}{P(B)}\))

- Let’s say \(A\) is a statistical model (a hypothesis about the world)
- How probable is it that our hypothesis is true?
- \(P(A)\): the *prior probability* that we assign based on our subjective knowledge before we get any data

- We go out and get some data \(B\)
- \(P(B|A)\): the *likelihood*, the probability of observing that data if our model \(A\) is true
- Use the likelihood to update our estimate of the probability of our model
- \(P(A|B)\): the *posterior probability* that model \(A\) is true, given that we observed \(B\)

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

- What about \(P(B)\)? The *marginal probability*, the probability of the data
- Basically just a normalizing constant
- If we are comparing two models fit to the same data, the two \(P(B)\)s cancel out

\[P(model|data) \propto P(data|model)P(model)\]

\[posterior \propto likelihood \times prior\]

*what we believed before about the world (prior) × how much our new data changes our beliefs (likelihood) = what we believe now about the world (posterior)*

- Find a coin on the street. What is our prior estimate of the probability of flipping heads?
- Now we flip 10 times and get 8 heads. What is our belief now?
- Probably not by much: we have a strong prior, and the likelihood of the data is still reasonably high if the true probability is 0.5, even when we see 8/10 heads

- Shady character on the street shows us a coin and offers to flip it. He will pay $1 for each tails if we pay $1 for each heads
- What is our prior estimate of the probability?
- He flips 10 times and gets 8 heads. What’s our belief now?

- In classical “frequentist” analysis we cannot incorporate prior information into the analysis
- In each case our point estimate of the probability would be 0.8
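For these simple coin examples, the Bayesian posterior can be computed exactly with a conjugate beta-binomial update. A minimal sketch (the specific prior parameters are illustrative, not from the lesson):

```
# Conjugate beta-binomial update for the two coin examples
# (prior parameters are illustrative choices)
heads <- 8
tails <- 2

# Posterior is Beta(a + heads, b + tails) when the prior is Beta(a, b)
post_mean <- function(a, b) (a + heads) / (a + b + heads + tails)

post_mean(50, 50)  # found coin, strong prior near 0.5: ~0.53
post_mean(1, 1)    # shady coin, flat prior: 0.75
```

Compare both to the frequentist point estimate of 0.8: the strong prior barely moves, while the flat prior lets the data dominate.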

- \(P(data|model)\), the likelihood, is needed to get \(P(model|data)\), the posterior
- But the “model” is not just one parameter, it might be 100s or 1000s of parameters
- Need to integrate a probability distribution with 100s or 1000s of dimensions
- For many years, this was computationally not possible

- Class of algorithms for sampling from probability distributions
- The longer the chain runs, the closer it gets to the true distribution
- In Bayesian inference, we run multiple Markov chains for a preset number of samples
- Discard the initial samples (warmup)
- What remains is our estimate of the posterior distribution
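As a toy illustration of these ideas (not the algorithm Stan uses), here is a minimal random-walk Metropolis sampler for the coin-flip posterior, including a warmup discard:

```
# Toy Metropolis sampler for P(heads) after seeing 8 heads in 10 flips,
# with a flat prior on p. Stan uses the more sophisticated HMC algorithm.
set.seed(1)
n_iter <- 20000
p <- numeric(n_iter)
p[1] <- 0.5
log_post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)          # outside (0, 1): probability zero
  dbinom(8, size = 10, prob = p, log = TRUE)  # flat prior adds only a constant
}
for (i in 2:n_iter) {
  proposal <- p[i - 1] + rnorm(1, sd = 0.1)   # random-walk proposal
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < log_post(proposal) - log_post(p[i - 1])) {
    p[i] <- proposal
  } else {
    p[i] <- p[i - 1]
  }
}
keep <- p[-(1:5000)]  # discard warmup samples
mean(keep)            # estimate of the posterior mean (analytically 0.75)
```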

- HMC (Hamiltonian Monte Carlo) is among the fastest and most efficient MCMC algorithms developed to date
- It’s implemented in software called Stan

- An easy way to fit Bayesian mixed models using Stan in R
- Syntax of **brms** models is just like **lme4**
- Runs a Stan model behind the scenes
- Automatically assigns sensible priors and does lots of tricks to speed up HMC convergence

- Some models just can’t be fit with frequentist maximum-likelihood methods
- Estimate how big effects are instead of yes-or-no framework of rejecting a null hypothesis
- We can say “the probability of something being between a and b is 95%” instead of “if we ran this experiment many times, 95% of the confidence intervals would contain this value.”

Let’s finally fit some Bayesian models!

Load packages.

Set plotting theme.

Set **brms** options (back-end and cores).

Read the simulated yield dataset from CSV on GitHub
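The setup steps above might look something like this (a sketch: package choices follow the lesson, but the theme, back-end, and core count are illustrative, and the data URL is elided):

```
# Load packages used throughout the lesson
library(brms)
library(tidyverse)
library(tidybayes)
library(emmeans)

# Set a plotting theme (choice is illustrative)
theme_set(theme_minimal())

# brms options: back-end and number of cores (values are illustrative)
options(brms.backend = 'cmdstanr', mc.cores = 4)

# Read the simulated yield data; the GitHub URL is elided here
yield_data <- read_csv('...')
```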

`field`
: text ID for each field where responses were measured (`'field1'` through `'field25'`)

`plot`
: text ID for each plot, 40 plots in each field (`'plot1'` through `'plot40'`)

`variety`
: text ID identifying variety (`'short'` and `'tall'`)

`rainfall`
: numeric variable, same for all plots in a given field

`soilN`
: numeric variable (soil nitrogen), unique value for each plot

`yield`
: numeric variable measured at plot level (outcome variable)

- Look at the relationship between `yield` and `soilN`
- I will not explain the **ggplot2** code for now
- Soil N values are jittered in the *x* direction

Add trendline.

Take multilevel structure of data into account (color by field).

Add field-level trendlines.
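The field-level trendline plot might be sketched like this (column names come from the data description; aesthetic details are illustrative):

```
# Yield vs. soil N, jittered in x, colored by field, with per-field trendlines
ggplot(yield_data, aes(x = soilN, y = yield, color = field)) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.5) +
  geom_smooth(method = lm, se = FALSE)
```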

For reference, this is the mixed model syntax from the **lme4** package:

- Dependent or response variable (`yield`) on the left side
- Tilde `~` separates dependent from independent variables
- Here the only fixed effect is the global intercept (`1`)
- Random effects specification (`(1 | field)`) has a *design* side (on the left hand) and a *group* side (on the right hand), separated by `|`
- In this case, the `1` on the design side means only fit random intercepts and no random slopes
- `field` on the group side means each field will get its own random intercept
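Put together, the equivalent **lme4** call (shown for reference only) would be:

```
# lme4 version of the random-intercept model (for reference, not run here)
library(lme4)
fit_lmer <- lmer(yield ~ 1 + (1 | field), data = yield_data)
```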

- Same formula as **lme4** but with some extra instructions for the HMC sampler:
  - Number of Markov chains
  - Iterations for each chain
  - How many iterations to discard as warmup
  - Random initial values
- No priors specified, so defaults are used
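A sketch of the corresponding `brm()` call (the object name and the specific chain/iteration counts are illustrative; the lesson only says they are preset):

```
# Intercept-only multilevel model; sampler settings are illustrative
fit_interceptonly <- brm(
  yield ~ 1 + (1 | field),
  data = yield_data,
  chains = 2,        # number of Markov chains
  iter = 200,        # total iterations per chain (warmup + sampling)
  warmup = 100,      # iterations discarded as warmup
  init = 'random',   # random initial values
  seed = 1           # for reproducibility
)
```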

- Warning about convergence: `Rhat > 1.05` for some parameters
- `Rhat` indicates convergence of the MCMC chains, approaching 1 at convergence
- `Rhat < 1.01` is ideal

Posterior distributions and trace plots for

- the fixed effect intercept (`b_Intercept`)
- the standard deviation of the random field intercepts (`sd_field__Intercept`)
- the standard deviation of the model’s residuals (`sigma`)

- Warning says either increase iterations or set stronger priors
- Increase iterations to 1000 warmup, 1000 sampling per chain (2000 total)
- `update()` lets us draw more samples without recompiling code
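For example (assuming the earlier intercept-only fit is stored as `fit_interceptonly`; names are illustrative):

```
# Draw more samples without recompiling: 1000 warmup + 1000 sampling per chain
fit_interceptonly_moresamples <- update(fit_interceptonly,
                                        iter = 2000, warmup = 1000)
```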

- What is the “95% CI” thing on the parameter summaries? A *credible interval*, not a confidence interval
- More direct interpretation than a confidence interval: we are 95% sure the parameter’s value is in the 95% credible interval
- Based on quantiles of the posterior distribution

Median and 90% quantile-based credible interval (QI) of the intercept

`as_draws_df()` gets all posterior samples for all parameters and puts them into a data frame
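For example (object name as used later in the lesson):

```
# Median and 90% quantile-based credible interval from the raw posterior draws
post <- as_draws_df(fit_interceptonly_moresamples)
quantile(post$b_Intercept, probs = c(0.05, 0.5, 0.95))
```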

- Literally anything in a Bayesian model has a posterior distribution, so anything can have a credible interval!
- In frequentist models, you have to do bootstrapping to get that kind of interval on most quantities

- Proportion of variation at different nested levels
- Calculate the ratio of variance within fields to between fields
- `variance_decomposition()` from the **performance** package also gives us a credible interval on the variance ratio
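The call itself is a one-liner (a sketch):

```
# Variance ratio with a credible interval
performance::variance_decomposition(fit_interceptonly_moresamples)
```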

- Variance ratio is much greater than zero so there is a need for a multilevel model
- But I would recommend using one anyway, if that’s the way your study was designed

- So far we have only calculated mean of yield and random variation by field
- But what factors influence yield at the plot level?
- Add first-level predictors (that vary by plot) as fixed effects
- variety
- soil N

- Fixed-effect part of the model formula is `1 + variety + soilN`
- No random slope (effect of variety and soil N on yield is the same in each field)
- Still using default priors

`seed` sets a random seed for reproducibility, and `file` creates a `.rds` file in the working directory so you can reload the model later without rerunning it.
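A sketch of this model’s `brm()` call (the object name, seed value, and sampler settings are illustrative):

```
# Random intercepts plus fixed effects for variety and soil N, default priors
fit_fixedslopes <- brm(
  yield ~ 1 + variety + soilN + (1 | field),
  data = yield_data,
  chains = 4, iter = 2000, warmup = 1000,
  seed = 2,                  # random seed for reproducibility
  file = 'fit_fixedslopes'   # saves fit_fixedslopes.rds for later reloading
)
```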

- Low Rhat (the model converged)
- Posterior distribution mass for fixed effects is well above zero
- `sigma` (SD of residuals) is smaller than before because we’re explaining more variation

`prior_summary()` shows what priors were used to fit the model

- t-distributions on the intercept, random effect SD, and residual SD (`sigma`)
) - mean of intercept prior is the mean of the data
- the priors on the variance parameters are centered at 0 but bounded below by 0 (half bell curves)

- By default, priors on the fixed-effect slopes are flat
- Assigns equal prior probability to *any* possible value
- 0 is as probable as 100000, which is as probable as -55555, etc.
- Not very plausible
- It is OK in this case because the model converged, but often it helps convergence to use priors that only allow “reasonable” values

- `normal(0, 5)` is a good choice
- Mean of 0 means we think that positive effects are just as likely as negative
- SD of 5 means we are still assigning pretty high probability to large effect sizes
- Use `prior()` to assign a prior to each class of parameters
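For example (a sketch, assuming the previous default-prior fit is stored as `fit_fixedslopes`):

```
# Refit with a normal(0, 5) prior on all fixed-effect slopes (class 'b')
fit_fixedslopes_priors <- update(fit_fixedslopes,
                                 prior = prior(normal(0, 5), class = b))
```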

- Basically no effect on the results or the performance of the HMC sampler
- But it’s something to be mindful of in the future!

`pp_check()` is a useful diagnostic for how well the model fits the data

- black line: density plot of observed data
- blue lines: density plot of predicted data from 10 random draws from the posterior distribution
- Because these are simulated data, it looks very good in this case
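The check itself is one line (`ndraws` is the argument name in recent **brms** versions):

```
# Compare density of observed data to 10 posterior predictive draws
pp_check(fit_fixedslopes_priors, ndraws = 10)
```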

- `summary()` only gives us the median and 95% credible interval
- We can work with the full uncertainty distribution (nothing special about 95%)
- Functions from **tidybayes** are used to make tables and plots
- `gather_draws()` makes a data frame from the posterior samples of parameters that we choose

`median_qi()` gives us the median and quantiles of the parameters
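A sketch combining the two (parameter names follow **brms** naming conventions for this model):

```
# Medians and quantile intervals for the two fixed-effect slopes
gather_draws(fit_fixedslopes_priors, b_varietytall, b_soilN) %>%
  median_qi(.width = c(.66, .95))
```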

- Special extensions to **ggplot2** for plotting quantiles of the posterior distribution
- I also include a dotted line at zero for comparison

- Variety and soil N vary by plot (first-level predictors)
- Rainfall is shared by all plots in the same field (second-level predictor)
- The same syntax is used
- Fixed-effect part is now `1 + variety + soilN + rainfall`
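A sketch of the call (the object name matches its later use with LOO; priors and sampler settings are illustrative):

```
# Add the field-level predictor rainfall to the fixed effects
fit_fixed12 <- brm(
  yield ~ 1 + variety + soilN + rainfall + (1 | field),
  data = yield_data,
  prior = prior(normal(0, 5), class = b),
  chains = 4, iter = 2000, warmup = 1000,
  seed = 3, file = 'fit_fixed12'
)
```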

Look at trace plots, posterior predictive check, and model summary.

- What do they show?

- You can add interaction terms separated by `:`
- Example: `1 + variety + soilN + rainfall + variety:soilN + soilN:rainfall`
- Interactions can be within level (like `variety:soilN`) or between levels (like `soilN:rainfall`)

- So far we’ve assumed any predictor’s effect is the same in every field (only the intercept varies, not the slope)
- Add a *random slope* term to allow both intercept and slope to vary
- Specify a random slope by adding the appropriate slope to the design side of the random effect specification
  - random intercept only: `(1 | field)`
  - random intercept and random slope with respect to soil N: `(1 + soilN | field)`
  - random intercept and random slope with respect to variety and soil N: `(1 + variety + soilN | field)`
- It doesn’t make sense to have a random slope for `rainfall` because all plots in the same field have the same rainfall value

Note: this model may take a minute or two to sample
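A sketch of the call (sampler settings are illustrative):

```
# Random intercepts plus random slopes for variety and soil N by field
fit_randomslopes <- brm(
  yield ~ 1 + variety + soilN + rainfall + (1 + variety + soilN | field),
  data = yield_data,
  prior = prior(normal(0, 5), class = b),
  chains = 4, iter = 2000, warmup = 1000,
  seed = 4, file = 'fit_randomslopes'
)
```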

- Look at trace plots, `pp_check()`, and the model summary
- We see some divergent transitions, maybe because we are fitting a model that is too complicated

- Standard deviation of random slopes for variety is very small
- We can probably omit the random slope for variety from the model
- To confirm, plot field-level slopes for variety (sum of fixed + random) with QI intervals

```
variety_slopes <- spread_draws(fit_randomslopes, b_varietytall, r_field[field, variable]) %>%
  filter(variable == 'varietytall') %>%
  mutate(slope = b_varietytall + r_field)

variety_slopes %>%
  median_qi(slope, .width = c(.66, .90, .95)) %>%
  ggplot(aes(y = field, x = slope, xmin = .lower, xmax = .upper)) +
  geom_interval() +
  geom_point(size = 2) +
  scale_color_brewer(palette = 'Blues')
```

- The field-level effects of variety are all very similar
- Refit the model without the random slope for variety, only for soil N
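A sketch of the simplified refit (sampler settings are illustrative):

```
# Keep the random slope for soil N only; the variety slope is fixed across fields
fit_soilNrandomslope <- brm(
  yield ~ 1 + variety + soilN + rainfall + (1 + soilN | field),
  data = yield_data,
  prior = prior(normal(0, 5), class = b),
  chains = 4, iter = 2000, warmup = 1000,
  seed = 5, file = 'fit_soilNrandomslope'
)
```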

- Leave-one-out (LOO) cross-validation compares models
- How well does a model fit to all data points but one predict the one remaining data point?
- First use `add_criterion()` to compute the LOO criterion for each model
- Then use `loo_compare()` to rank the models

```
fit_interceptonly_moresamples <- add_criterion(fit_interceptonly_moresamples, 'loo')
fit_fixedslopes_priors <- add_criterion(fit_fixedslopes_priors, 'loo')
fit_fixed12 <- add_criterion(fit_fixed12, 'loo')
fit_randomslopes <- add_criterion(fit_randomslopes, 'loo')
fit_soilNrandomslope <- add_criterion(fit_soilNrandomslope, 'loo')
```
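Then rank them (the comparison table is printed relative to the best model):

```
# Rank all five models by expected out-of-sample predictive performance
loo_compare(fit_interceptonly_moresamples, fit_fixedslopes_priors,
            fit_fixed12, fit_randomslopes, fit_soilNrandomslope)
```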

- Models ranked by ELPD (expected log pointwise predictive density)
- The best model is assigned an ELPD difference of 0, and the others are ranked relative to it
- The random slope models do equally well, so we can prefer the simpler one
- The random intercept models do quite a bit worse
- The model with no fixed effects does very poorly

- Bayesian analogue of a p-value
- Ratio of evidence between two models: \(\frac{P(model_1|data)}{P(model_2|data)}\)
- Ranges from 0 to infinity
- BF = 1 means equal evidence, BF > 1 means more evidence for model 1
- No “significance” threshold but BF > 10 is usually called strong evidence

- The R package **bayestestR** lets us compute a BF for each parameter
- Ratio of evidence for the posterior : evidence for the prior
- Our prior distributions were centered at 0 so they are like “null hypotheses”
- BF = 1 means we did not change our belief about the parameter at all from the prior, after seeing the data
- BF > 1 means we’ve changed our belief about the parameter after seeing the data
- BF < 1 means that, after seeing the data, our belief has shifted even closer to the prior (null) value
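The per-parameter BFs can be computed like this (a sketch; `bayesfactor_parameters()` compares posterior to prior density at the null value):

```
# Bayes factor for each parameter: posterior vs. prior evidence at the null
library(bayestestR)
bayesfactor_parameters(fit_soilNrandomslope)
```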

- BF < 1 for the intercept
- BF >> 1000 for all other fixed effects
- **WARNING**: BFs are very sensitive to your choice of prior

- To finish, here are some examples of plots you can make to visualize model output
- Posterior expected values given different values of the predictors

`conditional_effects()` is a quick way to plot all fixed effects with 95% credible intervals
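For example:

```
# Quick plots of each fixed effect with 95% credible intervals
conditional_effects(fit_soilNrandomslope)
```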

- We can also incorporate random effects into the prediction
- This plot shows the “shrinkage” effect of a mixed model

- Customized plots can be made with the packages **emmeans** and **tidybayes**
- `emmeans()` gives you the expected value at specific levels of fixed predictors

```
variety_emmeans <- emmeans(fit_soilNrandomslope, ~ variety)

gather_emmeans_draws(variety_emmeans) %>%
  ggplot(aes(x = .value, y = variety)) +
  stat_interval(.width = c(.66, .95, .99)) +
  stat_summary(fun = median, geom = 'point', size = 2) +
  scale_color_brewer(palette = 'Greens') +
  ggtitle('posterior expected value by variety', 'averaged across soil N and rainfall')
```

- `add_epred_draws()` gives you the expected value at levels of categorical and continuous fixed effects
- Trend of yield versus soil N for each variety, at the average value of rainfall
- Averages across field-level random effects

```
expand_grid(rainfall = mean(yield_data$rainfall),
            soilN = seq(2, 9, by = .5),
            variety = c('short', 'tall')) %>%
  add_epred_draws(fit_soilNrandomslope, re_formula = ~ 0) %>%
  ggplot(aes(x = soilN, y = .epred, group = variety, color = variety)) +
  stat_lineribbon(.width = c(.66, .95, .99)) +
  scale_fill_grey() +
  labs(y = 'posterior expected value of yield') +
  ggtitle('posterior expected value by soil N + variety', 'at average value of rainfall')
```

What did we learn? Let’s revisit the learning objectives!

You now understand…

- The basics of Bayesian inference
- Definition of prior, likelihood, and posterior
- How Markov Chain Monte Carlo works
- What a credible interval is

You now can…

- Write **brms** code to fit a multilevel model with random intercepts and random slopes
- Diagnose and deal with convergence problems
- Interpret **brms** output
- Compare models with LOO information criteria
- Use Bayes factors to assess strength of evidence for effects
- Make plots of model parameters and predictions with credible intervals

Congratulations, you are now Bayesians!

- See text version of lesson for further reading and useful resources
- Please send any feedback to `quentin.read@usda.gov`!