Want to skip to the slides and sample code? Click here.
In early 2023, I found myself at a crossroads in my statistical practice. After numerous late nights struggling with convergence errors in frequentist generalized linear mixed models, I decided there had to be a better way. This frustration led me down a path of discovery into Bayesian statistics - a journey that not only solved my immediate problems but fundamentally changed how I approach data analysis.
After some nights and weekends of independent study and practical implementation, I had the opportunity to share what I'd learned with my lab colleagues in a workshop focused on principled Bayesian workflow. This post details that journey and the structured workflow I discovered, which provides a robust framework for approaching statistical problems from a Bayesian perspective.
Traditional frequentist statistics, while powerful, comes with limitations that can be particularly challenging when dealing with complex models. My frustrations primarily stemmed from:
Convergence issues with complex hierarchical models
Difficulty incorporating prior knowledge into analyses
The unintuitive nature of p-values and confidence intervals
Limited ability to address uncertainty in a principled way
Bayesian methods offered solutions to these challenges by:
Providing a more robust framework for complex model fitting
Allowing explicit incorporation of prior knowledge
Delivering intuitive probability statements about parameters
Offering a natural way to quantify and propagate uncertainty
As Andrew Gelman and colleagues explain, "The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory" (Gelman et al., 2020).
At the heart of Bayesian methods is Bayes' theorem, which provides a mathematical framework for updating beliefs based on new evidence:
P(θ | D) = P(D | θ) × P(θ) / P(D)

Where:
P(θ | D) is the posterior probability (our updated belief about the parameters after seeing the data)
P(D | θ) is the likelihood (how probable the observed data are given the parameters)
P(θ) is the prior probability (our belief about the parameters before seeing the data)
P(D) is the marginal likelihood (a normalizing constant)
The beauty of this approach is that it gives us a formal way to combine prior knowledge with observed data to reach conclusions expressed as probability distributions.
During my studies, I discovered the concept of a "principled Bayesian workflow" - a structured approach to developing, checking, and using Bayesian models that goes beyond simply fitting models and interpreting results.
This workflow, advocated by experts like Michael Betancourt, Richard McElreath, and Paul Bürkner, provides a comprehensive framework for Bayesian analysis that ensures robust and reliable results (Betancourt, 2018).
The workflow includes the following key steps:
The first step involves carefully considering the data generating process and specifying an appropriate model structure. This includes:
Understanding the relevant variables and their relationships
Creating visualizations to explore data patterns
Considering causal relationships (potentially using directed acyclic graphs)
Deciding on standardization of variables
This step is crucial because, as Michael Betancourt notes, "A principled workflow considers whether or not modeling assumptions are appropriate and sufficient for answering relevant questions in your particular applied context" (Betancourt, 2018).
With a model structure in mind, we proceed to implementation. For my workshop, I demonstrated using the brms package in R, which provides a powerful and accessible interface to Stan, a state-of-the-art probabilistic programming language.
A simple example from my demonstration used the mtcars dataset:
library(brms)

# Use the built-in mtcars dataset
car.data <- mtcars

# Initial model with default priors
mdl <- brm(mpg ~ wt * am, data = car.data)
This simple model examines how a car's miles per gallon (mpg) is affected by its weight (wt) and transmission type (am), including their interaction.
Once we fit the model, it's essential to check that the Markov Chain Monte Carlo (MCMC) sampling has worked correctly. Key diagnostics include:
Trace plots (looking for "fat hairy caterpillars")
R-hat values (should be very close to 1, conventionally below about 1.01)
Effective sample size (higher is better)
Divergent transitions (few or none)
Proper MCMC convergence is critical because without it, our posterior distributions may be unreliable.
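For a brms model like the one above, these diagnostics can be inspected with a few calls (a sketch, assuming `mdl` is the fitted model from earlier):

```r
library(brms)

# Summary table reports Rhat and bulk/tail effective sample sizes
summary(mdl)

# Trace and density plots - look for well-mixed "fat hairy caterpillars"
plot(mdl)

# Extract NUTS sampler diagnostics and count divergent transitions
np <- nuts_params(mdl)
sum(subset(np, Parameter == "divergent__")$Value)
```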
One of the most important aspects of the principled Bayesian workflow is examining the implications of our priors before looking at the data. As Schad, Betancourt, and Vasishth explain:
"Prior predictive checks allow us to verify that our model is consistent with our domain expertise" (Schad et al., 2019) ArXiv.
In my demonstration, I showed how to perform prior predictive checks by simulating from the prior distributions:
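In brms, one way to do this is to refit the model with `sample_prior = "only"`, so that draws come from the priors alone before any data are conditioned on (a sketch of the idea, not the exact workshop code):

```r
library(brms)

# Fit the same model, but sample exclusively from the priors
prior_mdl <- brm(mpg ~ wt * am, data = mtcars,
                 sample_prior = "only")

# Compare datasets simulated from the priors against the observed mpg
pp_check(prior_mdl, ndraws = 100)
```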
This revealed that wide, uninformative priors can lead to very unrealistic predictions before the model sees any data. I then demonstrated how to specify more informative priors and showed their impact on the prior predictive checks.
After fitting the model with appropriate priors, we check whether the model captures the structure of the observed data:
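With brms, posterior predictive checks take one call per view of the data (a sketch, assuming `mdl` is the fitted model):

```r
library(brms)

# Overlay densities of replicated datasets on the observed mpg values
pp_check(mdl, ndraws = 100)

# Check specific summary statistics, e.g. the minimum and maximum jointly
pp_check(mdl, type = "stat_2d", stat = c("min", "max"))
```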
This helps us identify potential misfit between our model and the data. If there's substantial misfit, we may need to revise our model structure.
To ensure our results aren't unduly influenced by our prior choices, we can conduct sensitivity analyses:
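A simple approach is to refit the model under alternative priors and compare the posterior summaries (a sketch with hypothetical prior choices):

```r
library(brms)

# Refit with deliberately tighter priors on the regression coefficients
mdl_tight <- brm(mpg ~ wt * am, data = mtcars,
                 prior = set_prior("normal(0, 1)", class = "b"))

# If the coefficient estimates barely move, the data dominate the priors
fixef(mdl)
fixef(mdl_tight)
```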
This helps identify whether our conclusions are driven primarily by the data (which is usually desirable) or by our prior specifications.
Based on the results of our checks, we may need to refine our model. In my demonstration, I showed how to improve the model by adding constraints based on domain knowledge. In my simple case, a car's miles per gallon cannot be negative so I employed lower-bound truncation in the model specification. One could also use a likelihood that only permits positive values, such as the lognormal or Gamma distributions.
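In brms, lower-bound truncation is expressed with the `trunc()` addition term on the response, and a strictly positive likelihood is the alternative route (a sketch of both options):

```r
library(brms)

# Option 1: truncate the normal likelihood at zero, since mpg cannot be negative
mdl_trunc <- brm(mpg | trunc(lb = 0) ~ wt * am, data = mtcars)

# Option 2: use a likelihood whose support is strictly positive
mdl_lnorm <- brm(mpg ~ wt * am, data = mtcars, family = lognormal())
```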
Finally, once we have a well-specified model that passes our checks, we can draw inferences from our posterior distributions. This might include:
Examining posterior distributions of parameters
Calculating derived quantities of interest
Testing specific hypotheses using probability statements
Making predictions for new data
In my demonstration, I showed how to use the emmeans package for inference:
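emmeans works directly on brmsfit objects; for a model like the one above (and assuming transmission type `am` has been coded as a factor), a sketch might look like:

```r
library(emmeans)

# Estimated marginal mean mpg for each transmission type, averaged
# over weight and summarized from the posterior draws
em <- emmeans(mdl, ~ am)
em

# Posterior contrast between automatic and manual transmissions
pairs(em)
```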
Throughout the workshop, I highlighted several key advantages of the Bayesian approach:
Model flexibility: Bayesian methods can handle a wider range of model structures, including those that would cause convergence problems in frequentist frameworks.
Intuitive interpretation: Posterior distributions provide direct probability statements about parameters, which are more intuitive than p-values and confidence intervals.
Prior incorporation: The ability to incorporate prior knowledge can improve model stability and provide more realistic results.
Uncertainty propagation: Bayesian methods naturally propagate uncertainty through all levels of analysis.
Principled workflow: The structured workflow provides a comprehensive framework for model building, checking, and validation.
To make the concepts concrete, I shared examples from my own research where Bayesian methods helped overcome limitations of frequentist approaches:
Complex hierarchical models with multiple nested random effects that previously failed to converge
Models incorporating informative priors based on previous studies
Analysis of skewed data and binary outcomes using appropriate likelihood functions
My journey from frequentist frustration to Bayesian enlightenment has transformed how I approach statistical analysis.
The principled Bayesian workflow provides a comprehensive framework that ensures robust and reliable results while encouraging careful thinking about models and data.
While Bayesian methods do come with some challenges - particularly the increased computational demands and the need to specify priors - the benefits far outweigh these costs for many applications.
I encourage anyone struggling with the limitations of traditional statistical approaches to explore Bayesian methods and the principled workflow described here. As Richard McElreath puts it in his influential book "Statistical Rethinking," Bayesian methods aren't just an alternative estimation strategy - they represent a fundamentally different way of thinking about inference and uncertainty.
If you're interested in learning more, I highly recommend the resources listed below, which were instrumental in my own journey.
Books:
"Statistical Rethinking" by Richard McElreath
"Bayesian Data Analysis" by Andrew Gelman et al.
"Doing Bayesian Data Analysis" by John Kruschke
Papers:
"Toward a principled Bayesian workflow in cognitive science" by Schad, Betancourt, and Vasishth
"Visualization in Bayesian workflow" by Gabry, Simpson, Vehtari, Betancourt, and Gelman
"Bayesian Workflow" by Gelman et al.
Software:
Stan: https://mc-stan.org/
bayestestR: https://github.com/easystats/bayestestR
tidybayes: https://github.com/mjskay/tidybayes
Online Resources:
Michael Betancourt's case studies: https://betanalpha.github.io/writing/
Richard McElreath's Statistical Rethinking lectures on YouTube
Stan forums: https://discourse.mc-stan.org/
Learning Bayesian Statistics Podcast: https://learnbayesstats.com/
Forthcoming book on Bayesian Workflow by Gelman et al.: https://sites.stat.columbia.edu/gelman/workflow-book/