Greta Sampling seems very slow

I recently installed greta using the instructions in this post: Installing Greta: Robust instructions as of March 31, 2019

I am running a simple Poisson-gamma hierarchical model, and the sampler seems very slow compared to other MCMC samplers such as JAGS and Stan. Is it possible I have something configured incorrectly that is causing it to be slow?

Here’s my R code for this model:

## load required packages and set basic options
################################################################################

library("tidyverse"); theme_set(theme_minimal())
library("parallel"); options(mc.cores = detectCores())
library("greta")
library("bayesplot")
library("bench")


## generate/specify data
################################################################################

theta <- 5 # true Poisson rate used to simulate the data

set.seed(1)

(y <- rpois(1, theta)) # simulate a single Poisson observation
y <- as_data(y)        # convert it to a greta data array



## specify greta model
################################################################################

theta <- gamma(3, 1)              # gamma(3, 1) prior on the Poisson rate
distribution(y) <- poisson(theta) # Poisson likelihood for the observed count

greta_model <- model(theta)

plot(greta_model)

## configure model settings
################################################################################

n_chains <- 4
n_iter <- 1e4L
n_warmup <- 1e3L


## fit model
################################################################################

greta_fit <- mcmc(
  model = greta_model, n_samples = n_iter,
  warmup = n_warmup, chains = n_chains
)


## assess fit
################################################################################

summary(greta_fit)

The code looks fine to me.

For small models like this, I would expect JAGS and Stan to be much faster at sampling than greta. greta is designed and optimised for models with large vectorised and linear-algebra operations, since it can parallelise each of those operations across multiple CPUs or on GPUs. But there is a computational overhead to each operation in greta, so it will never be as fast as a compiled C++ for loop, which is how JAGS and Stan work.
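
To give a concrete sense of what "large vectorised and linear algebra operations" means, here is a made-up regression sketch (not from your post; the data, dimensions, and priors are all arbitrary) in which almost all of the work is a single big matrix multiplication. This is the kind of model greta is built for:

library(greta)

# illustrative only: arbitrary sizes and simulated data
n <- 5000
k <- 50
X <- matrix(rnorm(n * k), n, k)   # simulated design matrix
y_sim <- rnorm(n)                 # placeholder response

X_g <- as_data(X)
y_g <- as_data(y_sim)

beta  <- normal(0, 1, dim = k)    # all k coefficients in one greta array
sigma <- lognormal(0, 1)          # positive scale parameter

mu <- X_g %*% beta                # one large matrix multiplication
distribution(y_g) <- normal(mu, sigma)

big_model <- model(beta, sigma)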

That said, your greta code took about 30 seconds to draw 10,000 samples on my laptop, which in my experience is less time than Stan usually needs just to compile the model code.
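
If you want to check the timing on your own machine, one quick option is to wrap the sampling call in system.time(); this sketch just reuses the objects defined in your script:

## rough wall-clock timing of the sampling step only
timing <- system.time(
  greta_fit <- mcmc(
    greta_model, n_samples = n_iter,
    warmup = n_warmup, chains = n_chains
  )
)
timing["elapsed"]  # elapsed seconds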

As another example of how differently greta works from JAGS and Stan: running 400 chains of 100 iterations each (i.e. the same total number of posterior samples) takes about half as long as 4 chains of 10,000 iterations, even though greta still does 1,000 warmup steps on every one of those chains.

That's because greta vectorises everything, even across chains, and splits those operations across CPUs, so it can handle those vectorised operations more efficiently. This makes a real difference in big models, and when you have access to more than a handful of CPU cores.
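
For reference, the only change needed to try that many-short-chains configuration with the model from your post is in the mcmc() call; a quick sketch:

## 400 short chains instead of 4 long ones: the same 40,000 post-warmup
## samples in total, with all chains evaluated together as one vectorised batch
greta_fit_wide <- mcmc(
  greta_model, n_samples = 100,
  warmup = 1000, chains = 400
)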