[Feedback requested!] proposed simulate() interface

Thanks Nick, this is great news. And no critical urgency – we’ve got a hack that does what we need, it’s just embarrassingly inelegant and woefully slow. @vlandau and I would be happy to help provide feedback and test whenever it makes sense. Feel free to tag us here or in the issues pages.

Quick update as I’m working on this now - PR here.

I’m planning to implement both a limited simulate(model, nsim = 100), to simulate from the whole prior model (if and only if it’s a complete generative model, i.e. no variable()s), and the interface above for simulating conditional on fixed values (a list or draws) via calculate(). So the following would work:

library(greta)
# (y is assumed to be a numeric vector of observed data, and n its length)
y <- as_data(y)
sd <- lognormal(1, 2)
mu <- normal(0, 1, dim = n)
distribution(y) <- normal(mu, sd)

# a named list giving draws from the model prior of the objects: mu, sd
calculate(mu, sd, nsim = 100)

# draws from the prior over the observed data y (which must be a greta
# array)
calculate(y, nsim = 100)

# draws from the prior over the observed data y, *conditional on a fixed
# value of mu*
calculate(y, values = list(mu = 3), nsim = 100)

m <- model(mu, sd)
draws <- mcmc(m)

# draws from the *posterior* over the observed data y for a random
# subset of 100 posterior samples
calculate(y, values = draws, nsim = 100)

# a named list giving draws from the model prior of mu, sd and y (all of
# the greta arrays in this environment that are associated with the model);
# this is a convenient wrapper for calculate(mu, sd, y, nsim = 100), which
# will be used internally
simulate(m, nsim = 100)

With this interface, it might be possible for people to be tripped up by simulation when they expect they are calculating something deterministic. E.g. if they were to do:

beta <- normal(0, 1, dim = 3)
eta <- X %*% beta
prob <- ilogit(eta)
calculate(prob)

not realising that prob is stochastic (it depends on the value of beta). I imagine this would mostly happen with beginners, or with a more complicated model, where it’s not obvious that the target is going to be stochastic.

My proposed solution is to make nsim = NULL the default in calculate(), in which case the above would error, with a message telling the user to provide either an nsim or a values argument. I.e. nsim = NULL acts as a ‘don’t simulate’ setting.
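So, roughly, those three cases would look like this (the beta values here are just illustrative):

# with nsim = NULL (the default) and no values, this errors, telling the
# user to pass either an nsim or a values argument
calculate(prob)

# explicit simulation: 100 draws of prob from the prior over beta
calculate(prob, nsim = 100)

# a deterministic calculation, conditional on a fixed value of beta
calculate(prob, values = list(beta = c(0.1, -0.2, 0.3)))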

Alrighty, these features are now all implemented in the branch feature/simulate, which you can install with: remotes::install_github("greta-dev/greta@feature/simulate"). The example for calculate shows how to do all the new things: prior simulation, prior simulation conditional on some fixed values of parameters, and posterior simulation (simulating new data given posterior samples).

There are a bunch of tests in there, but it took a lot of thinking and hacking to implement (the same model/greta code implies very different DAGs for inference, prior simulation, and posterior simulation), so I’d really appreciate people trying it out and reporting bugs, so I know what else to test for.

Not all distributions can be sampled from at the moment, so the next step will be implementing sampling for more of those (e.g. mixture, joint, LKJ, Wishart), and some tests that the samples are correct.

One thing to bear in mind is that the interface to calculate has changed in a way that isn’t completely backwards-compatible. You can now pass in multiple greta arrays via a dots argument, but need to explicitly name the other arguments, like values.
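So, roughly, code that previously passed values by position will need a small tweak (a minimal before/after sketch):

# old-style call, with values passed positionally - this no longer works,
# because the second position is now part of the dots
# calculate(mu, draws)

# new-style call: greta arrays go in the dots, everything else is named
calculate(mu, sd, values = draws)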

OK, more tests are now implemented to make sure the samples are coming from the right distributions. Sampling is implemented for every distribution except the hypergeometric distribution and the truncated F distribution (both of which are tricky to sample from within TensorFlow).

This is pretty much ready to be merged into master!

Merged into master now - thanks everyone who helped design the interface, and reported bugs!

The main dev branch isn’t going to be released imminently though, so there’s time for more feedback and bug reports.

Thanks Nick, the new simulate and calculate interfaces work super well, and should greatly simplify code we were using to develop posterior predictive checks and derived quantities. Looked like a giant lift based on the stream of commits – extremely impressive work!

This might be a bit of a niche question, but I’m curious whether you’ve got thoughts on how to simulate observations conditional on greta arrays created after MCMC. You’ve got an example in the documentation for calculate() that looks like:

# you can use calculate on greta arrays created even after the inference on
# the model - e.g. to plot response curves
petal_length_plot <- seq(min(iris$Petal.Length),
                         max(iris$Petal.Length),
                         length.out = 100)
mu_plot <- alpha + petal_length_plot * beta
mu_plot_draws <- calculate(mu_plot, values = draws)

Without falling back to base R’s rnorm, is there an easy way in greta to make draws for a new y on the basis of the vector of means mu_plot (and of course the other parameters in the model)? The reason we’re interested is that we’d like to create posterior predictive checks for observations withheld from the model.

The reason it is tricky seems to involve how the greta array for the responses (y <- as_data(iris$Petal.Width)) is used. All of the observations that fed into distribution(y) are used in estimation, so there’s no clear way to use greta to simulate new observations for a separate validation or testing partition of iris…? I suspect I’m articulating all of this very poorly, so if I’m not making sense please let me know. I could try to share a more complete example.

Hi Luke,

Great, I’m really glad it’s working for you!

Shooting from the hip here, but is this what you want?:

y_test <- normal(mu_plot, sd)
calculate(y_test, values = draws, nsim = 1000)

If not, please do share an example and I’ll take a closer look!

Hell of a shot! :man_facepalming: Feel bad I bothered you with this. Those two lines are exactly what I needed!
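In case it’s useful to anyone else doing posterior predictive checks on held-out data, here’s roughly what this ends up looking like for us (the variable names and train/test split are just illustrative):

library(greta)

# fit on a training partition of iris, hold the rest out
train <- iris[1:100, ]
test  <- iris[101:150, ]

y <- as_data(train$Petal.Width)
alpha <- normal(0, 3)
beta <- normal(0, 3)
sd <- lognormal(0, 1)
mu <- alpha + train$Petal.Length * beta
distribution(y) <- normal(mu, sd)

m <- model(alpha, beta, sd)
draws <- mcmc(m)

# posterior predictive draws for the held-out observations
mu_test <- alpha + test$Petal.Length * beta
y_test <- normal(mu_test, sd)
ppc <- calculate(y_test, values = draws, nsim = 1000)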


Ha! Great, glad that helped.

Just started testing calculate() for prior and posterior predictive checks. It’s AMAZING!! And fast. Thanks for this feature.

Great, I’m glad it’s helpful. I’m pretty pleased with it!