A feature we’ve been wanting for a long time is the ability to draw IID samples from a model prior (and ideally also simulate IID samples from distributions conditioned on MCMC samples of a model posterior). This would make it much easier to do prior predictive checks (do the priors lead to plausible parameters and datasets?) and posterior predictive checks (if we simulate data from the posteriors, does it look like the data we observed?) without having to write a bunch of custom code. There’s some work going on now into the internal changes that are needed to make this happen (here and here).
At this stage, it would be really helpful to get some feedback on what the user interface to this functionality would look like. I have a proposal below which I think covers these use cases and remains both straightforward and consistent with other simulate methods.
What do you think?
- Does this cover your use cases?
- Is it obvious how to use it?
- Is there any other related functionality that should be included in this or related functions?
Here’s some example documentation for the proposed interface; a simulate S3 method (from the generic stats::simulate()) for objects of class ‘greta_model’:
simulate.greta_array (greta)
Simulate Responses From greta_model Object
Description
Simulate responses from a “greta_array” model object, either from the prior or conditioned on the posterior or some fixed values.
Usage
## S3 method for class 'greta_model'
simulate(object, nsim = 1, seed = NULL, targets = list(), values = list(), precision = c("double", "single"), ...)
Arguments
-
object: agreta_modelobject -
nsim: positive integer scalar - the number of responses to simulate -
seed: an optional seed to be used in set.seed immediately before the simulation so as to generate a reproducible sample -
targets: an optional list of greta arrays for which to generate samples. If not specified, the greta arrays named in the call tomodelwill be used. -
values: a named list giving fixed values of greta arrays in the model to hold constant, or anmcmc.list`` object returned bymcmc`. -
precision: the floating point precision to use when calculating values -
...: optional additional arguments, none are used at present
Details
If the values argument is not provided, simulate() draws nsim independent samples from the model prior for all of the target greta arrays - either those named in the call to model() or those in targets, if provided.
If values are provided, samples of the target values will be independent, conditional on these provided values. If values is a named list of values for some greta arrays, each of the nsim independent samples of the target greta arrays will be drawn conditional on those fixed values. If an mcmc.list object (created by fitting a greta model with mcmc) is passed to values, then nsim samples of the target greta arrays will be returned, each of them conditioned on a different, randomly selected posterior sample from values.
Value
A named list of vectors, matrices or arrays containing independent samples of the named greta arrays. The number of samples will be prepended as the first dimension of the greta array, so that a vector fo samples is returned for each scalar greta array, and a matrix is returned for each vector greta array etc.
Examples
# build a greta model
n <- 10
y <- rnorm(n)
y <- as_data(y)
library(greta)
sd <- lognormal(1, 2)
mu <- normal(0, 1, dim = n)
distribution(y) <- normal(mu, sd)
m <- model(mu, sd)
# prior simulation:
# simulate one random draw of mu and sd from the model prior:
sims <- simulate(m)
# 100 simulations of mu and sd)
sims <- simulate(m, nsim = 100)
# 100 simulations of y and mu (greta arrays names in targets)
# NB: y must be a greta array here
sims <- simulate(m, nsim = 100, targets = list(y, mu))
# conditional prior simulation:
# 100 simulations of y and mu, with sd fixed at 3
sims <- simulate(m, nsim = 100, targets = list(y, mu), values = list(sd = 3))
# posterior simulation:
# generate posterior samples from the model as usual
draws <- mcmc(m)
# draw 100 simulations of y, conditioned on 100 random posterior
# samples of all the model parameters
sims <- simulate(m, nsim = 100, targets = list(y), values = draws)