A feature we’ve been wanting for a long time is the ability to draw IID samples from a model prior (and ideally also simulate IID samples from distributions conditioned on MCMC samples of a model posterior). This would make it much easier to do prior predictive checks (do the priors lead to plausible parameters and datasets?) and posterior predictive checks (if we simulate data from the posteriors, does it look like the data we observed?) without having to write a bunch of custom code. There’s some work going on now into the internal changes that are needed to make this happen (here and here).
At this stage, it would be really helpful to get some feedback on what the user interface to this functionality would look like. I have a proposal below which I think covers these use cases and remains both straightforward and consistent with other simulate methods.
What do you think?
- Does this cover your use cases?
- Is it obvious how to use it?
- Is there any other related functionality that should be included in this or related functions?
Here’s some example documentation for the proposed interface; a simulate S3 method (from the generic stats::simulate()
) for objects of class ‘greta_model’:
simulate.greta_array (greta)
Simulate Responses From greta_model Object
Description
Simulate responses from a “greta_array” model object, either from the prior or conditioned on the posterior or some fixed values.
Usage
## S3 method for class 'greta_model'
simulate(object, nsim = 1, seed = NULL, targets = list(), values = list(), precision = c("double", "single"), ...)
Arguments
-
object
: agreta_model
object -
nsim
: positive integer scalar - the number of responses to simulate -
seed
: an optional seed to be used in set.seed immediately before the simulation so as to generate a reproducible sample -
targets
: an optional list of greta arrays for which to generate samples. If not specified, the greta arrays named in the call tomodel
will be used. -
values
: a named list giving fixed values of greta arrays in the model to hold constant, or anmcmc.list`` object returned by
mcmc`. -
precision
: the floating point precision to use when calculating values -
...
: optional additional arguments, none are used at present
Details
If the values
argument is not provided, simulate()
draws nsim
independent samples from the model prior for all of the target greta arrays - either those named in the call to model()
or those in targets
, if provided.
If values
are provided, samples of the target values will be independent, conditional on these provided values. If values
is a named list of values for some greta arrays, each of the nsim
independent samples of the target greta arrays will be drawn conditional on those fixed values. If an mcmc.list
object (created by fitting a greta model with mcmc
) is passed to values
, then nsim
samples of the target greta arrays will be returned, each of them conditioned on a different, randomly selected posterior sample from values.
Value
A named list of vectors, matrices or arrays containing independent samples of the named greta arrays. The number of samples will be prepended as the first dimension of the greta array, so that a vector fo samples is returned for each scalar greta array, and a matrix is returned for each vector greta array etc.
Examples
# build a greta model
n <- 10
y <- rnorm(n)
y <- as_data(y)
library(greta)
sd <- lognormal(1, 2)
mu <- normal(0, 1, dim = n)
distribution(y) <- normal(mu, sd)
m <- model(mu, sd)
# prior simulation:
# simulate one random draw of mu and sd from the model prior:
sims <- simulate(m)
# 100 simulations of mu and sd)
sims <- simulate(m, nsim = 100)
# 100 simulations of y and mu (greta arrays names in targets)
# NB: y must be a greta array here
sims <- simulate(m, nsim = 100, targets = list(y, mu))
# conditional prior simulation:
# 100 simulations of y and mu, with sd fixed at 3
sims <- simulate(m, nsim = 100, targets = list(y, mu), values = list(sd = 3))
# posterior simulation:
# generate posterior samples from the model as usual
draws <- mcmc(m)
# draw 100 simulations of y, conditioned on 100 random posterior
# samples of all the model parameters
sims <- simulate(m, nsim = 100, targets = list(y), values = draws)