Hi folks,

I’m interested in having dedicated simulation methods for models defined in greta. (At Nick’s suggestion I’m posting here to follow up on the thread in Issue https://github.com/greta-dev/greta/issues/269.) I think this would be useful for instructional / exploratory purposes, where it would be easy to simulate sample data of varying sizes from a specified model and then fit that model directly. (Of course it’s usually pretty simple to write a stochastic model directly in R, but it feels somewhat silly & possibly error-prone to have two separate definitions of the same model.)

I’m particularly interested in time-series simulations, which can be fit with a vectorized model but require a `for`-loop syntax to simulate; I haven’t worked out an example of what that would look like in this case. (In the thread linked above, Nick also hints at some optimization that can be done for `for` loops, which I’d love to learn more about.)

Here’s the example I posed in the original thread:

e.g. let’s say I want to simulate logistic growth with some Gaussian random noise:

```
# priors
r <- uniform(0, 1)
K <- uniform(0, 10)
sigma <- uniform(0, 1)
# model
mean <- x_t + r * x_t * (1 - x_t / K)
distribution(x_t1) <- normal(mean, sigma * x_t)
m <- model(x_t, x_t1)  # ERROR: this is not valid syntax
```

(I put priors in there, but I would also like to simulate with fixed values of the parameters. In any event, as discussed in #286, I think the trick is how to simulate `N` samples of the data (the ordered pairs `x_t` and `x_t1`), not just draws of the parameter values from the priors.)

To make this concrete, here’s the same example in perfectly valid `nimble` code:

```
library(nimble)
library(dplyr)
library(tibble)
library(ggplot2)

logistic <- nimbleCode({
  K ~ dunif(0, 2)
  r ~ dunif(0, 1)
  sigma ~ dunif(0, 1)
  x[1] <- x0
  for (t in 1:(N - 1)) {
    mu[t] <- x[t] + x[t] * r * (1 - x[t] / K)
    y[t] <- x[t] * sigma
    x[t + 1] ~ dnorm(mu[t], sd = y[t])
  }
})
constants <- list(N = 1e2, x0 = 0.2)
inits <- list(r = 0.05, K = 2, sigma = 0.02)
model <- nimbleModel(logistic, constants = constants, inits = inits)
set.seed(123)
# note we define the 'data' nodes that need to be simulated
simulate(model, nodes = c("mu", "x", "y"))
# extract simulated data & plot to see results
df <- tibble(t = seq_along(model$x), x = model$x)
df %>% ggplot(aes(t, x)) + geom_point()
```

Clearly the `nimble` syntax is a bit more verbose, and of course the above loop could be written with `rnorm` in pure R as well. At least the `nimble` syntax makes it clear that I’m simulating the length-`N` vector `x` (and the ‘internal’ variables `y` and `mu`, which I think are just needed as a limitation of the syntax). Thoughts on this welcome!
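For comparison, here’s a sketch of that pure-R `rnorm` version, using the same fixed values as the `nimble` `constants` and `inits` above (no prior draws, just the trajectory):

```r
# pure-R equivalent of the simulation loop above,
# with parameters fixed at the 'inits' values
set.seed(123)
N <- 1e2
x0 <- 0.2
r <- 0.05
K <- 2
sigma <- 0.02

x <- numeric(N)
x[1] <- x0
for (t in 1:(N - 1)) {
  # deterministic logistic-growth step
  mu <- x[t] + x[t] * r * (1 - x[t] / K)
  # Gaussian noise with sd proportional to the current state
  x[t + 1] <- rnorm(1, mean = mu, sd = sigma * x[t])
}

plot(seq_along(x), x, xlab = "t", ylab = "x")
```

This is exactly the duplication I’d like to avoid: the same model written once for simulation and once (vectorized) for inference.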

An implementation in the TensorFlow backend would be great; I’m not sure how often the performance would be needed or better that way, but I think it would keep the internals more consistent. I’m also happy with a pure-R implementation, though I think there are still syntax questions to be hammered out (or at least for me to wrap my head around!).