Simulations with greta? (including time-series)

cboettig · July 23, 2019, 1:56am

Hi folks,

I’m interested in having dedicated simulation methods for models defined in greta. (At Nick’s suggestion I’m posting here to follow up on the thread in Issue https://github.com/greta-dev/greta/issues/269). I think this would be useful for instructional / explorational purposes, where it would be easy to simulate sample data of varying sizes from a specified model and then fit that model directly. (Of course it’s usually pretty simple to write a stochastic model directly in R, but it feels somewhat silly & possibly error-prone to have two separate definitions of the same model).

I’m particularly interested in time-series simulations, which can be fit from a vector model but require a for loop syntax to simulate, but haven’t worked out an example of what that would look like in this case. (In the thread linked above, Nick also hints at some optimization that can be done for for loops, which I’d love to learn more about.

Here’s the example I posed in the original thread:

e.g. let’s say I want to simulate logistic growth with some Gaussian random noise:

# priors
r <- uniform(0, 1)
K <- uniform(0, 10)
sigma <- uniform(0, 1)
# Model
mean <- x_t + r * x_t * (1 - x_t / K) 
distribution(x_t1) <- normal(mean, sigma * x_t)
m <- model(x_t, x_t1) # ERROR, this is not valid syntax

(I put priors in there but would also like to simulate with fixed values of parameters; but in any event as discussed in #286, I think the trick is how to simulate N samples of the data (the ordered pairs x_t and x_t1), not just drawing of the parameter values from the priors.

To make this concrete, here’s an example the same example in perfectly valid code in nimble:

library(nimble)

logistic  <- nimbleCode({
  K ~ dunif(0,2)
  r ~ dunif(0,1)
  sigma ~ dunif(0,1)
  x[1] <- x0
  for(t in 1:(N-1)){
    mu[t] <- x[t] + x[t] * r * (1 - x[t] / K) 
    y[t] <- x[t] * sigma
    x[t+1] ~ dnorm(mu[t], sd = y[t])
  }
  
})


constants <- list(N = 1e2, x0 = 0.2)
inits <- list(r = 0.05, K = 2, sigma = 0.02)

model <- nimbleModel(logistic, constants = constants, inits = inits)

set.seed(123)
# note we define the 'data' nodes that need to be simulated 
simulate(model, nodes = c("mu", "x", "y"))

# extract simulated data & plot to see results
df <- tibble(t = seq_along(model$x), x = model$x)
df %>% ggplot(aes(t,x)) + geom_point()

Clearly the nimble syntax is a bit more verbose, and of course the above loop could be written with rnorm in pure R as well. At least the nimble syntax it makes it clear that I’m simulating length-N vector x (and ‘internal’ variables y and mu which are I think just needed as a limitation of the syntax). Thoughts on this welcome!

An implementation in tensorflow backend would be great, not sure how often the performance would be needed/better that way, but I think it would keep the internals more consistent that way? I’m also happy with a pure-R implementation but I think there’s still syntax things to be hammered out (or at least for me to wrap my head around!)

nick · July 23, 2019, 3:45am

Thanks for reposting here Carl!

There are a few different things going on here:

adding a simulate function for greta
simulating in the existing greta using MCMC
efficiently implementing a timeseries model in greta

So I’ll talk about each separately in different posts below

nick · July 23, 2019, 3:46am

Re 1. adding a simulate function to greta

This is something I’d love to have, and have a road map for how to implement it (but no time right now unfortunately). You could write out your model like this, including some observed data:

y <- as_data(rnorm(10))
mu <- normal(0, 10)
distribution(y) <- normal(mu, 1)

then draw iid samples from the priors (ie. not conditioning on the data) to simulate new data, with an interface similar to the existing calculate() like this:

y_sim <- simulate(y)

or

mu_sim <- simulate(mu)

which would return R vectors (of length 10 in these cases) of values of mu and y simulated from the priors.

You could also simulate the values of observed data, given posterior samples for the model, with an interface like this:

y_post_sim <- simulate(y, values = draws)

where draws is an mcmc.list object returned from doing MCMC on this model. This would return another mcmc.list object with posterior simulations of y (Ie. taking the posterior samples of mu from MCMC and simulating a new y for each one).

This functionality would be pretty useful for prior- and posterior-predictive checks, and for simulation-based teaching. GitHub issue 269 is about implementing something like this.

nick · July 23, 2019, 3:47am

Re 2. simulating in the existing greta using MCMC

You can already get pretty close to doing this using the existing version of greta, but rather than taking IID samples from the priors, you need to sample using MCMC. MCMC is a general purpose method for sampling from probability distributions, so has no problem sampling from priors, but it’s way more computationally costly and only returns correlated samples, not independent ones. But here’s how you would do it using the example above:

mu <- normal(0, 10)
y <- normal(mu, 1)

m <- model(mu, y)
draws <- mcmc(m)

There’s no distribution() statement here, so this is just sampling from the priors of the model. It has to be a different model to the one used for fitting to data though, since we don’t want to condition on the data,

You can sample y using the posterior for mu manually, by first fitting the model:

y <- as_data(rnorm(10))
mu <- normal(0, 10)
distribution(y) <- normal(mu, 1)
m <- model(mu)
mu_draws <- mcmc(m)

then simulating ys

mu_mat <- as.matrix(mu_draws)
y_mat <- array(0, dim(mu_mat))
y_mat[] <- rnorm(length(mu_mat), mu_mat, 1)

or you can automatically simulate y using the posterior for mu by adding another vector of unknowns to the model:

y <- rnorm(10)
mu <- normal(0, 10)
distribution(y) <- normal(mu, 1)
y2 <- normal(mu, 1, dim )
m <- model(y2)
y2_draws <- mcmc(m)

this second option (currently) only works with continuous distributions, because greta can’t sample discrete variables yet. But you can use this for posterior predictive checks.

For various reasons a simulate() function would be better than this, though hopefully this is useful to demonstrate what you can do!

nick · July 23, 2019, 6:12am

Re. 3 efficiently implementing a timeseries model in greta

As mentioned in the Github issue (and a couple of times in other parts of the forum/GitHub I think), greta is not well suited to this type of timeseries model, because it’s designed to work best with models that have large, vectorised operations. It’s terrible at for loops. However these models are popular, so here’s a naive implementation and some speedups.

naive implementation

First, here’s a naive implementation of your nimble model in greta. It basically uses the exact same for loop code as in nimble. It’s a valid model, but I would not recommend trying to run this, it takes forever to build, it will be hard to find starting values, and would be very slow to sample.

N <- 1e2
x0 <- 0.2

# priors
r <- uniform(0, 1)
K <- uniform(0, 10)
sigma <- uniform(0, 1)

y <- x <- mu <- zeros(N)
x[1] <- x0

for (t in 1:(N-1)) {
  mu[t] <- x[t] + x[t] * r * (1 - x[t] / K) 
  y[t] <- x[t] * sigma
  x[t + 1] <- normal(mu[t], sd = y[t])
}

m <- model(x)
draws <- mcmc(m)

Here the timeseries values aren’t known, so we need to generate each one at a time. This would be the same in a model fitted to these data if you don’t directly observe the state of the timeseries (values of x), but instead observe them with some error. If you observe the x directly you don’t need the for loop, you can just define the observation distributions on a single vector.

speedup one, avoid extract/replace syntax

greta’s extract and replace syntax (a <- x[1] and x[1] <- a) is written in a really general way, so that it works exactly the same way as R’s. But it doesn’t correspond to the most efficient way to extract/replace in tensorflow. Each of those is also a separate operation that’s created in both the greta and tensorflow graphs. We can avoid some of that by keeping them as R lists of greta arrays, and then recombining these lists at the end. It’s not a nice interface, but I should be able to speed up some of these operations in greta in future releases.

y_list <- x_list <- mu_list <- list()
x_list[[1]] <- x0

for (t in 1:(N-1)) {
  mu_list[[t]] <- x_list[[t]] + x_list[[t]] * r * (1 - x_list[[t]] / K) 
  y_list[[t]] <- x_list[[t]] * sigma
  x_list[[t + 1]] <- normal(mu_list[[t]], sd = y_list[[t]])
}

# stitch them back together
x <- do.call(rbind, x_list)
mu <- do.call(rbind, mu_list)
y <- do.call(rbind, y_list)

speedup two, de-centred parameterisation

This will speed up the sampler and make it much easier to find decent initial values. It’s exactly the same model, but the sampler sees it from a different perspective. There’s lots of discussionof decentred parameterisations in the stan docs and forum, and various blog posts (e.g. here). This is the same thing, although in a timeseries setting.

Note that if we were just drawing IID samples from the priors, this wouldn’t be an issue. It’s only because we’re sampling from the priors by MCMC.

for (t in 1:(N-1)) {
  mu_list[[t]] <- x_list[[t]] + x_list[[t]] * r * (1 - x_list[[t]] / K) 
  y_list[[t]] <- x_list[[t]] * sigma
  deviate <- normal(0, 1)
  x_list[[t + 1]] <- mu_list[[t]] + deviate * y_list[[t]]
}

Doing MCMC on this works pretty well. Though the values of x for t > 1 have some pretty extreme outliers because of the vague priors, but that’s expected.

experimental possible speedups three to five, vectorisation!

You can sometimes speedup time series inference in greta by recasting the timeseries in some form that’s vectorised. E.g. random walk of order 1 (RW1) models are simple to define with a cumulative sum:

N <- 10
sigma <- uniform(0, 1)
deviates <- normal(0, 1, dim = N)
x <- cumsum(deviates * sigma)

RW2 models are a double cumsum():

N <- 10
sigma <- uniform(0, 1)
deviates <- normal(0, 1, dim = N)
x <- cumsum(cumsum(deviates * sigma))

Autoregressive models like the logistic growth function are much tricker, since the step size and direction (as well as the starting point) at each time depends on the previous state. A slightly simpler model that I’ve played with a a bit is the autoregressive model of order 1, which looks like:
x_{t+1} = \rho x_t + \epsilon_t

I’ve tried a couple fo approaches to speeding up inference for that model in greta. These are all pretty experimental, and I wouldn’t expect users to be able to code these up themselves. But they could form the basis of an extension package to make this stuff easier, and maybe you could adapt these approaches to the logistic growth model?

Gaussian process representation

If there’s a known analytic solution to the covariance of the process, you can use it to define the kernel of a Gaussian process (ie. the covariance matrix of a multivariate normal). This is in general a less efficient solution in that it requires more floating point operations and scales badly with the number of timesteps. But it relies on large matrix operations that greta is good at solving and can parallelise, so it’s not actually a terrible idea in some cases. Here’s the AR1 process, which I think has covariance: \sigma^2 \frac{\rho^{|t_1 - t_2|}}{1 - \rho^2} though I may have my \rho's mixed up. It’s something like that, it corresponds to a Gaussian process with exponential kernel.

ar1_cov <- function (n_time, rho, sigma = 1) {
  time_diff <- abs(outer(1:n_time, 1:n_time, `-`))
  sigma ^ 2 * (rho ^ time_diff) / (1 - rho ^ 2)
}

N <- 10
sigma <- uniform(0, 1)
rho <- uniform(0.5, 0.8)

K <- ar1_cov(N, rho, sigma)

# x <- multivariate_normal(zeros(1, N), K)

# this is the whitened decomposition, the multivariate versioning of hierarchical decentering
L <- t(chol(K))
innovations <- normal(0, 1, dim = N)
x <- L %*% innovations

Whether you can work out the covariance structure for your logistic growth time series I’m not sure, but this isn’t the most efficient route anyway. So moving on…

matrix algebra approach

You can expand the iterative equation for the AR1 process to a sum, so the AR1 becomes:
x_t = \sum_{i=0}^{t} \epsilon_{t - i} \rho ^ i
and then build a triangular matrix so that these sums can be solved with a matrix multiplication:

# matrix of time modifiers
t_seq <- seq_len(N)
t_mat <- outer(t_seq, t_seq, FUN = "-")
t_mat <- pmax(t_mat, 0)

# which elements to include (don't want upper triangular ones)
mask <- lower.tri(t_mat, diag = TRUE)

# matrix of rho ^ n contributions
rho_mat <- (rho ^ t_mat) * mask

innovations <- normal(0, 1, dim = N) * sigma
x <- rho_mat %*% innovations_scaled

Again, this isn’t as theoretically efficient as it could be with a loop, but it’s vectorised so you avoid the overhead caused by the for loop.

code it up as a new greta op, with tensorflow code

Here’s a greta op that takes in a vector of \epsilon_t and a \rho and returns the solution iteratively. It uses tensorflow’s (frankly, horrifying) while_loop construct. the body function is where the process is coded up.

ar1ify <- function (innovations, rho) {
  op <- greta::.internals$nodes$constructors$op
  op("ar1ify", innovations, rho,
     tf_operation = "tf_ar1ify")
}

# use a tf while loop to iterate through timesteps, forward-simulating an AR(1)
# process
tf_ar1ify <- function (innovations, rho) {
  
  tf <- tensorflow::tf
  shape <- tensorflow::shape
  
  body <- function(epsilon, innovations, rho, iter, maxiter) {
    
    # get next row of epsilon
    innovation <- innovations[, iter, , drop = FALSE]
    previous <- epsilon[, iter - 1L, , drop = FALSE]
    # need to pad both (?!)
    innovation <- tf$expand_dims(innovation, 1L)
    previous <- tf$expand_dims(previous, 1L)
    
    # get nex value of epsilon
    new_eps <- rho * previous + innovation
    
    # append to epsilon
    epsilon <- tf$concat(list(epsilon, new_eps), axis = 1L)
    
    list(epsilon, innovations, rho, iter + 1L, maxiter)
  }
  
  cond <- function(epsilon, innovations, rho, iter, maxiter) {
    tf$less(iter, maxiter)
  }
  
  dim <- dim(innovations)
  n_row <- dim[[2]]
  n_col <- dim[[3]]
  
  shapes <- list(tf$TensorShape(shape(NULL, NULL, n_col)),
                 tf$TensorShape(shape(NULL, n_row, n_col)),
                 tf$TensorShape(shape(NULL, 1L, 1L)),
                 tf$TensorShape(shape()),
                 tf$TensorShape(shape()))
  
  # create an epsilon with the first row of innovations
  epsilon <- innovations[, 0, , drop = FALSE]
  
  values <- list(epsilon,
                 innovations,
                 rho,
                 tf$constant(1L),
                 tf$constant(n_row))
  
  out <- tf$while_loop(cond,
                       body,
                       values,
                       shape_invariants = shapes)
  out[[1]]
  
}

You can use this like:

N <- 10
sigma <- uniform(0, 1)
rho <- uniform(0.5, 0.8)
innovations <- normal(0, 1, dim = N) * sigma
x <- ar1ify(innovations, rho)
m <- model(x)
draws <- mcmc(m)

The last of these approaches is the most theoretically efficient, in that it uses the minimum number of floating point operations. Though there might be more overhead, and less capacity to use multiple cores, so YMMV.

You might be able to adapt the body bit of this to code up your logistic growth model.

At some point, I’d like (someone else) to write an extension package that provides time series functionality like this to users without them having to think.

It may also be possible to write a generic iterator that takes a custom body function written in R/greta code, and do all this plumbing automagically. It’s not trivial though. Like I say, greta’ wasn’t really designed with this type of model in mind!