Re. 3 efficiently implementing a timeseries model in greta
As mentioned in the GitHub issue (and a couple of times in other parts of the forum/GitHub, I think), greta is not well suited to this type of timeseries model, because it's designed to work best with models that have large, vectorised operations. It's terrible at for loops. However, these models are popular, so here's a naive implementation and some speedups.
naive implementation
First, here's a naive implementation of your nimble model in greta. It uses essentially the same for loop code as the nimble version. It's a valid model, but I would not recommend trying to run it: it takes forever to build, it will be hard to find starting values, and it would be very slow to sample.
N <- 1e2
x0 <- 0.2
# priors
r <- uniform(0, 1)
K <- uniform(0, 10)
sigma <- uniform(0, 1)
y <- x <- mu <- zeros(N)
x[1] <- x0
for (t in 1:(N-1)) {
  # logistic growth mean for the next step
  mu[t] <- x[t] + x[t] * r * (1 - x[t] / K)
  # state-dependent standard deviation
  y[t] <- x[t] * sigma
  x[t + 1] <- normal(mu[t], sd = y[t])
}
m <- model(x)
draws <- mcmc(m)
Here the timeseries values aren't known, so we need to generate each one at a time. This would be the same in a model fitted to these data if you don't directly observe the state of the timeseries (the values of x), but instead observe them with some error. If you observe the x directly you don't need the for loop; you can just define the observation distributions on a single vector.
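For example, something like this sketch, with x_values standing in for a vector of directly observed states (made-up data, not from your model):

N <- 1e2
x_values <- runif(N, 0.1, 1)
x_now <- as_data(x_values[1:(N - 1)])
x_next <- as_data(x_values[2:N])
r <- uniform(0, 1)
K <- uniform(0, 10)
sigma <- uniform(0, 1)
# all N - 1 transitions defined at once, no for loop
mu <- x_now + x_now * r * (1 - x_now / K)
distribution(x_next) <- normal(mu, x_now * sigma)
m <- model(r, K, sigma)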
speedup one, avoid extract/replace syntax
greta's extract and replace syntax (a <- x[1] and x[1] <- a) is written in a really general way, so that it works exactly the same way as R's. But it doesn't correspond to the most efficient way to extract/replace in tensorflow. Each of those is also a separate operation that's created in both the greta and tensorflow graphs. We can avoid some of that by keeping the values as R lists of greta arrays, and then recombining these lists at the end. It's not a nice interface, but I should be able to speed up some of these operations in greta in future releases.
y_list <- x_list <- mu_list <- list()
x_list[[1]] <- x0
for (t in 1:(N-1)) {
  mu_list[[t]] <- x_list[[t]] + x_list[[t]] * r * (1 - x_list[[t]] / K)
  y_list[[t]] <- x_list[[t]] * sigma
  x_list[[t + 1]] <- normal(mu_list[[t]], sd = y_list[[t]])
}
# stitch them back together
x <- do.call(rbind, x_list)
mu <- do.call(rbind, mu_list)
y <- do.call(rbind, y_list)
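The model and sampler calls are then the same as before:

m <- model(x)
draws <- mcmc(m)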
speedup two, de-centred parameterisation
This will speed up the sampler and make it much easier to find decent initial values. It's exactly the same model, but the sampler sees it from a different perspective. There's lots of discussion of de-centred (non-centred) parameterisations in the Stan docs and forum, and in various blog posts (e.g. here). This is the same thing, just in a timeseries setting.
Note that if we were just drawing IID samples from the priors, this wouldn’t be an issue. It’s only because we’re sampling from the priors by MCMC.
for (t in 1:(N-1)) {
  mu_list[[t]] <- x_list[[t]] + x_list[[t]] * r * (1 - x_list[[t]] / K)
  y_list[[t]] <- x_list[[t]] * sigma
  # sample a standard normal deviate, then scale and shift it to give the
  # next state, instead of sampling the next state directly
  deviate <- normal(0, 1)
  x_list[[t + 1]] <- mu_list[[t]] + deviate * y_list[[t]]
}
Doing MCMC on this works pretty well. The values of x for t > 1 have some pretty extreme outliers because of the vague priors, but that's expected.
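For reference, the same trick in its simplest, non-timeseries form is just this (a sketch with made-up priors, not part of the model above):

# centred version, where the sampler explores x directly:
# x <- normal(mu, sigma)
# non-centred version, where the sampler explores a standard normal deviate:
mu <- normal(0, 10)
sigma <- uniform(0, 1)
x_raw <- normal(0, 1)
x <- mu + x_raw * sigma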
experimental possible speedups three to five, vectorisation!
You can sometimes speed up timeseries inference in greta by recasting the timeseries in a form that's vectorised. E.g. random walk of order 1 (RW1) models are simple to define with a cumulative sum:
N <- 10
sigma <- uniform(0, 1)
deviates <- normal(0, 1, dim = N)
x <- cumsum(deviates * sigma)
RW2 models are a double cumsum():
N <- 10
sigma <- uniform(0, 1)
deviates <- normal(0, 1, dim = N)
x <- cumsum(cumsum(deviates * sigma))
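Here's a quick plain-R check of why the double cumsum works; in an RW2 the increments themselves follow a random walk (values arbitrary):

set.seed(1)
eps <- rnorm(10)
# the increments are a cumulative sum of the deviates
inc <- cumsum(eps)
x_loop <- numeric(10)
x_loop[1] <- inc[1]
for (t in 2:10) x_loop[t] <- x_loop[t - 1] + inc[t]
all.equal(x_loop, cumsum(cumsum(eps)))  # TRUE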
Autoregressive models like the logistic growth function are much trickier, since the step size and direction (as well as the starting point) at each time depend on the previous state. A slightly simpler model that I've played with a bit is the autoregressive model of order 1, which looks like:
x_{t+1} = \rho x_t + \epsilon_t
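In plain R, forward-simulating that recursion looks like this (arbitrary values, just for reference):

set.seed(1)
N <- 10
rho <- 0.7
epsilon <- rnorm(N)
x <- numeric(N)
x[1] <- epsilon[1]
for (t in 1:(N - 1)) {
  x[t + 1] <- rho * x[t] + epsilon[t + 1]
}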
I've tried a couple of approaches to speeding up inference for that model in greta. These are all pretty experimental, and I wouldn't expect users to be able to code these up themselves. But they could form the basis of an extension package to make this stuff easier, and maybe you could adapt these approaches to the logistic growth model?
Gaussian process representation
If there's a known analytic solution to the covariance of the process, you can use it to define the kernel of a Gaussian process (i.e. the covariance matrix of a multivariate normal). This is in general a less efficient solution, in that it requires more floating point operations and scales badly with the number of timesteps. But it relies on large matrix operations that greta is good at solving and can parallelise, so it's not actually a terrible idea in some cases. Here's the AR1 process, which I think has covariance: \sigma^2 \frac{\rho^{|t_1 - t_2|}}{1 - \rho^2} though I may have my \rho's mixed up. It's something like that; it corresponds to a Gaussian process with an exponential kernel.
ar1_cov <- function (n_time, rho, sigma = 1) {
  time_diff <- abs(outer(1:n_time, 1:n_time, `-`))
  sigma ^ 2 * (rho ^ time_diff) / (1 - rho ^ 2)
}
N <- 10
sigma <- uniform(0, 1)
rho <- uniform(0.5, 0.8)
K <- ar1_cov(N, rho, sigma)
# the direct (centred) version would be:
# x <- multivariate_normal(zeros(1, N), K)
# instead, use the whitened decomposition, the multivariate version of
# hierarchical de-centring
L <- t(chol(K))
innovations <- normal(0, 1, dim = N)
x <- L %*% innovations
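To convince yourself of the covariance formula, you can compare ar1_cov() with fixed numeric values against the empirical covariance of AR1 series simulated in plain R from their stationary distribution:

set.seed(1)
rho_val <- 0.7
n_time <- 5
sims <- replicate(1e5, {
  x <- numeric(n_time)
  # stationary variance of a unit-innovation AR1 is 1 / (1 - rho^2)
  x[1] <- rnorm(1, sd = sqrt(1 / (1 - rho_val ^ 2)))
  for (t in 2:n_time) x[t] <- rho_val * x[t - 1] + rnorm(1)
  x
})
round(cov(t(sims)), 2)
round(ar1_cov(n_time, rho_val), 2)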
Whether you can work out the covariance structure for your logistic growth time series I’m not sure, but this isn’t the most efficient route anyway. So moving on…
matrix algebra approach
You can expand the iterative equation for the AR1 process to a sum, so the AR1 becomes:
x_t = \sum_{i=0}^{t} \rho^i \epsilon_{t-i}
and then build a triangular matrix so that these sums can be solved with a matrix multiplication:
# matrix of time modifiers
t_seq <- seq_len(N)
t_mat <- outer(t_seq, t_seq, FUN = "-")
t_mat <- pmax(t_mat, 0)
# which elements to include (don't want upper triangular ones)
mask <- lower.tri(t_mat, diag = TRUE)
# matrix of rho ^ n contributions
rho_mat <- (rho ^ t_mat) * mask
innovations_scaled <- normal(0, 1, dim = N) * sigma
x <- rho_mat %*% innovations_scaled
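Here's a quick plain-R check (with arbitrary fixed values) that the matrix version reproduces the recursion:

set.seed(1)
rho_val <- 0.7
eps <- rnorm(10)
x_loop <- numeric(10)
x_loop[1] <- eps[1]
for (t in 1:9) x_loop[t + 1] <- rho_val * x_loop[t] + eps[t + 1]
t_mat_check <- pmax(outer(1:10, 1:10, FUN = "-"), 0)
rho_mat_check <- (rho_val ^ t_mat_check) * lower.tri(t_mat_check, diag = TRUE)
all.equal(x_loop, as.numeric(rho_mat_check %*% eps))  # TRUE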
Again, this isn’t as theoretically efficient as it could be with a loop, but it’s vectorised so you avoid the overhead caused by the for loop.
code it up as a new greta op, with tensorflow code
Here's a greta op that takes in a vector of \epsilon_t and a \rho and returns the solution iteratively. It uses tensorflow's (frankly, horrifying) while_loop construct. The body function is where the process is coded up.
ar1ify <- function (innovations, rho) {
  op <- greta::.internals$nodes$constructors$op
  op("ar1ify", innovations, rho,
     tf_operation = "tf_ar1ify")
}

# use a tf while loop to iterate through timesteps, forward-simulating an AR(1)
# process
tf_ar1ify <- function (innovations, rho) {

  tf <- tensorflow::tf
  shape <- tensorflow::shape

  body <- function(epsilon, innovations, rho, iter, maxiter) {
    # get the next row of innovations and the previous value of epsilon
    innovation <- innovations[, iter, , drop = FALSE]
    previous <- epsilon[, iter - 1L, , drop = FALSE]
    # need to pad both (?!)
    innovation <- tf$expand_dims(innovation, 1L)
    previous <- tf$expand_dims(previous, 1L)
    # get the next value of epsilon
    new_eps <- rho * previous + innovation
    # append it to epsilon
    epsilon <- tf$concat(list(epsilon, new_eps), axis = 1L)
    list(epsilon, innovations, rho, iter + 1L, maxiter)
  }

  cond <- function(epsilon, innovations, rho, iter, maxiter) {
    tf$less(iter, maxiter)
  }

  dim <- dim(innovations)
  n_row <- dim[[2]]
  n_col <- dim[[3]]
  shapes <- list(tf$TensorShape(shape(NULL, NULL, n_col)),
                 tf$TensorShape(shape(NULL, n_row, n_col)),
                 tf$TensorShape(shape(NULL, 1L, 1L)),
                 tf$TensorShape(shape()),
                 tf$TensorShape(shape()))

  # create an epsilon with the first row of innovations
  epsilon <- innovations[, 0, , drop = FALSE]

  values <- list(epsilon,
                 innovations,
                 rho,
                 tf$constant(1L),
                 tf$constant(n_row))

  out <- tf$while_loop(cond,
                       body,
                       values,
                       shape_invariants = shapes)
  out[[1]]
}
You can use this like:
N <- 10
sigma <- uniform(0, 1)
rho <- uniform(0.5, 0.8)
innovations <- normal(0, 1, dim = N) * sigma
x <- ar1ify(innovations, rho)
m <- model(x)
draws <- mcmc(m)
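You can also sanity-check the op by simulating from the priors with calculate() (in recent greta versions):

# simulate a few series from the priors to check the op runs and the
# output looks like an AR(1) series
sims <- calculate(x, nsim = 4)
dim(sims$x)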
The last of these approaches is the most theoretically efficient, in that it uses the minimum number of floating point operations, though there might be more overhead and less capacity to use multiple cores, so YMMV.
You might be able to adapt the body function of this to code up your logistic growth model.
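E.g. here's an untested sketch of what that body function might look like for the logistic growth step. Note that r, K, and sigma would need to be passed through the loop variables and shape invariants in the same way as rho above:

body <- function(x, innovations, r, K, sigma, iter, maxiter) {
  # get the next standard normal deviate and the previous state
  innovation <- innovations[, iter, , drop = FALSE]
  previous <- x[, iter - 1L, , drop = FALSE]
  innovation <- tf$expand_dims(innovation, 1L)
  previous <- tf$expand_dims(previous, 1L)
  # logistic growth mean and state-dependent sd, as in the for loop versions
  mu <- previous + previous * r * (1 - previous / K)
  new_x <- mu + innovation * previous * sigma
  # append the new state
  x <- tf$concat(list(x, new_x), axis = 1L)
  list(x, innovations, r, K, sigma, iter + 1L, maxiter)
}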
At some point, I’d like (someone else) to write an extension package that provides time series functionality like this to users without them having to think.
It may also be possible to write a generic iterator that takes a custom body function written in R/greta code and does all this plumbing automagically. It's not trivial though. Like I say, greta wasn't really designed with this type of model in mind!