Rounding error Stan model in greta

Hi everyone, I’m looking for help transcribing the rounding error model from Stan to greta. The model is documented in the Stan manual, also a gist here. The Stan code is:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma_sq;
  vector<lower=-0.5, upper=0.5>[N] y_err;
}
transformed parameters {
  real<lower=0> sigma;
  vector[N] z;
  sigma = sqrt(sigma_sq);
  z = y + y_err;
}
model {
  target += -2 * log(sigma);
  z ~ normal(mu, sigma);
}

I only managed to work until the transformed parameter part… not even sure if I’m doing it right in greta:

library(greta)

# data
set.seed(12)
y <- rnorm(20, 10, 2)
y_r <- round(y, 0)
N <- length(y)

# priors
mu <- variable()
sigma_sq <- variable(lower = 0)
y_err <- variable(lower = -0.5, upper = 0.5, dim = N)
sigma <- sqrt(sigma_sq)
z <- normal(mu, sigma, dim = N)  # or is this z <- y + y_err ???

Mainly, I don’t know how to code for z, and what should go into distribution() (is it y or z?) :sweat_smile:

1 Like

Hi @hrlai!

Sorry it took me a while to get back to you on this.

I’m still getting my head around things, but talking to Nick Golding about this, my understanding is that this model directly modifies the log probability. We don’t allow for that in greta, instead we require people to write their own distribution. We’re working on making that easier soon, with an upcoming greta.distributions package to simplify that.

I think you want something like the interval censored lognormal distribution - fortunately Nick has defined this for a specific use case here: https://github.com/goldingn/covid19_australia_interventions/blob/60c5250e8c87b9f7b7af9088aa2742120aa69915/R/functions.R#L3625-L3710

Eventually that distribution will make its way into the greta.distributions package, but you are free to import that yourself. We’ll also provide some more detail on what each of the parts of the distribution code do.

So then, my understanding is that you’d import that and then change the distribution to be something like:

distribution(y_r) <- interval_censored_lognormal(
    meanlog = mu,
    sdlog = sigma,
    breaks = 1L:4L, # I'm not sure about this part
    dim = N
  )

Thanks a lot @njtierney and @nick. This is such a coincidence as I was just reading up Nick’s COVID manuscript yesterday! And I was stoked to see the way he did the interval censored lognormal indeed. I’ll see if I can modify it for a normal and report back.

1 Like

Seems not so hard to modify it from lognormal to normal, and changing from tf$floor to tf$round… but the roadblock lies in allowing rounding to higher precision, e.g., round(x, 1). From this greta code chunk it seems that it is not possible?

1 Like

A temporary solution would be to just scale the variable of interest up so it has no decimals, e.g., for c(0.1, 0.2, 0.3) in meters just multiply by 100 to become c(10, 20, 30) centimeters.

1 Like

The rounding workaround makes sense to me, but I’ll check in with @nick on this, feels like there should be another way.

Yeah, good workaround.

In fact we could just modify our greta::round() to use a new tensorflow function that uses the digits argument instead of warning. So the internal greta tf_round() function would look something like this:

tf_round <- function(x, digits) {
  # round the digits to nearest integer, to match R's round behaviour
  digits <- tf$round(digits)
  # convert to a factor of 10
  factor <- fl(10) ^ digit
  # scale, round to integer, unscale
  tf$round(x * factor) / factor
}

and greta::round(x, digits) could just call tf_round(x, digits) instead of doing tf$round(x) and giving a warning.

1 Like

Added this to issues: https://github.com/greta-dev/greta/issues/532

1 Like