Different likelihoods for subsets of data arrays


Howdy! greta is really cool, thanks for making it! I’m still learning the ins and outs of it in toy scenarios but already I like it a whole bunch. I was curious about something. Suppose I have a simple change-point model of counts: $y_t \sim \text{Poisson}(\lambda)$, $t = 1, \ldots, T$, with:

$$\lambda = \begin{cases} \lambda_1, & \text{if } t \leq T_0, \\ \lambda_2, & \text{if } t > T_0. \end{cases}$$
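For anyone who wants to reproduce this, here is a minimal sketch of how data like df$y could be simulated in base R. The series length, change point, true rates, and seed are all made-up values for illustration and are not taken from the original post.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility

T <- 100                    # assumed series length (note: masks R's T shorthand for TRUE)
T_0 <- 40                   # assumed change point
lambda_true <- c(90, 110)   # assumed true rates lambda_1 and lambda_2

# Rate at each time step: lambda_1 up to and including T_0, lambda_2 afterwards
rate <- ifelse(seq_len(T) <= T_0, lambda_true[1], lambda_true[2])

# rpois() is vectorised over its rate argument, so one call covers both regimes
df <- data.frame(t = seq_len(T), y = rpois(T, rate))
```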

I tried

# df$y has been generated with rpois(lambda_1) and rpois(lambda_2)
lambda <- normal(100, 10, truncation = c(0, Inf), dim = 2)
y <- as_data(df$y) # y is a greta data array of size T
distribution(y[1:T_0]) <- poisson(lambda[1])
distribution(y[(T_0 + 1):T]) <- poisson(lambda[2])

which results in: Error: distributions can only be assigned to data greta arrays

I just wanted to make sure that I’m not missing anything in the documentation. Is the only solution in this case to split it into two separate data arrays, like this?

y1 <- as_data(df$y[1:T_0])
y2 <- as_data(df$y[(T_0 + 1):T])
distribution(y1) <- poisson(lambda[1])
distribution(y2) <- poisson(lambda[2])



That’s right. Once you turn some data into a data greta array, any operations you do on it (including subsetting) will return an operation greta array. So if you printed y[1:T_0], you’d see a bunch of ?s.
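A quick way to see this for yourself is to print both objects. This is just a sketch with made-up toy data (the counts and the subset range are arbitrary), and the exact printed layout may differ between greta versions.

```r
library(greta)

# Toy counts standing in for df$y (values are arbitrary)
y <- as_data(rpois(10, 5))

# Subsetting a data greta array gives back a new greta array
y_sub <- y[1:4]

print(y)      # the data greta array, with its values shown
print(y_sub)  # the subset, which should print as an operation greta array
```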

Users can’t define distributions on operation greta arrays, because in general that can lead to issues with log-Jacobian adjustments (and because the result isn’t a generative model).

In a future version of greta, we could potentially be more clever about when we do that versus simply subsetting the data.
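In the meantime, one possible alternative to splitting the data is to leave y whole and index the parameter instead, so that each observation picks up its own rate. This is only a sketch under assumptions: the toy data, the n and regime names, and the specific values are invented for illustration, and it relies on greta arrays supporting integer-vector indexing with `[`.

```r
library(greta)

# Assumed toy data standing in for df$y from the question
T_0 <- 40
n <- 100
y_obs <- c(rpois(T_0, 90), rpois(n - T_0, 110))

lambda <- normal(100, 10, truncation = c(0, Inf), dim = 2)
y <- as_data(y_obs)  # stays a single data greta array

# Hypothetical helper vector: 1 for t <= T_0, 2 for t > T_0
regime <- ifelse(seq_len(n) <= T_0, 1L, 2L)

# Index the parameter rather than the data; rate has one entry per observation
rate <- lambda[regime]

# The distribution is assigned to the untouched data greta array
distribution(y) <- poisson(rate)

m <- model(lambda)
# draws <- mcmc(m)  # uncomment to sample
```

The subsetting now happens on lambda, which is fine because operation greta arrays can be used as parameters of a distribution; only the left-hand side of distribution() has to be a data greta array.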