Deviance Information Criterion (DIC) in greta?

Hi All,

I’m working on fitting some logit regression models to a large dataset, and I’ll need to do model selection using DIC. It looks like DIC isn’t built in to greta, so I was wondering if anyone already has a solution.

I can write my own function to calculate it on the greta output, but I wanted to post here first to say some sort of built-in DIC function would be a new feature for greta, and to see if there is already a robust solution out there.

Thanks!
Vincent

Yes, it would be handy. Should be fairly straightforward to implement, too.

What do you think the interface should look like?

I think the user could have the option to provide the likelihood to use for calculating deviance in the form deviance_likelihood(y) = ... (but with a better function name), that defaults to the likelihood of the data. An option for the user to specify it will come in handy if an integrated likelihood should be used (for example for models where a latent state needs to be integrated out of the likelihood before calculating DIC). Then maybe a simple flag somewhere (maybe in mcmc(..., DIC = TRUE)?) to specify that DIC should be calculated?

Right, dealing with the variable focus issue of DIC for hierarchical models is a good point I hadn’t considered!

I’m not crazy about having a DIC argument to mcmc, since it’s not something that modifies how mcmc is done.

We could compute DIC from the draws object though (it actually stores all the ‘raw’ parameter values, which is how calculate()works), perhaps with an option to modify which bits of the model to look at (considering all distributions by default). So how about something like:

draws <- mcmc(m)
dic(draws)

or

dic(draws, focus = list(y))

where ‘focus’ is a list of greta arrays that have distributions assigned to them.

Hi,

+1 for information criteria support, that would be really nice!
Why not go all the way though and make use of the loo package, which computes LOOIC and WAIC which are both supposed to be supperior to DIC? The only thing that would be needed is a function which computes a log-likelihood matrix from the draws, which could then be used as input for loo().

As a new user I’m really liking the way greta is headed btw., it’s so convenient yet powerful, at least for the couple of cases I tried so far. Thanks for the great work!

Best
Jakob

Thanks Jakob!

I’ll take a look at the loo package when I get to thinking about implementing DIC etc. (probably in a rev sprint early next year). I had assumed calculation was both more involved and more tied to Stan, but the way you describe it sounds very doable!

Any news on that? With the loo package the only thing that we would need from a fitted greta model would be:
" An I by C by N array, where I is the number of MCMC iterations per chain, C is the number of chains, and N is the number of data points."

This could then be passed to the functions loo::loo() and loo:relative_eff()

I tried the following naive implementation:

# simple lm example
x <- runif(100, -2, 2)
y <- rnorm(100, 1 + 2 * x, 1)

# get log-likelihood array, p(y | linpred, sd)
intercept <- normal(0, 2)
slope <- normal(0, 1)

sd_res <- cauchy(0, 2, truncation = c(0, Inf))

linpred <- intercept + slope * x

y <- as_data(y)

distribution(y) <- normal(linpred, sd_res)

loglik <- dnorm(y, linpred, sd_res, log = TRUE)

m <- model(intercept, slope, sd_res, loglik)

d <- mcmc(m)

But got the following error message:

Error: loglik is a data greta array, data greta arrays cannot be sampled

@vlandau and @nick did you implement some functions for the DIC that could be used there?

Right, this doesn’t work because there’s no greta dnorm function. I am surprised it doesn’t error when that’s called though. Will look into that.

You could write out the normal log-density explicitly and this should work. Something like:

lik <- (1 / (sd_res * sqrt(2 * pi))) * exp(((y - linpred) / sd_res) ^ 2)
loglik <- log(lik)

(though it would be more computationally stable if you simplify that to avoid the exponentiation)

It would be nice to have a function that would compute the densities automatically. With an interface like:

loglik <- density(y, log = TRUE)

That would return a greta array with the log densities for each element in y. This would only work for greta arrays that follow distributions. It shouldn’t be too hard to get this working. But I wonder whether it should be in an extension package or greta itself. There must be other uses for these densities too, but none spring to mind immediately.