Deviance Information Criterion (DIC) in greta?

vlandau · September 9, 2019, 9:34pm

Hi All,

I’m working on fitting some logit regression models to a large dataset, and I’ll need to do model selection using DIC. It looks like DIC isn’t built in to greta, so I was wondering if anyone already has a solution.

I can write my own function to calculate it on the greta output, but I wanted to post here first to say some sort of built-in DIC function would be a new feature for greta, and to see if there is already a robust solution out there.

Thanks!
Vincent

nick · September 10, 2019, 10:33pm

Yes, it would be handy. Should be fairly straightforward to implement, too.

What do you think the interface should look like?

vlandau · September 10, 2019, 11:22pm

I think the user could have the option to provide the likelihood to use for calculating deviance in the form deviance_likelihood(y) = ... (but with a better function name), that defaults to the likelihood of the data. An option for the user to specify it will come in handy if an integrated likelihood should be used (for example for models where a latent state needs to be integrated out of the likelihood before calculating DIC). Then maybe a simple flag somewhere (maybe in mcmc(..., DIC = TRUE)?) to specify that DIC should be calculated?

nick · September 11, 2019, 12:03am

Right, dealing with the variable focus issue of DIC for hierarchical models is a good point I hadn’t considered!

I’m not crazy about having a DIC argument to mcmc, since it’s not something that modifies how mcmc is done.

We could compute DIC from the draws object though (it actually stores all the ‘raw’ parameter values, which is how calculate()works), perhaps with an option to modify which bits of the model to look at (considering all distributions by default). So how about something like:

draws <- mcmc(m)
dic(draws)

or

dic(draws, focus = list(y))

where ‘focus’ is a list of greta arrays that have distributions assigned to them.

Jakob · October 24, 2019, 2:52pm

Hi,

+1 for information criteria support, that would be really nice!
Why not go all the way though and make use of the loo package, which computes LOOIC and WAIC which are both supposed to be supperior to DIC? The only thing that would be needed is a function which computes a log-likelihood matrix from the draws, which could then be used as input for loo().

As a new user I’m really liking the way greta is headed btw., it’s so convenient yet powerful, at least for the couple of cases I tried so far. Thanks for the great work!

Best
Jakob

nick · November 2, 2019, 11:15pm

Thanks Jakob!

I’ll take a look at the loo package when I get to thinking about implementing DIC etc. (probably in a rev sprint early next year). I had assumed calculation was both more involved and more tied to Stan, but the way you describe it sounds very doable!

lionel68 · March 9, 2020, 3:47pm

Any news on that? With the loo package the only thing that we would need from a fitted greta model would be:
" An I by C by N array, where I is the number of MCMC iterations per chain, C is the number of chains, and N is the number of data points."

This could then be passed to the functions loo::loo() and loo:relative_eff()

I tried the following naive implementation:

# simple lm example
x <- runif(100, -2, 2)
y <- rnorm(100, 1 + 2 * x, 1)

# get log-likelihood array, p(y | linpred, sd)
intercept <- normal(0, 2)
slope <- normal(0, 1)

sd_res <- cauchy(0, 2, truncation = c(0, Inf))

linpred <- intercept + slope * x

y <- as_data(y)

distribution(y) <- normal(linpred, sd_res)

loglik <- dnorm(y, linpred, sd_res, log = TRUE)

m <- model(intercept, slope, sd_res, loglik)

d <- mcmc(m)

But got the following error message:

Error: loglik is a data greta array, data greta arrays cannot be sampled

@vlandau and @nick did you implement some functions for the DIC that could be used there?

nick · March 10, 2020, 2:52am

Right, this doesn’t work because there’s no greta dnorm function. I am surprised it doesn’t error when that’s called though. Will look into that.

You could write out the normal log-density explicitly and this should work. Something like:

lik <- (1 / (sd_res * sqrt(2 * pi))) * exp(((y - linpred) / sd_res) ^ 2)
loglik <- log(lik)

(though it would be more computationally stable if you simplify that to avoid the exponentiation)

It would be nice to have a function that would compute the densities automatically. With an interface like:

loglik <- density(y, log = TRUE)

That would return a greta array with the log densities for each element in y. This would only work for greta arrays that follow distributions. It shouldn’t be too hard to get this working. But I wonder whether it should be in an extension package or greta itself. There must be other uses for these densities too, but none spring to mind immediately.

hrlai · August 30, 2020, 1:37am

Hi @nick and everyone interested in information criteria from greta,

+1 for implementing an argument to extract log-likelihood from greta draws, since this will be very compatible with the loo package.

Just to follow up the code that @nick shared:

lik <- (1 / (sd_res * sqrt(2 * pi))) * exp(((y - linpred) / sd_res) ^ 2)
loglik <- log(lik)

Does this only work for normally-distributed y? I am guessing that it is different in a GLM, e.g.,

distribution(y) <- lognormal(linpred, sd_log)

should it be the PDF of lognormal distribution:

lik <- (1 / (y * sd_log * sqrt(2 * pi))) * exp(-(log(y) - linpred)^2 / (2 * sd_log^2))

?

vlandau · September 3, 2020, 6:12pm

Realized I’ve neglected this thread! For DIC, I’ve just been using calculate() lately. It’s not too bad for simple problems. In case it’s useful here’s a pass at DIC for logistic regression, where prob is the inverse logit of the linear combination of covariates and regression coefficients (betas), and draws is the output from mcmc():

# A function to calculate deviance for a logistic regression model
deviance <- function(prob, y){
  -2 * sum(dbinom(y, size = 1, prob = boot::inv.logit(prob), log = TRUE))
}

# This code comes after you've defined and fit your model
probs <- as.data.frame(t(do.call(rbind, calculate(prob, draws))))

# Calculate deviance based on prob in parallel on a cluster, cl <- makeCluster(...)
dev <- unlist(parLapply(cl = cl, X = probs,
                        fun = deviance,
                        y = data$presence))
p_d <- mean(dev) - deviance(covariates %*% colMeans(do.call(rbind,draws)), data$presence)

DIC <- p_d + mean(dev)

Figured I’d share in case it’s helpful to anyone.