Ordinal regression

franciscolima · January 21, 2019, 7:45pm

Hi guys,

First of all, congratulations to the authors and maintainers for putting out this great package! Is there any way of conducting ordinal categorical regression, as implemented in the rethinking package via dordlogit? I noticed that the greta transformation functions (e.g. ilogit, imultilogit) tap into the tensorflow backend. Is it easy to deploy custom transformations?

Best,

Francisco

Voltemand · January 23, 2019, 10:59pm

Hi franciscolima,

Thanks for your interest! Unfortunately we are a bit limited in that all transformations have to be ‘defined’ in Tensorflow syntax. Having said, that if there is a way to decompose dordlogit into logit etc. then it should be possible to implement it. I’ve had a look at the rethinking implementation - could you point me to some other introductory literature about the function? I’m not an expert on these kind of models.

All the best!
Jeffrey

franciscolima · January 25, 2019, 3:37pm

Hi Voltemand,

Thanks for your quick response. To be honest, I also know little about the function.

I found, however, a good reference well aligned with the rethinking documentation for the function dordlogit. This is how dordlogit looks like:

function (x, phi, a, log = FALSE)
{
    a <- c(as.numeric(a), Inf)
    p <- logistic(a[x] - phi)
    na <- c(-Inf, a)
    np <- logistic(na[x] - phi)
    p <- p - np
    if (log == TRUE) 
        p <- log(p)
    p
}

It is indeed a chain of logit constrained to have \sum^K_{i = 1} p_i = 1, assuming there are K groups / levels, by taking the cumulative probabilities and bound by 0 and 1 (Inf and -Inf in log-odds).

Let me know if I can help! Have a good weekend!

Francisco

nick · February 5, 2019, 9:58pm

I hadn’t thought about including an ordered logit distribution, but it’s available in JAGS, Stan etc. so definitely worth including - thanks for the prompt!

As a stopgap, you can define the model using a categorical distribution and some manipulation of inputs. Here’s a first pass at an R function that wraps that all up. I will add an issue on Github to write up a more efficient tensorflow version.

# ordered logit model in greta
library(greta)

# First pass at an ordered logit distribution.
# phi is a column vector of logit-effects
# a is a k-1, increasing column vector of logit-cutoffs
# the output is a one-hot-encoded matrix, with each row an observation 
ordered_logit <- function(phi, a, n_realisations = NULL) {
  
  as.greta_array <- .internals$greta_arrays$as.greta_array
  
  # coerce to greta arrays
  a <- as.greta_array(a)
  phi <- as.greta_array(phi)
  
  # get number of realisations (check_multivariate_dims doesn't work here
  # because both are vectors)
  if (!is.null(n_realisations)) {
    n <- n_realisations
  } else {
    n <- nrow(phi)
  }

  # replicate phi if it's a scalar
  if (nrow(phi) == 1 & n != 1) {
    phi <- rep(phi, n)
  }
  
  # build matrix where element i,j is phi[i] - a[j]
  km1 <- length(a)
  phi <- do.call(cbind, replicate(km1, phi, simplify = FALSE))
  eta <- sweep(phi, 2, a, "-")
  
  # convert to logits
  p <- ilogit(eta)
  
  # get cumulative probabilities
  probs <- cbind(rep(1, n), p) - cbind(p, rep(0, n))
  
  # return categorical distribution
  categorical(probs)
  
}

Here’s a worked example based on the section in the rethinking book:


# test on rethinking problem
library(rethinking)
data(Trolley)
d <- Trolley[1:1000, ]
k <- 7
n <- nrow(d)

# make one-hot-encoded response vector
resp <- matrix(0, n, k)
resp[cbind(1:n, d$response)] <- 1

# unordered vector of cutpoints (bit flaky and needs increasing starting values
# to be provided)
# cutpoints <- normal(0, 10, dim = k - 1)

# make an ordered vector of cutpoints (the priors become a bit tricky here though)
cutpoints_raw <- c(normal(0, 10), normal(0, 1, truncation = c(0, Inf), dim = k - 2))
cutpoints <- cumsum(cutpoints_raw)
phi <- 0
distribution(resp) <- ordered_logit(phi, cutpoints, n_realisations = n)

m <- model(cutpoints)
draws <- mcmc(m)  # , initial_values = initials(cutpoints = -2:3)

Comparison with the rethinking/stan version:

m2 <- map2stan(
  alist(
    response ~ dordlogit(phi, cutpoints),
    phi <- 0,
    cutpoints ~ dnorm(0, 10)
  ),
  data = list(response = d$response),
  start = list(cutpoints = c(-2, -1, 0, 1, 2, 2.5)),
  chains = 4, cores = 4
)
summary(draws)$statistics
precis(m2, depth = 2)

I get very similar results to stan/rethinking, so I think this implementation works!

franciscolima · February 7, 2019, 6:33pm

Hi Nick,

It works here too, thanks for figuring it out! I will give it a thorough testing and provide any updates in this thread. Cheers

Francisco

hrlai · June 15, 2022, 11:40pm

@njtierney I wonder if this would also make a good addition to greta.distributions ? @nick 's code wraps around categorical() so it isn’t really defining a new distribution as intended in greta.distributions.

njtierney · June 16, 2022, 5:15am

Sounds good to me! Added here: https://github.com/greta-dev/greta.distributions/issues/14

olivuntu · June 11, 2023, 4:40pm

Hi folks,

I’ve also tried @nick’s implementation of the ordered_logit distribution. It seems to work nicely (thanks!), but after some time playing with it, I’m still having trouble to get “proper” convergence and good posterior checks. It seems to relate somehow to the definition of the prior for cutpoints. This thread (https://stats.stackexchange.com/questions/318281/choosing-informative-priors-for-bayesian-ordered-logistic-regression) pointed me to a possible solution here (https://betanalpha.github.io/assets/case_studies/ordinal_regression.html#2_Cut_Off_at_the_Pass) that seemingly works in Stan. I’m currently trying to implement the proposed induced_dirichlet distribution to use it as prior for cutpoints. Has there been progress with the greta.distributions package that could simplify the process? This could be the right occasion to include the ordered_logit as well. Thanks for you lights!