Ordinal regression


#1

Hi guys,

First of all, congratulations to the authors and maintainers for putting out this great package! Is there any way of conducting ordinal categorical regression, as implemented in the rethinking package via dordlogit? I noticed that the greta transformation functions (e.g. ilogit, imultilogit) tap into the tensorflow backend. Is it easy to deploy custom transformations?

Best,

Francisco


#2

Hi franciscolima,

Thanks for your interest! Unfortunately we are a bit limited in that all transformations have to be ‘defined’ in Tensorflow syntax. Having said, that if there is a way to decompose dordlogit into logit etc. then it should be possible to implement it. I’ve had a look at the rethinking implementation - could you point me to some other introductory literature about the function? I’m not an expert on these kind of models.

All the best!
Jeffrey


#3

Hi Voltemand,

Thanks for your quick response. To be honest, I also know little about the function.

I found, however, a good reference well aligned with the rethinking documentation for the function dordlogit. This is how dordlogit looks like:

function (x, phi, a, log = FALSE)
{
    a <- c(as.numeric(a), Inf)
    p <- logistic(a[x] - phi)
    na <- c(-Inf, a)
    np <- logistic(na[x] - phi)
    p <- p - np
    if (log == TRUE) 
        p <- log(p)
    p
}

It is indeed a chain of logit constrained to have \sum^K_{i = 1} p_i = 1, assuming there are K groups / levels, by taking the cumulative probabilities and bound by 0 and 1 (Inf and -Inf in log-odds).

Let me know if I can help! Have a good weekend!

Francisco


#4

I hadn’t thought about including an ordered logit distribution, but it’s available in JAGS, Stan etc. so definitely worth including - thanks for the prompt!

As a stopgap, you can define the model using a categorical distribution and some manipulation of inputs. Here’s a first pass at an R function that wraps that all up. I will add an issue on Github to write up a more efficient tensorflow version.

# ordered logit model in greta
library(greta)

# First pass at an ordered logit distribution.
# phi is a column vector of logit-effects
# a is a k-1, increasing column vector of logit-cutoffs
# the output is a one-hot-encoded matrix, with each row an observation 
ordered_logit <- function(phi, a, n_realisations = NULL) {
  
  as.greta_array <- .internals$greta_arrays$as.greta_array
  
  # coerce to greta arrays
  a <- as.greta_array(a)
  phi <- as.greta_array(phi)
  
  # get number of realisations (check_multivariate_dims doesn't work here
  # because both are vectors)
  if (!is.null(n_realisations)) {
    n <- n_realisations
  } else {
    n <- nrow(phi)
  }

  # replicate phi if it's a scalar
  if (nrow(phi) == 1 & n != 1) {
    phi <- rep(phi, n)
  }
  
  # build matrix where element i,j is phi[i] - a[j]
  km1 <- length(a)
  phi <- do.call(cbind, replicate(km1, phi, simplify = FALSE))
  eta <- sweep(phi, 2, a, "-")
  
  # convert to logits
  p <- ilogit(eta)
  
  # get cumulative probabilities
  probs <- cbind(rep(1, n), p) - cbind(p, rep(0, n))
  
  # return categorical distribution
  categorical(probs)
  
}

Here’s a worked example based on the section in the rethinking book:


# test on rethinking problem
library(rethinking)
data(Trolley)
d <- Trolley[1:1000, ]
k <- 7
n <- nrow(d)

# make one-hot-encoded response vector
resp <- matrix(0, n, k)
resp[cbind(1:n, d$response)] <- 1

# unordered vector of cutpoints (bit flaky and needs increasing starting values
# to be provided)
# cutpoints <- normal(0, 10, dim = k - 1)

# make an ordered vector of cutpoints (the priors become a bit tricky here though)
cutpoints_raw <- c(normal(0, 10), normal(0, 1, truncation = c(0, Inf), dim = k - 2))
cutpoints <- cumsum(cutpoints_raw)
phi <- 0
distribution(resp) <- ordered_logit(phi, cutpoints, n_realisations = n)

m <- model(cutpoints)
draws <- mcmc(m)  # , initial_values = initials(cutpoints = -2:3)

Comparison with the rethinking/stan version:

m2 <- map2stan(
  alist(
    response ~ dordlogit(phi, cutpoints),
    phi <- 0,
    cutpoints ~ dnorm(0, 10)
  ),
  data = list(response = d$response),
  start = list(cutpoints = c(-2, -1, 0, 1, 2, 2.5)),
  chains = 4, cores = 4
)
summary(draws)$statistics
precis(m2, depth = 2)

I get very similar results to stan/rethinking, so I think this implementation works!


#5

Hi Nick,

It works here too, thanks for figuring it out! I will give it a thorough testing and provide any updates in this thread. Cheers

Francisco