Softmax / conditional logistic regression with varying choice set in Greta

Hi everyone,

I would like to fit a conditional logistic (softmax) regression model where the set of options varies across choice scenarios/strata.

In Stan, the model can be coded by putting the data in long format, giving ‘start’ and ‘end’ indices for the individual strata and using the log_softmax() function to compute log probabilities (see Stan code below). The log likelihood is then just the sum of the log probabilities at the observed choices (credit to James Savage).

As I have a lot of data, I wanted to see how TensorFlow parallelisation fares in terms of speed. However, I’m having a hard time translating the model structure to greta.

Does anyone know if/how this could be done?

All the best and thanks a lot

PS: Here’s the Stan code:

```stan
data {
  int<lower=0> N;            // total rows in long format
  int<lower=0> E;            // number of choice strata
  int<lower=1> K;            // number of covariates
  vector[N] y;               // 0/1 indicator of the chosen option
  matrix[N, K] x;
  int<lower=1> start[E];     // first row of each stratum
  int<lower=1> end[E];       // last row of each stratum
}
parameters {
  vector[K] beta;
}
model {
  vector[N] log_prob;

  beta ~ normal(0, .5);

  for (e in 1:E)
    log_prob[start[e]:end[e]] = log_softmax(x[start[e]:end[e]] * beta);

  target += log_prob' * y;
}
```
I think this corresponds to a multinomial distribution over y, so in greta you’d have to code it that way to define the density.
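
To see that correspondence concretely, here’s a small plain-R illustration (the eta values are made up): for a one-hot y within one stratum, the dot product of the log-softmax values with y equals the log density of a single multinomial draw.

```r
eta <- c(0.2, -1.0, 0.5)               # linear predictors for 3 options
log_prob <- eta - log(sum(exp(eta)))   # log_softmax(eta)
y <- c(0, 1, 0)                        # option 2 was chosen
sum(log_prob * y)                      # this stratum's contribution to target
dmultinom(y, size = 1, prob = exp(log_prob), log = TRUE)  # same value
```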

Are all the E vectors the same length, or do they have different lengths? If they are all the same length, say M, it would be most efficient to reshape y and x into ExM matrices Y and X and then do something like:

```r
beta <- normal(0, 0.5, dim = K)
eta <- X %*% beta
prob <- imultilogit(eta)
distribution(Y) <- multinomial(size = 1, prob)
```
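
For what it’s worth, the reshaping itself can happen in plain R before any greta code. A hypothetical sketch, assuming each stratum occupies M = N/E consecutive rows of the long-format data and y is the 0/1 choice indicator:

```r
M <- N / E
# refold y column-wise (strata are contiguous blocks of M rows), then
# transpose so each stratum becomes a one-hot row of Y; the N x 1
# linear predictor could be refolded the same way
Y <- t(matrix(y, nrow = M, ncol = E))   # E x M
```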

If not, you’re better off keeping each stratum as an element of a list of length E and doing something like:

```r
beta <- normal(0, 0.5, dim = K)
eta_list <- lapply(x_list, `%*%`, beta)
prob_list <- lapply(eta_list, imultilogit)
for (e in seq_len(E))
  distribution(y_list[[e]]) <- multinomial(size = 1, prob_list[[e]])
```
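
If it helps, the ragged lists can be built from the same start/end indices as the Stan version (an untested sketch; the 1 x n row shape for the y elements is my assumption, chosen to match a row of probabilities):

```r
x_list <- lapply(seq_len(E), function(e) x[start[e]:end[e], , drop = FALSE])
y_list <- lapply(seq_len(E), function(e) t(y[start[e]:end[e]]))  # 1 x n_e one-hot rows
```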

(writing this on a phone and without checking, so the above code may not work!)

The first of these will be much more efficient for large E, since TensorFlow can efficiently dispatch and parallelise across the multiple likelihoods.

One issue is that this doesn’t work on the log scale, so it’ll incur a computational cost and lose numerical precision, which could translate into poor convergence. I could make this work on the log scale automatically by adding a log representation to the internals of imultilogit() and using that in multinomial(). If you don’t want to wait for that (it would have to be next year), you could try writing your own greta distribution to implement this. I’d be happy to help if you want to go down that route, just let me know.

Hi Nick,

thanks again for the quick and helpful reply, I appreciate the work you put into this a lot.

The E vectors are sadly not the same length. As Stan does not yet natively support ragged array / list data structures, this model uses the start and end indices to access segments of unequal length.
The list variant you proposed seems an elegant approach, but I was afraid the ragged data structure might break parallelisation, as you confirmed. I’ll still give it a try, though.

All the best