Implementing a multivariate Bernoulli

dirmeier · February 10, 2019, 7:30pm

Dear all,
I am trying to set up a distribution for a binary RV \mathbf{x} \in \{ 0, 1\}^k:

P(\mathbf{x} \mid \boldsymbol p) = \prod^k_i Bernoulli(x_i \mid p_i)

for a mixture of images. The code of the distribution is below:

multivariate_bernoulli_distribution <- R6Class(
  "multivariate_bernoulli_distribution",
  inherit = distribution_node,
  public = list(
    initialize = function(prob, dim) {
      prob <- as.greta_array(prob)
      dim <- check_dims(prob, target_dim = dim)
      super$initialize("multivariate_bernoulli", dim, discrete = TRUE)
      self$add_parameter(prob, "prob")
    },

    tf_distrib = function(parameters, dag) {
        prob <- parameters$prob
        log_prob <- function(x) {
          sum(x * log(prob) + (1 - x) * log(1 - prob))
        }
        list(log_prob = log_prob, cdf = NULL, log_cdf = NULL)
    },

    tf_cdf_function = NULL,
    tf_log_cdf_function = NULL
  )
)

When I test the distribution everything seems to work out, only the sampling throws an error:


n <- 25
p <- 3
prob <- beta(1, 1, p)
d <- matrix(rbinom(n * p, 1, .1), n)

for (i in seq(nrow(d)))
{
  distribution(d[i,]) <- multivariate_bernoulli(prob) 
}

m <- model(prob)
plot(m)
samples <- mcmc(m)

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: Input 'y' of 'Sub' Op has type float64 that does not match type float32 of argument 'x'

Any idea how I could solve this?

Thanks,
Simon

nick · February 11, 2019, 12:44am

Nice!

The issue here is that the tensorflow code (in log_prob) uses an R numeric object (1) in combination with a tensor (x and prob). The R tensorflow package coerces the 1 to a tensor for you, but in the absence of other information assumes you want a single precision float (float32). By default those tensors x and prob will be doubles (float64). Because Python/tensorflow is strict about these things, you can’t subtract a double from a single without explicitly convert one type to the other.

Internal greta tensorflow code should be agnostic to whether the user wants singles or doubles, since that’s an option users can pass to model(). The solution is to write code that finds out what type of float we are using, and creates the right type of object from those constants. This comes up a lot, so there’s an internal greta helper function fl() that does that for you, you’ll see it sprinkled liberally around the code for the probability distributions, e.g. here.

So just switch:

sum(x * log(prob) + (1 - x) * log(1 - prob))

to

sum(x * log(prob) + (fl(1) - x) * log(fl(1) - prob))

and you shouldn’t see that error any more.

You can get fl() from greta::.internals$utils$misc$fl

dirmeier · February 12, 2019, 6:38pm

Hey,
thanks for the quick reply. I still don’t manage to get it fixed. When I change to fl I get the following error:

Error in sum(x * log(prob) + (fl(1) - x) * log(fl(1) - prob)) : 
  invalid 'type' (environment) of argument

When I set a breakpoint at that position (see below) the dtype and Tensor seem to be correct. Any ideas what I am doing wrong?

Thanks again for the help and spending so much of your time on developing greta.

Cheers,
Simon

nick · February 13, 2019, 6:12am

It’s because sum() isn’t a tensorflow operation, so R’s sum() is trying to work out what to do with these tensor objects (which are actually environments). The tensorflow equivalent is tf$reduce_sum().

However, because greta now has a first ‘batch’ dimension that’s used to parallelise across chains (and across draws, when tracing values), you need to be careful with the dimensions you want to reduce along. There’s an internal greta function to help with that: greta:::tf_sum().

When you come to make this work across multiple observations (rows) simultaneously, log_prob should be a vector, with an element per observation (not including the batch dimension), so you’ll want something like this:

vals <- x * log(prob) + (fl(1) - x) * log(fl(1) - prob)
tf$reduce_sum(vals, axis = 2L)

BTW, I debugged this by sticking browser() inside the definition of log_prob(), like this:

log_prob <- function(x) {
  browser()
  tf_sum(x * log(prob) + (fl(1) - x) * log(fl(1) - prob))
}

which throws up an interactive debugging session when that function is called. You can then play with the objects and see where the error comes in.

The code below works for me with this single-observation version:

library(greta)
library(R6)

# get greta internals
distribution_node <- .internals$nodes$node_classes$distribution_node
as.greta_array <- .internals$greta_arrays$as.greta_array
check_dims <- .internals$utils$checks$check_dims
distrib <- .internals$nodes$constructors$distrib
fl <- .internals$utils$misc$fl
tf_sum <- greta:::tf_sum  # oops, looks like this one didn't make it into .internals!
  
multivariate_bernoulli_distribution <- R6Class(
  "multivariate_bernoulli_distribution",
  inherit = distribution_node,
  public = list(
    initialize = function(prob, dim) {
      prob <- as.greta_array(prob)
      dim <- check_dims(prob, target_dim = dim)
      super$initialize("multivariate_bernoulli", dim, discrete = TRUE)
      self$add_parameter(prob, "prob")
    },
    
    tf_distrib = function(parameters, dag) {
      prob <- parameters$prob
      log_prob <- function(x) {
        tf_sum(x * log(prob) + (fl(1) - x) * log(fl(1) - prob))
      }
      list(log_prob = log_prob, cdf = NULL, log_cdf = NULL)
    },
    
    tf_cdf_function = NULL,
    tf_log_cdf_function = NULL
  )
)

multivariate_bernoulli <- function (prob, dim = NULL) {
  distrib("multivariate_bernoulli", prob, dim)
}

n <- 25
p <- 3
prob <- beta(1, 1, p)
d <- matrix(rbinom(n * p, 1, .1), n)

for (i in seq(nrow(d)))
{
  distribution(d[i,]) <- multivariate_bernoulli(prob) 
}

m <- model(prob)
plot(m)
samples <- mcmc(m)

dirmeier · February 13, 2019, 4:54pm

Thanks, that worked great!