Problem with coding a new probability distribution in Greta

Hi everyone,

I am trying to use the generalized normal distribution (https://en.wikipedia.org/wiki/Generalized_normal_distribution) in a greta model. As this distribution is not available, I have tried to follow the only example I have found so far, which is here: https://mdscheuerell.github.io/gretaDFA/.

The problem I have is that when I run the greta model I get the following error:
Error in self$tf_distrib(parameters, dag)$log_prob(x) :
object ‘tf’ cannot be found

Any idea on how to change my code? Here is the code I wrote:

library(greta)
library(R6)
distrib <- .internals$nodes$constructors$distrib
distribution_node <- .internals$nodes$node_classes$distribution_node
check_dims <- .internals$utils$checks$check_dims
as.greta_array <- .internals$greta_arrays$as.greta_array
is_scalar <- .internals$utils$misc$is_scalar
fl <- .internals$utils$misc$fl

# function to call

gnorm <- function (mu, sigma, beta, dim = 1) {
distrib("gnorm", mu, sigma, beta, dim)
}

gnorm_distribution <- R6Class(
"gnorm_distribution",
inherit = distribution_node,
public = list(

initialize = function (mu, sigma, beta, dim) {
  
  # (dim)
  if (length(dim) > 1 ||
      dim <= 0 ||
      !is.finite(dim) ||
      dim != floor(dim)) {
    
    stop ("dim must be a scalar positive integer, but was: ",
          capture.output(dput(dim)),
          call. = FALSE)
    
  }
  
 

  # check if sigma is scalar
  # (sigma)
  if (length(sigma) > 1) {
    
    stop ("sigma must be a scalar positive integer, but was: ",
          capture.output(dput(sigma)),
          call. = FALSE)
    
  }
  
  

  # check if mu is scalar
  # (mu)
  if (length(mu) > 1) {
    
    stop ("mu must be a scalar positive integer, but was: ",
          capture.output(dput(mu)),
          call. = FALSE)
    
  }
  
  

  # check if beta is scalar
  # (beta)
  if (length(beta) > 1) {
    
    stop ("beta must be a scalar positive integer, but was: ",
          capture.output(dput(beta)),
          call. = FALSE)
    
  }
  sigma <- as.greta_array(sigma)
  mu <- as.greta_array(mu)
  beta <- as.greta_array(beta)
  
  self$bounds <- c(0, Inf)
  super$initialize("gnorm", dim, truncation = c(0, Inf))
 self$add_parameter(sigma, "sigma")
 self$add_parameter(mu, "mu")
 self$add_parameter(beta, "beta")

},

tf_distrib = function (parameters, dag) {

  mu <- parameters$mu
  beta <- parameters$beta
  sigma <- parameters$sigma

  # log pdf(x | i, M, sigma)
  log_prob = function (x) {
    tf$log(beta) - tf$log(fl(2)*sigma)-tf$lgamma(1/beta) - tf$pow(tf$abs((x-mu)/sigma), (beta))
  }

  list(log_prob = log_prob, cdf = NULL, log_cdf = NULL)

},

# no CDF functions implemented
tf_cdf_function = NULL,
tf_log_cdf_function = NULL

)
)

Sincerely,

Frédéric Gosselin

Hi Frederic,
You can check @nick's description of how to add new distributions to greta in this old pull request. The easiest approach would be to fork the repository, add your function code there and then install this package in R via devtools::install_github().
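Something like this, once your changes are pushed to your fork (the repository name here is just a placeholder):

devtools::install_github("your-github-username/greta")
library(greta)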

I tried this out with a zero-inflated Poisson: https://github.com/lionel68/greta/commit/6aa4080e2979baf3e97952d0fd23241d73c91d4f; you can check there where the function code should go. I also recommend adding some very simple tests to check that the distribution function you created gives density estimates similar to established functions in other R packages. You can check the different tests in the tests/ subfolder of the repo.
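For the generalized normal, for example, the reference density could simply be written out from the formula on the Wikipedia page you linked (a rough sketch, not tested):

# reference (log-)density of the generalized normal, written from the standard formula
dgnorm_ref <- function(x, mu, sigma, beta, log = FALSE) {
  out <- log(beta) - log(2 * sigma) - lgamma(1 / beta) - (abs(x - mu) / sigma)^beta
  if (log) out else exp(out)
}
# a test would then compare log-densities from the greta implementation
# against dgnorm_ref() for a few parameter values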

Once you've done this, the best next step would be to open a pull request to the main repo so that your implementation can be checked and made available to the rest of the world.

Hope this helps.
Lionel


Thank you very much Lionel. Very helpful.

I will try to follow your example (yet, as I am not a GitHub expert, it may take a little time).

All the best,

Frédéric

Dear all,
I have kept following the initial version of my program. The problem is that the program works in some R sessions but not in others.

Below is a program that is easier to replicate.

Sincerely,

Frédéric

#Packages
library(coda)

Sys.setenv(RETICULATE_PYTHON = "C:/Users/frederic.gosselin/Miniconda3/envs/r-tensorflow")
reticulate::use_condaenv("r-tensorflow")
library(greta)

library(R6)
distrib <- greta::.internals$nodes$constructors$distrib
distribution_node <- greta::.internals$nodes$node_classes$distribution_node
check_dims <- greta::.internals$utils$checks$check_dims
as.greta_array <- greta::.internals$greta_arrays$as.greta_array
is_scalar <- greta::.internals$utils$misc$is_scalar
fl <- greta::.internals$utils$misc$fl

# function to call

gnorm <- function (mu, sigma, beta0, dim = 1) {
distrib("gnorm", mu, sigma, beta0, dim)
}

# Leung & Drton distn for prior of diag(Z)

gnorm_distribution <- R6Class(
"gnorm_distribution",
inherit = distribution_node,
public = list(

initialize = function (mu, sigma, beta0, dim) {
  

  sigma <- as.greta_array(sigma)
  mu <- as.greta_array(mu)
  beta0 <- as.greta_array(beta0)
 dim <- check_dims(mu, sigma, beta0, target_dim = dim)

 super$initialize("gnorm", dim)
 self$add_parameter(sigma, "sigma")
 self$add_parameter(mu, "mu")
 self$add_parameter(beta0, "beta0")

},

tf_distrib = function (parameters, dag) {

  mu <- parameters$mu
  beta0 <- parameters$beta0
  sigma <- parameters$sigma

  # log pdf(x | i, M, sigma)
  log_prob = function (x) {
  
   tf$log(beta0)- tf$log(fl(2)*sigma)-tf$lgamma(fl(1)/beta0) - tf$pow(tf$abs((x-mu)/sigma), (beta0))
         }

  list(log_prob = log_prob, cdf = NULL, log_cdf = NULL)

},

# no CDF functions implemented
tf_cdf_function = NULL,
tf_log_cdf_function = NULL

)
)

set.seed(1)
Y<-rnorm(100)
sigma<-normal(0,2,truncation=c(0,Inf))
beta0<-normal(2,2,truncation=c(0,Inf))
mu<-normal(0,1)
Y<-as_data(Y)
distribution(Y)<-gnorm(mu,sigma,beta0)
m<-model(mu,sigma,beta0)

####MO154writing4
set.seed(1)
time1bc<-system.time(test1bc<-greta::mcmc(m,sampler = hmc(Lmin = 10, Lmax = 50)))

I may not be much help here - so apologies in advance.

I did have a PhD student whom I advise code up a zero-inflated Poisson distribution. Her work is below. Maybe looking at that example will help with your code? I believe the example is taken from Richard McElreath's blog (McElreath Oxen Example). Good luck!

library("greta")
library("R6")
library("tensorflow")

#set up
distribution_node <- greta::.internals$nodes$node_classes$distribution_node
as.greta_array <- greta::.internals$greta_arrays$as.greta_array
check_dims <- greta::.internals$utils$checks$check_dims
distrib <- greta::.internals$nodes$constructors$distrib
fl <- greta::.internals$utils$misc$fl
tf_sum <- greta:::tf_sum 
tf_prod <- greta:::tf_prod
tf_max <- greta:::tf_max



zero_inflated_distribution <- R6Class(
  "zero_inflated_distribution",
  inherit = distribution_node,
  public = list(
    
    initialize = function(prob, rate, dim) {
      
      prob <- as.greta_array(prob)
      rate <- as.greta_array(rate)
  
      # add the nodes as children and parameters
      dim <- check_dims(prob, rate, target_dim = dim)
      super$initialize("zero_inflated", dim, discrete = TRUE)
      self$add_parameter(prob, "prob")
      self$add_parameter(rate, "rate")
    },
    
    
      tf_distrib = function(parameters, dag) {
       
        prob <- parameters$prob
        rate <- parameters$rate
      
        
        log_prob <- function(x) {
        
        # using relu: relu(1 - x) is 1 when x == 0 and 0 when x >= 1,
        # so it acts as an indicator for the zero-inflation component
        tf$log(prob * tf$cast(tf$nn$relu(fl(1) - x), tf$float64) + (fl(1) - prob) * tf$pow(rate, x) * tf$exp(-rate) / tf$exp(tf$lgamma(x + fl(1))))
          
        #attempts that did not work
        #tf$log(prob * (fl(1) - tf$cast(tf$count_nonzero(c(x)), tf$float64)) + (fl(1) - prob) * tf$pow(rate, x) * tf$exp(-rate) / tf$exp(tf$lgamma(x + fl(1))))    
        #tf$log(tf$to_float(prob) * tf$to_float(tf$to_float(1) - tf$to_float(tf$count_nonzero(tf$to_float(x)))) + tf$to_float((tf$to_float(1) - tf$to_float(prob)) * (tf$to_float(tf$pow(tf$to_float(rate), tf$to_float(x))) * tf$to_float(tf$exp(tf$to_float(-rate)))) / tf$to_float(tf$exp(tf$lgamma(tf$to_float(x) + tf$to_float(1))))))  
          
        #tf$log(prob * tf$to_float(tf_max(c(tf$to_float(1) - tf$to_float(x), tf$to_float(0)))) + tf$to_float(tf$to_float(1) - prob) * tf$to_float(tf$pow(rate, x) * tf$exp(-rate)) / (tf$exp(tf$lgamma(tf$to_float(x) + tf$to_float(1)))))
        }
        
        # attempt that works but did not converge
       #tf$log(prob * tf_max(fl(1) - x, 0) + (fl(1) - prob) * (tf$pow(rate, x) * tf$exp(-rate)) / (tf$exp(tf$lgamma(x + fl(1)))))
        
         list(log_prob = log_prob, cdf = NULL, log_cdf = NULL)
    },
    
    tf_cdf_function = NULL,
    tf_log_cdf_function = NULL
  )
)

zero_inflated <- function (prob, rate, dim = NULL) {
  distrib("zero_inflated", prob, rate, dim)
}

#Data
#simulated data
## define parameters
prob_drink <- 0.2 # 20% of days
rate_work <- 1    # average 1 manuscript per day
## sample one year of production
N <- 365
## simulate days monks drink
drink <- rbinom( N , 1 , prob_drink )
## simulate manuscripts completed
y <- (1-drink)*rpois( N , rate_work )

# prior
log_rate <- normal(0, 10)
logit_drink <- normal(0, 1)
rate <- exp(log_rate)
drink <- ilogit(logit_drink)

# likelihood
distribution(y) <- zero_inflated(drink, rate) 

#model
m <- model(logit_drink, log_rate)

#plot
plot(m)

#samples
samples <- greta::mcmc(m, warmup = 400, n_samples = 4000)

#summary
summary(samples)

Thank you Adam. This actually solves my problem: I had simply forgotten to load the tensorflow library. Adding just this one line solves it:
library(tensorflow)
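(For context: the tf object used inside log_prob() is exported by the tensorflow package, so the preamble of the script now starts with something like this:)

library(greta)
library(tensorflow)  # makes the tf module object available inside tf_distrib()
library(R6)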


Hi Lionel68, and others,

I wanted to implement a Zero Inflated Negative Binomial (ZINB) distribution in greta (great package, long live!) and recently found your post that explains this. Many thanks for that. However, I am missing something: after modifying the files to include a "zero_inflated_negative_binomial" distribution object and installing the package with devtools::install_github("my_fork_repository"), R still says the object cannot be found.
First message on this forum, thanks in advance for any help !

Best wishes,

Olivier


Hi Olivier!

Welcome! Thanks for posting!

Are you able to point me to the repository where you implemented this ZINB distribution? Happy to take a look - it would also be helpful if you could provide the code that you run when you are using your fork of greta to use ZINB - this way I can try to reproduce the problem on my end and debug it as well :slight_smile:

To try and debug your issue I would try the following:

  1. Go to the R folder/Rstudio project where you’ve created your fork of the repo
  2. Run devtools::load_all()
  3. Test if the script works
  4. If it works, restart R and try running your code with library(greta). Make sure you've incremented the version number in the DESCRIPTION file, to something like 0.4.2.9002, so you can check that your fork is the package being loaded: after library(greta), run packageVersion("greta"), which should return 0.4.2.9002 (or whatever you decided). (See the sketch just after this list.)
  5. Test if your script works
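A rough sketch of those steps (the fork name and version number are just examples):

# from the root of your greta fork (steps 2-3)
devtools::load_all()
# ... run your ZINB script here ...

# after restarting R (steps 4-5)
devtools::install()       # or devtools::install_github("yourname/greta")
library(greta)
packageVersion("greta")   # should report e.g. 0.4.2.9002, i.e. your fork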

Let me know how you go, happy to help!

Hi Nick,

Thanks a lot for your reply and help! After posting I realized my fork ("olivuntu/greta") was way behind the original "greta-dev/greta" repo. So I've fetched and merged changes, while trying to maintain my own ones.

I have to say I'm not completely GitHub fluent - rather a beginner - and I may have introduced another issue there: when trying to run devtools::load_all() as you advised, I get this message: "Line starting '<<<<<<< HEAD ...' is malformed!", which indicates a mistake in one of the merged files, right? But I can't figure out which file is concerned. Do you know how to detect that?
Sorry to bother with something different, and thanks again.

Olivier


Hi Olivier!

No worries! Merging things on github often catches me off guard as well - can you share the URL of your github repository? I can try and take a look directly tomorrow :slight_smile:
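In the meantime, I think that particular error message comes from read.dcf(), so the DESCRIPTION file would be the first place I'd look for a leftover merge conflict. A crude sketch (just an idea, not polished) that searches the whole package for the standard git conflict marker:

# list every file in the package that still contains a "<<<<<<<" marker
pkg_files <- list.files(".", recursive = TRUE, full.names = TRUE)
read_safely <- function(f) {
  tryCatch(readLines(f, warn = FALSE), error = function(e) character(0))
}
has_marker <- vapply(
  pkg_files,
  function(f) any(grepl("<<<<<<<", read_safely(f), fixed = TRUE)),
  logical(1)
)
pkg_files[has_marker]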

Good to know ! Yes, sorry, it’s there !
Thanks indeed ! :slight_smile:

Hi Nick,

I could fix the issue with the malformed line. Now the package installs correctly, but 'zero_inflated_negative_binomial' still cannot be found. I guess it has to do with exporting the function object, but I haven't found the fix for that yet. I will try to dive into R's docs, but I wouldn't mind saving myself some time! :slightly_smiling_face: Did you have a look by any chance?

All the best,

Olivier


Hi @olivuntu!

Apologies for the delay.

I can see zero_inflated_negative_binomial when I run devtools::load_all(), and looking at the code, it all looks fine to me - the only suggestion I have is to use { } around the function body, so that the definition becomes:

#' @rdname distributions
#' @export
zero_inflated_negative_binomial <- function(theta, size, prob, dim = NULL){
  distrib('zero_inflated_negative_binomial', theta, size, prob, dim)
}

And make sure that you run devtools::document() - at the moment I cannot see your function in the NAMESPACE file: https://github.com/olivuntu/greta/blob/master/NAMESPACE#L247 - which would happen if you haven't run devtools::document(), or if that function isn't being recognised.
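For reference, once devtools::document() has run, you can quickly check that the export line is there with something like:

devtools::document()
# NAMESPACE should now contain a line like: export(zero_inflated_negative_binomial)
any(grepl("zero_inflated_negative_binomial", readLines("NAMESPACE")))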

Let me know how you go?

Hi Nick,

No worries, and thanks for your time - the new distribution is now properly exported!
I'm not sure whether it was the brackets { } or the call to devtools::document() that did the trick (probably document()), but it did work! (I guess I have to study the different steps of package building.)

Of course, another issue has emerged... When the new object is used (zero_inflated_negative_binomial, of class function), it produces the following error message:
"Error in initialize(...) : attempt to apply non-function", which itself appears in the call to constructor$new(...), but I'm still not sure how to correct that!

Cheers


Hi @olivuntu!

Sorry I didn’t get back to you sooner!

I'm glad that it now exports! Looking at your code, I noticed the following in your testthat helpers, which isn't related to this problem but is something good to know about - I recommend changing it as follows:

From:

require(likelihoodExplore) 
dzinb <- function(x, theta, size, prob, log = FALSE)
    return(liknbinom(x, size = size, prob = prob, log = log))

To:

dzinb <- function(x, theta, size, prob, log = FALSE){
  dist_zinb <- likelihoodExplore::liknbinom(x,
                                            size = size,
                                            prob = prob,
                                            log = log)
  return(dist_zinb)
 }

And then add likelihoodExplore to Suggests, by typing this into the R console:

use_package("likelihoodExplore", type = "Suggests")

This change adds the package to the "Suggests" field in the DESCRIPTION. What this means is that the package isn't required for installation, but only for certain cases - in our case, for testing.

For more details on this, see https://r-pkgs.org/whole-game.html?q=use_package#use_package and https://r-pkgs.org/description.html?q=Suggests#description-dependencies
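You can also confirm the result directly from the DESCRIPTION file, e.g.:

# likelihoodExplore should now appear among the suggested packages
read.dcf("DESCRIPTION", fields = "Suggests")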

OK, so now onto the other half of the problem. I will need to use my other, non-M1 laptop to get this working properly, which is partly why I didn't get back to you last week - I only had my M1 laptop handy.

Stay tuned!

OK!

So I’m pretty sure I got this to work!

There are two main changes that you need to make:

  1. Change greta::.internals$utils$checks$check_dims to check_dims
  2. Update the x parameter in your sample function.

So, change the instances of greta::.internals$utils$checks$check_dims in your distribution code to plain check_dims - inside the package you can call the internal function directly.

And then, in your sample function, replace x with size - I think that is the right parameter to use. It doesn't know what x is, so it fails there.

Now, I would normally write this up as a pull request to your fork of greta so you can accept these changes directly rather than second hand via this message. But it actually ends up being a bit hard to do that for technical reasons.

Also, would you be open to submitting this as a pull request into greta? I think that we could do with some of these extra distributions, and then you wouldn't have to worry about using your own custom version!

Let me know how you go?

Hi @njtierney,

Yes, it does !

I can run a model using the new ZINB distribution, which is really awesome ! I still have to check whether the model makes sense, but that’s another question :grin: At least it does run ! :+1:

Regarding the details, thanks for your two points :

  1. I’ve changed the calls to check_dims as advised,
  2. I've changed the definition of the sample function for the ZINB distribution: the x argument was indeed wrong. I was confused because the definition of the negative binomial distribution in R uses the number of successes (size) and the probability of success (prob) as parameters, whereas in TensorFlow the parameters are total_count, that is the number of failures (!), and probs, the probability of success...

I will doublecheck that the parametrisation is consistent between the log_prob and sample functions.

I thought of keeping R's perspective at first, but in the end it seems more consistent to retain tf's, with total_count = size. The documentation for the ZINB should reflect that (I've just noticed this is not the case for the negative_binomial distribution, which uses total_count = size and probs = 1 - prob; I'm not sure this is correct, as successes and failures are not symmetric in the definition of the probability distribution).
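For instance, a quick pure-R check along these lines (just a sketch, writing the TF-style pmf out by hand) should tell whether total_count = size with probs = 1 - prob reproduces dnbinom():

# R parameterisation: dnbinom(x, size, prob) = P(x failures before size successes)
x    <- 0:10
size <- 5
prob <- 0.3
r_dens <- dnbinom(x, size = size, prob = prob)
# TF-style pmf with total_count = size and probs = 1 - prob:
# choose(x + total_count - 1, x) * probs^x * (1 - probs)^total_count
tf_style <- choose(x + size - 1, x) * (1 - prob)^x * prob^size
all.equal(r_dens, tf_style)  # TRUE if the two parameterisations agree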

Regarding the helper test, it seems the likelihoodExplore package was no longer available after a recent update of R. So I've changed the test to use the VGAM package instead, as it seems more actively maintained.
And I added it to Suggests as you suggested (with use_package).

I will make sure everything runs smoothly and will then open a pull request into greta, yes. It will be nice to have the zero-inflated distributions included in the package versions to come! They should be useful to others as well.

Many thanks again for your time and help.

Cheers,

Olivier

Hi, thanks @olivuntu for keeping this thread going again. Having the zero-inflated family of distributions will certainly vastly expand the utility of greta (and if one is allowed to be greedy, I'll throw in the Tweedie, or at least its compound Poisson-gamma interpretation, here :grin:).

Anyhow, for the probability density function, is there a reason not to use the package extraDistr, which greta already depends on? @olivuntu you may find extraDistr::dzinb helpful too. VGAM is a very nice package, but it sometimes uses alternative parameterisations of the probability density functions, so I'm always worried that I am not doing it correctly.
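For instance, the test helper could look something like this (a sketch; I'm assuming theta is meant to be the zero-inflation probability, which extraDistr calls pi):

# reference ZINB density using extraDistr's parameterisation
dzinb_ref <- function(x, theta, size, prob, log = FALSE) {
  extraDistr::dzinb(x, size = size, prob = prob, pi = theta, log = log)
}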


Love this discussion! Thanks @hrlai and @olivuntu

@olivuntu let me know how you go with everything re merging in - I might end up changing some things in the pull request, as I’d like to provide a vignette or other documentation on how to add a distribution to greta, so hopefully we can use your work to help demonstrate this :slight_smile:


Hi there,

Yes, motivating! I was not aware of the Tweedie family, thanks @hrlai, it seems quite powerful! Thanks also for mentioning extraDistr. Indeed it seems more consistent to use this package for the extra distributions that are already parametrised there! I've changed this in the helper section, but haven't tested it yet.

I'm fine with your input in the final pull-request step, Nick, especially if it ends up in a HOW-TO that helps further improvements!

At the moment, I'm having trouble performing the check() step, as it generates a system error on Linux. Otherwise I have updated the repo, and the package still installs and loads nicely in R.

Testing further…

All the best,

Olivier
