Generate samples from a poisson distribution and assign them a value from another one



I have some data in which each row is a context, each with a number of people and an average quantity (ie. wating time), recorded in two different years, in twelve months of observation for year one and in 4 months for year 2. I want to simulate the number of people in each context and the waiting time for each person.

I thought about this procedure:

  1. estimate the number of persons for each context with a mixed effect Poisson model with a random intercept at the context level and the year and the months of observation as covariates;
  2. estimate the average waiting time per context, this time with a lognormal regression with the same parameters of (1);
  3. simulate the number of persons I would see for each context in the remaining months of year 2, using the Poisson distribution estimated in (1) and assign them a waiting time according to the context-specific distribution estimated in (2).
  4. use this model to decide how low should be the average of some contexts in order to have global waiting time below a certain threshold.

The problem is that I don’t know how to achieve step 3 in Greta.
Any help?


I’m not sure I exactly understand the model, but it looks like the sticking point is the simulation of discrete counts for the rest of year two. greta can’t model discrete parameters, or simulate them, but it’s quite straightforward to estimate the expected counts (parameter of the Poisson distribution) and then simulate counts based on those in R. E.g.

# fake data, counts for a 12 month period
n <- 10
counts <- rpois(10, 3)

# estimate the monthly rate parameter (you can specify a model for this)
lambda <- lognormal(0, 1, dim = n)
distribution(counts) <- poisson(lambda * 12)

# inference
m <- model(lambda)
draws <- mcmc(m)

# get monthly rate estimates, and simulate observed counts around them for an 8 month period
lambdas <- as.matrix(draws)
counts <- matrix(NA, nrow(lambdas), ncol(lambdas))
counts[] <- rpois(length(lambdas), lambdas * 8)

This gives you 4000 posterior estimates of the 8-month counts in each context, which hopefully is something like what you want.


Thank you very much! I think that thanks to your suggestions the solution is nearer! I’m expecially interested in the line distribution(counts) <- poisson(lambda * 12), I didn’t think about it.

Unfortunately trying your code gives me error “Error in dim(free_state_draws) <- c(1, dim(free_state_draws)): invalid first argument” (I also get it with greta “get_started” demo, I’m having difficulties setting up greta).

To check if I understood, your algorithm draws 4000 lambda estimates for each context, and then with rpois draws 1 count per each lambda/context combination? Is this better than drawing 4000 counts form one lambda per context (estimated by a max. likelihood model)? or than n counts from each 4000 * contexts lambdas?

I’ll give you more info about my problem. Here’s the dataset:
$ Service: "chrc: 13 cases.“Neurology visit”: 63, “Ginecology visit”: 64, "Gastr…
$ Provider: "chrc: 9 cases.“Provider 1”: 101, “Provider 2”: 104, “Provider 3”: 85, "Provid…
$ Class: “chrc: 4 cases.“A”: 203, “B”: 207, “C”: 214, “D”: 167”
$ N.patients: “nmrc: mean: 342 ± 1 041, median: 47 (IQR: 11 - 214)”
$ W.time: “nmrc: mean: 27.1 ± 29.7, median: 18.4 (IQR: 8.9 - 33)”
$ Year: “fctr: 2 cases.“1”: 413, “2”: 378”
$ Months: “nmrc: 2 cases.“4”: 378, “12”: 413”
Context is defined as paste(Service, Provider, Class)

My model should estimate jointly both lambda, for the number of patients, and eta, for the waiting time, using Service, Provider, Class, Months and Year as covariates and context as a random intercept:
lambda (Poisson): N.patients ~ intercept + Service + Provider + Class + Months + Year + (1|context)
eta (lognormal or exponential?): W.time ~ intercept + Service + Provider + Class + Months + Year + (1|context)

As you can see I have two years with a different number of months each. How would I apply the concept in distribution(counts) <- poisson(lambda * 12)? In stanarm or brms I would use predict(), putting Year = 2 and Months = 8.

Then for each predicted count group I would simulate the waiting times with rlnorm(n = count_c.i, meanlog = eta_c.i), with _c.i being a specific draw for a specific context. Is it right? But what where should I get the sdlog argument for rlnorm?