Generate samples from a poisson distribution and assign them a value from another one

Bakaburg1 · July 4, 2019, 9:58pm

Hello!

I have some data in which each row is a context, each with a number of people and an average quantity (ie. wating time), recorded in two different years, in twelve months of observation for year one and in 4 months for year 2. I want to simulate the number of people in each context and the waiting time for each person.

I thought about this procedure:

estimate the number of persons for each context with a mixed effect Poisson model with a random intercept at the context level and the year and the months of observation as covariates;
estimate the average waiting time per context, this time with a lognormal regression with the same parameters of (1);
simulate the number of persons I would see for each context in the remaining months of year 2, using the Poisson distribution estimated in (1) and assign them a waiting time according to the context-specific distribution estimated in (2).
use this model to decide how low should be the average of some contexts in order to have global waiting time below a certain threshold.

The problem is that I don’t know how to achieve step 3 in Greta.
Any help?

nick · July 4, 2019, 11:14pm

I’m not sure I exactly understand the model, but it looks like the sticking point is the simulation of discrete counts for the rest of year two. greta can’t model discrete parameters, or simulate them, but it’s quite straightforward to estimate the expected counts (parameter of the Poisson distribution) and then simulate counts based on those in R. E.g.

# fake data, counts for a 12 month period
n <- 10
counts <- rpois(10, 3)

# estimate the monthly rate parameter (you can specify a model for this)
lambda <- lognormal(0, 1, dim = n)
distribution(counts) <- poisson(lambda * 12)

# inference
m <- model(lambda)
draws <- mcmc(m)
coda::gelman.diag(draws)
plot(draws)

# get monthly rate estimates, and simulate observed counts around them for an 8 month period
lambdas <- as.matrix(draws)
counts <- matrix(NA, nrow(lambdas), ncol(lambdas))
counts[] <- rpois(length(lambdas), lambdas * 8)

This gives you 4000 posterior estimates of the 8-month counts in each context, which hopefully is something like what you want.

Bakaburg1 · July 5, 2019, 8:01pm

Thank you very much! I think that thanks to your suggestions the solution is nearer! I’m expecially interested in the line distribution(counts) <- poisson(lambda * 12), I didn’t think about it.

Unfortunately trying your code gives me error “Error in dim(free_state_draws) <- c(1, dim(free_state_draws)): invalid first argument” (I also get it with greta “get_started” demo, I’m having difficulties setting up greta).

To check if I understood, your algorithm draws 4000 lambda estimates for each context, and then with rpois draws 1 count per each lambda/context combination? Is this better than drawing 4000 counts form one lambda per context (estimated by a max. likelihood model)? or than n counts from each 4000 * contexts lambdas?

I’ll give you more info about my problem. Here’s the dataset:
$ Service: "chrc: 13 cases.“Neurology visit”: 63, “Ginecology visit”: 64, "Gastr…
$ Provider: "chrc: 9 cases.“Provider 1”: 101, “Provider 2”: 104, “Provider 3”: 85, "Provid…
$ Class: “chrc: 4 cases.“A”: 203, “B”: 207, “C”: 214, “D”: 167”
$ N.patients: “nmrc: mean: 342 ± 1 041, median: 47 (IQR: 11 - 214)”
$ W.time: “nmrc: mean: 27.1 ± 29.7, median: 18.4 (IQR: 8.9 - 33)”
$ Year: “fctr: 2 cases.“1”: 413, “2”: 378”
$ Months: “nmrc: 2 cases.“4”: 378, “12”: 413”
Context is defined as paste(Service, Provider, Class)

My model should estimate jointly both lambda, for the number of patients, and eta, for the waiting time, using Service, Provider, Class, Months and Year as covariates and context as a random intercept:
lambda (Poisson): N.patients ~ intercept + Service + Provider + Class + Months + Year + (1|context)
eta (lognormal or exponential?): W.time ~ intercept + Service + Provider + Class + Months + Year + (1|context)

As you can see I have two years with a different number of months each. How would I apply the concept in distribution(counts) <- poisson(lambda * 12)? In stanarm or brms I would use predict(), putting Year = 2 and Months = 8.

Then for each predicted count group I would simulate the waiting times with rlnorm(n = count_c.i, meanlog = eta_c.i), with _c.i being a specific draw for a specific context. Is it right? But what where should I get the sdlog argument for rlnorm?

Thanks!

nick · July 23, 2019, 7:38am

To check if I understood, your algorithm draws 4000 lambda estimates for each context, and then with rpois draws 1 count per each lambda/context combination? Is this better than drawing 4000 counts form one lambda per context (estimated by a max. likelihood model)?

This way, it incorporates the uncertainty in lambda as well as the Poisson sampling variation. If you took a single point estimate of lambda, and drew samples from the poisson distribution, it would only have the Poisson sampling distribution, not the uncertainty in (posterior distribution over) lambda.

My model should estimate jointly both lambda, for the number of patients, and eta, for the waiting time, using Service, Provider, Class, Months and Year as covariates and context as a random intercept:
lambda (Poisson): N.patients ~ intercept + Service + Provider + Class + Months + Year + (1|context)
eta (lognormal or exponential?): W.time ~ intercept + Service + Provider + Class + Months + Year + (1|context)

It doesn’t seem like there are any parameters that are shared between the count and waiting time models. so you might as well fit them separately. It’ll give you the same answer as fitting them together. Unless there is some dependence between the two that isn’t shown here?

As you can see I have two years with a different number of months each. How would I apply the concept in distribution(counts) <- poisson(lambda * 12)?

I made lambda the expected count for a single month, So to get the expected count for 12 months, it’s lambda * 12 You can account for variable number of months in both your observations and in whatever predictions you want to make, by multiplying lambda by the appropriate number of months to get the expected count over that period.

Then for each predicted count group I would simulate the waiting times with rlnorm(n = count_c.i, meanlog = eta_c.i), with _c.i being a specific draw for a specific context. Is it right? But what where should I get the sdlog argument for rlnorm?

Sorry, I don’t have a clue what this means. You’re assuming the waiting times are lognormally distributed with parameters given by the counts?