I’m trying to implement a spatial capture-recapture model in greta, but I think I must be doing something wrong because it is taking a very long time to compile and run, and I haven’t even tried the full model yet. Is there some way to improve the code below?
Simulate a dataset:
## Simulate activity centers for N=100 individuals
N <- 100
xlim <- ylim <- c(0,1)
s <- cbind(runif(N, xlim[1], xlim[2]),
runif(N, ylim[1], ylim[2]))
## Trap locations
x0 <- seq(0.25, 0.75, length=7)
x <- cbind(rep(x0, each=7), rep(x0, times=7))
J <- nrow(x) ## nTraps
plot(s, xlim=c(0,1), ylim=c(0,1))
points(x, pch=3)
## Parameters of encounter model
lam0 <- 0.3
sigma <- 0.1
K <- 5 ## nOccasions
lambda <- dist <- matrix(NA, N, J)
## Encounter histories
yall <- array(NA, c(N, J)) ## Shout-out from Georgia
for(j in 1:J) {
dist[,j] <- sqrt((s[,1]-x[j,1])^2 +
(s[,2]-x[j,2])^2)
lambda[,j] <- lam0*exp(-dist[,j]^2 / (2*sigma^2))
yall[,j] <- rpois(N, lambda[,j]*K)
}
## Discard data on individuals not detected
detected <- as.integer(rowSums(yall)>0)
(n <- sum(detected))
y <- yall[detected,]
Clean up and load greta (hopefully with 16 cores enabled):
## stash objects that aren't needed to fit the model
stashenv <- new.env()
obj.list <- sapply(ls(), get)
list2env(obj.list, envir=stashenv)
keep.obj <- c("y", "n", "J", "K", "x", "xlim", "ylim", "detected")
rm(list=ls()[!ls() %in% keep.obj])
ls()
library(parallel)
getOption("mc.cores")
options("mc.cores"=16)
library(greta)
Specify model:
lam0 <- uniform(0, 3)
sigma <- uniform(0, 2)
psi <- beta(1,1)
## Model for the captured individuals (will deal with N later)
s.obs <- greta_array(dim=c(n, 2))
for(i in 1:n) {
## s.obs[i,1:2] <- joint(uniform(xlim[1], xlim[2]),
## uniform(ylim[1], ylim[2]))
s.obs[i,1] <- uniform(xlim[1], xlim[2])
s.obs[i,2] <- uniform(ylim[1], ylim[2])
}
d2.obs <- zeros(n, J)
lambda.obs <- zeros(n, J)
for(j in 1:J) {
d2.obs[,j] <- (s.obs[,1]-x[j,1])^2 + (s.obs[,2]-x[j,2])^2
lambda.obs[,j] <- lam0*exp(-d2.obs[,j]/(2*sigma^2))*K
}
distribution(y) <- poisson(lambda.obs)
Compile and sample:
system.time({
gm <- model(lam0, sigma)
})
## CRAN version
## user system elapsed
## 601.162 360.196 180.352
## github version
## user system elapsed
## 1228.589 720.429 280.871
system.time({
mc <- mcmc(gm, n_samples=1000, warmup=200, chains=1, n_cores=16)
}) ## This has been running for >1 hr with no sign of a progress bar yet
Session info:
> detectCores()
[1] 32
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.7.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] greta_0.3.1.9010
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 lattice_0.20-38 codetools_0.2-16 listenv_0.8.0
[5] prettyunits_1.1.1 future_1.16.0 digest_0.6.23 crayon_1.3.4
[9] rappdirs_0.3.1 grid_3.6.1 R6_2.4.1 jsonlite_1.6
[13] magrittr_1.5 coda_0.19-3 tfruns_1.4 rlang_0.4.4
[17] progress_1.2.2 whisker_0.4 reticulate_1.14 vctrs_0.2.2
[21] tools_3.6.1 hms_0.5.3 compiler_3.6.1 base64enc_0.1-3
[25] pkgconfig_2.0.3 tensorflow_2.0.0 globals_0.12.5
For some reason, it looks like only 1 core is being used.
Eventually, I will need to add a lot of complexity to this model, so if it is this slow now, I’m afraid it won’t be viable for the full SCR model, which will involve data augmentation and then marginalization of the indicator variable z out of the likelihood… assuming that it still isn’t possible to sample discrete variables in greta.
Any ideas for improvement? Thanks for the great software!