Thinning "speeds up" sampling, and sampling slows down over time

rdinnager · January 18, 2019, 1:29am

Here I am going to report what appears to be strange behaviour in my greta models, at least as far as I understand things. I have found these issues in every model that I have been using personally, but I found that they also affect one of the examples from the greta website (I’ve only tried this one, so don’t know about the others). This is the example:

n_env <- 6
n_sites <- 100
## n_sites x n_env matrix of environmental variables

env <- matrix(rnorm(n_sites * n_env), nrow = n_sites)

## n_sites observations of species presence or absence

occupancy <- rbinom(n_sites, 1, 0.5)
alpha <- normal(0, 10)

beta <- normal(0, 10, dim = n_env)
## logit-linear model

linear_predictor <- alpha + env %*% beta

p <- ilogit(linear_predictor)
## distribution (likelihood) over observed values

distribution(occupancy) <- bernoulli§
mod <- model(alpha, beta)
test_mod <- mcmc(mod, warmup = 1000, n_samples = 10, thin = 1)

## 
## running 4 chains simultaneously on up to 12 cores

## 
    warmup                                           0/1000 | eta:  ?s          
    warmup ==                                       50/1000 | eta: 14s          
    warmup ====                                    100/1000 | eta: 10s          
    warmup ======                                  150/1000 | eta:  8s          
    warmup ========                                200/1000 | eta:  7s          
    warmup ==========                              250/1000 | eta:  6s          
    warmup ===========                             300/1000 | eta:  6s          
    warmup =============                           350/1000 | eta:  5s          
    warmup ===============                         400/1000 | eta:  5s          
    warmup =================                       450/1000 | eta:  4s          
    warmup ===================                     500/1000 | eta:  4s          
    warmup =====================                   550/1000 | eta:  3s          
    warmup =======================                 600/1000 | eta:  3s          
    warmup =========================               650/1000 | eta:  2s          
    warmup ===========================             700/1000 | eta:  2s          
    warmup ============================            750/1000 | eta:  2s          
    warmup ==============================          800/1000 | eta:  1s          
    warmup ================================        850/1000 | eta:  1s          
    warmup ==================================      900/1000 | eta:  1s          
    warmup ====================================    950/1000 | eta:  0s          
    warmup ====================================== 1000/1000 | eta:  0s          
## 
  sampling                                             0/10 | eta:  ?s

I have discovered a number of issues, which all started when I noticed that the progress bar on my models always had a highly inaccurate time estimate when they started, that suggested the sampling was slowing down over time. Indeed, this was what was happening. First of all, I will give everyone my session info for context (I really hope this is just a versioning issue!):

tensorflow::tf_version()

## [1] '1.12'

sessionInfo()

## R version 3.5.0 (2018-04-23)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] greta_0.3.0.9001
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0        whisker_0.3-2     knitr_1.20       
##  [4] magrittr_1.5      hms_0.4.1         progress_1.2.0   
##  [7] lattice_0.20-35   R6_2.3.0          rlang_0.3.1      
## [10] stringr_1.3.1     globals_0.12.4    tools_3.5.0      
## [13] parallel_3.5.0    grid_3.5.0        coda_0.19-2      
## [16] tfruns_1.4        htmltools_0.3.6   yaml_2.1.19      
## [19] digest_0.6.18     assertthat_0.2.0  crayon_1.3.4     
## [22] tensorflow_1.10   Matrix_1.2-14     base64enc_0.1-3  
## [25] codetools_0.2-15  evaluate_0.10     rmarkdown_1.11   
## [28] stringi_1.2.4     compiler_3.5.0    prettyunits_1.0.2
## [31] jsonlite_1.5      reticulate_1.10   future_1.10.0    
## [34] listenv_0.7.0     pkgconfig_2.0.1

reticulate::py_config()

## python:         C:\Users\u1013287\AppData\Local\Continuum\miniconda3\envs\r-tensorflow\python.exe
## libpython:      C:/Users/u1013287/AppData/Local/Continuum/miniconda3/envs/r-tensorflow/python36.dll
## pythonhome:     C:\Users\u1013287\AppData\Local\Continuum\miniconda3\envs\r-tensorflow
## version:        3.6.7 |Anaconda, Inc.| (default, Dec 10 2018, 20:35:02) [MSC v.1915 64 bit (AMD64)]
## Architecture:   64bit
## numpy:          C:\Users\u1013287\AppData\Local\Continuum\miniconda3\envs\r-tensorflow\lib\site-packages\numpy
## numpy_version:  1.15.4
## tensorflow:     C:\Users\u1013287\AppData\Local\Continuum\miniconda3\envs\r-tensorflow\lib\site-packages\tensorflow\__init__.p
## 
## python versions found: 
##  C:\Users\u1013287\AppData\Local\Continuum\miniconda3\envs\r-tensorflow\python.exe
##  C:\Users\u1013287\AppData\Local\Continuum\miniconda3\python.exe

First I will check the tensorflow graph size of the model, which we will use again later.

reticulate::py_len(attr(test_mod, "model_info")$model$dag$tf_graph$as_graph_def()$node)

## [1] 1482

The most simple odd thing I’ve noticed is the computational time effects of thinning. My understanding is that the MCMC chain should always sample n_samples, but then thinning just throws away a certain number of those samples. This would suggest to me that thinning should not affect the computational time of a chain (accept for the hopefully minimal overhead of deleting the samples and/or transferring the samples between tensorflow and R). However, when I use different thinning levels, the computation time is roughly linear in the number of “kept” samples and adds a fairly substantial amount of time (for large models it is massively slower to not thin):

n_kept <- c(2, 4, 5, 10, 20, 50, 100)
n_thin <- as.integer(100 / n_kept)
timing <- list()

for(i in seq_along(n_kept)){

timing[[i]] <- system.time(extra_samples(test_mod, n_samples = 100, thin = n_thin[i], verbose = FALSE))

print(i)

}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7

user_time <- sapply(timing, function(x) x[1])
plot(user_time ~ n_kept)
abline(lm(user_time ~ n_kept))

Is this a bug? It certaintly doesn’t seem right to me.

Anyway, I started by saying that I noticed originally that my models seemed to be slowing down over time. Again, my understanding of MCMC is that computation time for sampling should more or less be constant. I think this is something happening in tensorflow, because when I make entirely new models in greta they are also slowerif I haven’t reset my R session after running an “old” model. Here is an example. All I am doing is just repeatedly running mcmc on the same model. Each time it gets slower. I also output some information from the tensorflow session: the graph size, and the “sample_batch”.

timing <- list()
node_num <- list()
sampler_batch <- list()
for(i in 1:50) {
timing[[i]] <- system.time(

test_mod <- mcmc(mod, warmup = 0L, thin = 50, n_samples = 500, verbose = FALSE)

)

## save graph size

node_num[[i]] <- reticulate::py_len(attr(test_mod, "model_info")$model$dag$tf_graph$as_graph_def()$node)

## save "sample_batch" tensors

sampler_batch[[i]] <- attr(test_mod, "model_info")$samplers[[1]]$model$dag$tf_environment$sampler_batch
}
user_time <- sapply(timing, function(x) x[1])

plot(user_time, type = "l")

abline(lm(user_time~seq(1, length(timing))))

If we examine the graph sizes, they increase everytime mcmc has been run, which is weird.

unlist(node_num)

##  [1]  2825  4168  5511  6854  8197  9540 10883 12226 13569 14912 16255
## [12] 17598 18941 20284 21627 22970 24313 25656 26999 28342 29685 31028
## [23] 32371 33714 35057 36400 37743 39086 40429 41772 43115 44458 45801
## [34] 47144 48487 49830 51173 52516 53859 55202 56545 57888 59231 60574
## [45] 61917 63260 64603 65946 67289 68632

plot(unlist(node_num))

lapply(sampler_batch, function(x) x[[1]])

## [[1]]
## Tensor("mcmc_sample_chain_1/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[2]]
## Tensor("mcmc_sample_chain_2/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[3]]
## Tensor("mcmc_sample_chain_3/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[4]]
## Tensor("mcmc_sample_chain_4/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[5]]
## Tensor("mcmc_sample_chain_5/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[6]]
## Tensor("mcmc_sample_chain_6/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[7]]
## Tensor("mcmc_sample_chain_7/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[8]]
## Tensor("mcmc_sample_chain_8/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[9]]
## Tensor("mcmc_sample_chain_9/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[10]]
## Tensor("mcmc_sample_chain_10/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[11]]
## Tensor("mcmc_sample_chain_11/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[12]]
## Tensor("mcmc_sample_chain_12/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[13]]
## Tensor("mcmc_sample_chain_13/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[14]]
## Tensor("mcmc_sample_chain_14/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[15]]
## Tensor("mcmc_sample_chain_15/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[16]]
## Tensor("mcmc_sample_chain_16/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[17]]
## Tensor("mcmc_sample_chain_17/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[18]]
## Tensor("mcmc_sample_chain_18/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[19]]
## Tensor("mcmc_sample_chain_19/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[20]]
## Tensor("mcmc_sample_chain_20/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[21]]
## Tensor("mcmc_sample_chain_21/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[22]]
## Tensor("mcmc_sample_chain_22/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[23]]
## Tensor("mcmc_sample_chain_23/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[24]]
## Tensor("mcmc_sample_chain_24/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[25]]
## Tensor("mcmc_sample_chain_25/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[26]]
## Tensor("mcmc_sample_chain_26/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[27]]
## Tensor("mcmc_sample_chain_27/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[28]]
## Tensor("mcmc_sample_chain_28/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[29]]
## Tensor("mcmc_sample_chain_29/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[30]]
## Tensor("mcmc_sample_chain_30/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[31]]
## Tensor("mcmc_sample_chain_31/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[32]]
## Tensor("mcmc_sample_chain_32/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[33]]
## Tensor("mcmc_sample_chain_33/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[34]]
## Tensor("mcmc_sample_chain_34/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[35]]
## Tensor("mcmc_sample_chain_35/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[36]]
## Tensor("mcmc_sample_chain_36/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[37]]
## Tensor("mcmc_sample_chain_37/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[38]]
## Tensor("mcmc_sample_chain_38/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[39]]
## Tensor("mcmc_sample_chain_39/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[40]]
## Tensor("mcmc_sample_chain_40/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[41]]
## Tensor("mcmc_sample_chain_41/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[42]]
## Tensor("mcmc_sample_chain_42/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[43]]
## Tensor("mcmc_sample_chain_43/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[44]]
## Tensor("mcmc_sample_chain_44/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[45]]
## Tensor("mcmc_sample_chain_45/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[46]]
## Tensor("mcmc_sample_chain_46/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[47]]
## Tensor("mcmc_sample_chain_47/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[48]]
## Tensor("mcmc_sample_chain_48/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[49]]
## Tensor("mcmc_sample_chain_49/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)
## 
## [[50]]
## Tensor("mcmc_sample_chain_50/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)

Also, printing the “sample_batch” tensors, we can see that the name of the tensor, “mcmc_sample_chain_i”, has been iterated by 1 with each run of mcmc. I don’t know enough about tensorflow to know what this means exactly but perhaps it could help with diagnosing the problem?

Now, I originally noted the slowdown when using extra_samples, not by repeatedly calling mcmc, so what happens when we use extra_samples instead? Well, for one thing, the graph size no longer seems to increase.

timing <- list()
node_num <- list()
sampler_batch <- list()
for(i in 1:100) {
timing[[i]] <- system.time(

test_mod <- extra_samples(test_mod, thin = 50, n_samples = 500, verbose = FALSE)

)
}
user_time <- sapply(timing, function(x) x[1])

plot(user_time, type = "l")

abline(lm(user_time~seq(1, length(timing))))

Confirm graph size has not changed with extra_samples:

reticulate::py_len(attr(test_mod, "model_info")$model$dag$tf_graph$as_graph_def()$node)

## [1] 68632

attr(test_mod, "model_info")$samplers[[1]]$model$dag$tf_environment$sampler_batch[[1]]

## Tensor("mcmc_sample_chain_50/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)

It appears that the computation time does not increase with extra_samples, however, I have noticed that the sampling does take longer over time with extra_samples for very large models, though not as steeply as with mcmc. However, with large models like I normally work with, this slow linear increase can make a really big difference, to the point where the sampling slows to a near stop over time. I can’t seem to reproduce that behaviour with this example, though, so maybe I will try posting another example later.

Despite all this weirdness, greta does a fine job of coverging in this model and mixes well:

library(bayesplot)

## Warning: package 'bayesplot' was built under R version 3.5.1

## This is bayesplot version 1.6.0

## - Online documentation and vignettes at mc-stan.org/bayesplot

## - bayesplot theme set to bayesplot::theme_default()

##    * Does _not_ affect other ggplot2 plots

##    * See ?bayesplot_theme_set for details on theme setting

mcmc_trace(test_mod, c("alpha", "beta[1,1]"))

It turns out I can get the graph size back to normal by “resetting” the tensorflow session:

attr(test_mod, "model_info")$model$dag$new_tf_environment()
attr(test_mod, "model_info")$model$dag$define_tf()
test_mod <- mcmc(mod, warmup = 1000, n_samples = 10, thin = 1)

## 
## running 4 chains simultaneously on up to 12 cores

## 
    warmup                                           0/1000 | eta:  ?s          
    warmup ==                                       50/1000 | eta: 15s          
    warmup ====                                    100/1000 | eta: 10s          
    warmup ======                                  150/1000 | eta:  8s          
    warmup ========                                200/1000 | eta:  7s          
    warmup ==========                              250/1000 | eta:  6s          
    warmup ===========                             300/1000 | eta:  6s          
    warmup =============                           350/1000 | eta:  5s          
    warmup ===============                         400/1000 | eta:  5s          
    warmup =================                       450/1000 | eta:  4s          
    warmup ===================                     500/1000 | eta:  4s          
    warmup =====================                   550/1000 | eta:  3s          
    warmup =======================                 600/1000 | eta:  3s          
    warmup =========================               650/1000 | eta:  3s          
    warmup ===========================             700/1000 | eta:  2s          
    warmup ============================            750/1000 | eta:  2s          
    warmup ==============================          800/1000 | eta:  1s          
    warmup ================================        850/1000 | eta:  1s          
    warmup ==================================      900/1000 | eta:  1s          
    warmup ====================================    950/1000 | eta:  0s          
    warmup ====================================== 1000/1000 | eta:  0s          
## 
  sampling                                             0/10 | eta:  ?s

## confirm graph size is back to original
reticulate::py_len(attr(test_mod, "model_info")$model$dag$tf_graph$as_graph_def()$node)

## [1] 1482

attr(test_mod, "model_info")$samplers[[1]]$model$dag$tf_environment$sampler_batch[[1]]

## Tensor("mcmc_sample_chain/scan/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, ?, 7), dtype=float64)

But obviously resetting the model like this will mess up sampling using extra_samples because it clears all of the information needed to continue the chain (not to mention losing the tuning that has occurred previously).

So, anyway, does anyone here have an idea what is going on here? Am I correct to think this behaviour is not what is expected? As I mentioned before the slowdown makes a huge difference in large models to the point where greta is unusable for me at this point. Hopefully I’m just missing something simple…

Thanks in advance for any help you can give!

Voltemand · January 19, 2019, 3:31am

Hi rdinnager,

Thanks for documenting this bug - it’s very odd and intriguing.

It’s well over my head though so we’re going to have to page @nick and see what he thinks.

Thanks!

thorvall · March 24, 2019, 5:32pm

Hi @rdinnager.
I have observed the same behavior across different models, but have not documented as you have.
@nick, @Voltemand: I have a suspicion that this might be related to garbage collection on the python side. Cause when I try running some models with ‘slice’ or a higher than standard Lmin/Lmax for hmc i do tend to run into memory overflow errors, or when trimming to model (steps, variables, etc) I end up with a heavy use of memory, which gc() does not release. This also happens when i do an lapply on ‘opt’ to generate multiple estimates.
And this could for smaller models have an impact on speed since memory would become more fragmented.
If you believe that the two effect are not linked, do you have any advice as how to implement code to circumvent the garbage collection challenge.

Kind regards.