Installing Greta: Robust instructions as of March 31, 2019

ajf · April 12, 2019, 1:06am

Guaranteeing the absence of compatibility issues between tensorflow, tensorflow-probability, greta, reticulate, python, numpy, etc. is quite the challenge. I have two methods that seem robust: 1) one for the CRAN version of greta (recommended for new users) and 2) one for the dev version of greta.

Using CRAN version of greta (currently 0.3.0)

Install miniconda (https://docs.conda.io/en/latest/miniconda.html)
At the Anaconda command prompt, run the following commands (NOTE: The first command is long; it starts with conda create... and ends with numpy=1.15):
- conda create -n r-tensorflow python=3.6 tensorflow=1.11 pyyaml requests Pillow pip numpy=1.15
- conda activate r-tensorflow
- pip install tensorflow-probability==0.4
Open a brand-new R session and use the following commands:
- install.packages("greta")
- library(greta)

Using dev version of greta

At the Anaconda command prompt, run the following commands (NOTE: The first command is long; it starts with conda create... and ends with Pillow):
- conda create -n r-tensorflow python=3.6 tensorflow=1.13.1 pip pyyaml requests Pillow
- conda activate r-tensorflow
- pip install tensorflow-probability==0.6.0
Open a brand-new R session and use the following commands:
- devtools::install_github("greta-dev/greta")
- library(greta)

Random lessons I’ve learned that might help someone:

Do not use conda-forge for installation. Its version of tensorflow is not optimized with Intel’s libraries and runs slower. Helpful command: conda config --remove channels conda-forge. Then delete the environment with conda remove -n r-tensorflow --all and repeat the instructions above.
The error Tensor conversion requested dtype int64 ... is usually caused by using the dev branch of greta with TensorFlow <= 1.11.
reticulate::py_numpy_available() returns FALSE when trying to use python 3.7. Use python 3.6 for stability.
Using tensorflow-probability=0.5 with tensorflow>=1.12 leads to a lot of warning messages about deprecated functions when running greta.
The following lines might help R use the right version of tensorflow if tensorflow or tensorflow-probability is not found:
# start new R session and run the below (otherwise R will use wrong info)
# make sure the path below is changed (i.e. "<USERNAME>")
reticulate::use_condaenv("r-tensorflow")
reticulate::use_python("c:/Users/<USERNAME>/Miniconda3/envs/r-tensorflow/", required = TRUE)
reticulate::py_numpy_available(initialize = TRUE)
# Use the below to check that conda and python are sourced from the right places
reticulate::py_config()
Do not have R/RStudio open when changing the Python environment via conda. This can lead to permission errors.

nick · March 31, 2019, 10:00pm

This is really helpful, thank you!

I’m still trying to decide whether to try and wrap more of this up in R helper functions within the package. It would lower the barrier for new users, but seems like a constantly shifting target as the python dependencies change all the time.

dirmeier · April 1, 2019, 2:42pm

Hey, that’s extremely helpful. Do you happen to have any benchmarks on that? Giving up conda-forge seems like a huge investment if the speedup from Intel’s libraries is only marginal.
Cheers,
S

ajf · April 1, 2019, 4:16pm

My benchmarking is far from rigorous, but I did test this one basic model dozens of times with different python package recipes:

library(greta)

coinflip <- rbinom(n = 1000, size = 1, prob = 0.6)  #Simulate some data

## The Greta Model
y <- as_data(coinflip)   #DATA
theta  <- beta(shape1 = 2, shape2 = 2)   #PRIOR
distribution(y) <- bernoulli(prob = theta)   #LIKELIHOOD
gretaModel <- model(theta)   #MODEL
system.time(draws <- mcmc(gretaModel))   #POSTERIOR

On my laptop (MS Surface Book 2 - Intel i7 - four physical cores), using the above recipe, the system time registers between 20 and 23 seconds. If conda-forge is used to download tensorflow, then the time doubles to between 40 and 45 seconds. To me, that is quite significant.
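For anyone who wants to repeat this comparison, a minimal sketch of timing the same call several times follows. The helper name time_runs and its arguments are hypothetical and not part of greta; it only wraps base R's system.time().

```r
# Hypothetical helper (not part of greta): call run() n times and
# return the elapsed wall-clock seconds of each call.
time_runs <- function(run, n = 5) {
  vapply(
    seq_len(n),
    function(i) system.time(run())[["elapsed"]],
    numeric(1)
  )
}

# Example usage, assuming gretaModel from the code above exists:
# times <- time_runs(function() mcmc(gretaModel), n = 5)
# range(times)
```

Reporting the range of several runs, rather than a single timing, makes it easier to compare configurations across machines.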

I do not mean to have such strong conda-forge opinions. After all, it is the only conda channel that has tensorflow-probability. Unfortunately, it is not yet up to tensorflow-probability=0.6.0, and tensorflow-probability=0.5.0 is far from ideal for both the CRAN greta version and the dev version at this point in time. Instead of removing the conda-forge channel, just adding -c anaconda to the tensorflow install command will ensure that the Intel-optimized tensorflow is downloaded while keeping the conda-forge channel.

PS: I once had a configuration running in the 17 second range, but who knows how I got it (I was playing with .whl files to take advantage of AVX2) and I have since been unable to repeat it.

PPS: It would be great if others could post their times for the above model with their configurations. My desktop (Intel i7-4790 - 4 core - no gpu) is running these versions:

numpy                     1.15.3           py36ha559c80_0
numpy-base                1.15.3           py36h8128ebf_0
tensorflow                1.10.0                   py36_0    conda-forge
tensorflow-hub            0.1.1                      py_0    conda-forge
tensorflow-probability    0.4.0                     <pip>

and takes about 20 seconds elapsed. I have been scared to update the versions as I am writing a book on my desktop and do not want to screw things up.

PPPS: To ensure parity of benchmarks: on all my machines, the above code yields 4,000 posterior draws.
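A quick way to confirm that count on another machine is to add up the rows of each chain. This is only a sketch: it assumes draws is the coda-style list of per-chain samples returned by mcmc() above, and count_draws is a made-up helper name.

```r
# Hypothetical helper: total number of posterior draws across chains.
# Assumes `draws` is a list with one matrix (or vector) of samples per chain,
# which is how a coda-style mcmc.list is laid out.
count_draws <- function(draws) {
  sum(vapply(draws, NROW, integer(1)))
}

# Example usage after running the model above:
# count_draws(draws)
```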

ajf · April 1, 2019, 6:17pm

Nick, thanks again for such a cool package, I am a big fan of Greta. I am sure there are some people who get discouraged by the install process, so it would be nice to have helper functions. Maybe just initially, updating greta-stats “getting started” with some of this info could be a good start. And then, maybe helper functions that are very explicit about which python package versions could accompany CRAN Greta releases? The biggest reason that CRAN Greta has install troubles is that install_tensorflow is now on v1.13 as opposed to v1.11. If you lock down the versions of tensorflow, tensorflow-probability, numpy, and Python that should accompany a CRAN Greta release, then tensorflow can keep moving forward without breaking the CRAN Greta installation process.
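As a rough illustration of what such an explicit helper might check, the sketch below compares version strings against the pins from the CRAN recipe at the top of this thread. The names cran_pins, parse_version, and matches_pin are hypothetical and not part of greta, and the installed versions shown are made up for the example; in practice they would have to be read from the Python environment.

```r
# Hypothetical pins, copied from the CRAN recipe in the first post.
cran_pins <- c(
  "tensorflow" = "1.11",
  "tensorflow-probability" = "0.4",
  "numpy" = "1.15"
)

# Split "1.11.0" into c(1, 11, 0). Assumes plain dotted numeric versions.
parse_version <- function(x) as.integer(strsplit(x, ".", fixed = TRUE)[[1]])

# TRUE when the installed version starts with the pinned components,
# e.g. installed "1.11.0" satisfies pin "1.11".
matches_pin <- function(installed, pin) {
  inst <- parse_version(installed)
  want <- parse_version(pin)
  length(inst) >= length(want) && all(inst[seq_along(want)] == want)
}

# Example with made-up installed versions:
installed <- c(
  "tensorflow" = "1.13.1",
  "tensorflow-probability" = "0.4.0",
  "numpy" = "1.15.3"
)
ok <- mapply(matches_pin, installed[names(cran_pins)], cran_pins)
print(ok[!ok])  # entries that do not match their pin
```

Matching on the leading components only means a pin such as 1.11 accepts any 1.11.x patch release, which mirrors how the conda commands above pin versions.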

dirmeier · April 2, 2019, 8:24am

I see, that’s good to know. Thanks!

adibender · April 4, 2019, 10:16am

Thanks, really helpful instructions. However, I have to rerun the above code every time I restart R. Anybody else experiencing this?

ajf · April 8, 2019, 5:09pm

This might help you debug why you need to give a hint about which environment to use each time: https://rstudio.github.io/reticulate/articles/versions.html
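One option described in the reticulate article linked above is the RETICULATE_PYTHON environment variable, which tells reticulate which Python to use before anything initializes it. A minimal sketch follows, assuming the Miniconda layout from the first post; the path is a placeholder, and the python.exe location is an assumption about a default Windows install.

```r
# Put this in your ~/.Rprofile so it runs at the start of every R session.
# Change <USERNAME> and check that the path matches your own installation.
Sys.setenv(
  RETICULATE_PYTHON = "c:/Users/<USERNAME>/Miniconda3/envs/r-tensorflow/python.exe"
)
```

After restarting R, reticulate::py_config() should show whether the intended environment was picked up.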

If Greta is your only need for Python, you might consider getting rid of all the other environments by uninstalling python/anaconda and doing a fresh install.

Zhi · April 11, 2019, 11:29pm

Wow, this post is super helpful. I was able to finally install greta! The installation process also wrecked me. I am a JAGS user and tried Stan today, which didn’t work out for me. Now I am trying to use greta but almost gave up at the installation. Thank you!