Install and Configure greta on a Windows 10 laptop

sreedatta · August 11, 2021, 10:38pm

Hello All - these instructions are for those of you on a Windows 10 laptop with Anaconda3 already installed (you cannot run Anaconda3 and Miniconda on the same machine). With @ajf's help I was able to figure out how to connect the new Anaconda3 environment created for greta with the R packages reticulate and greta. Please follow the instructions verbatim, changing only what is specific to your laptop (installation directories and so on). Please note the assumptions stated at the beginning. Hopefully you are not installing these tools on mission-critical laptops or servers, and are trying them out on your personal laptop first! The post is long because I have written it for intermediate-level users rather than programmers, so please be patient. It took me 3 days to get things working correctly.

ASSUMPTION: The user, or whoever performs the installation, has Admin rights on the Windows 10 laptop. There may be ways to install without Admin rights, but that is a completely different setup, including the setup for Anaconda3 and R.

ASSUMPTION: If you have Admin rights, you are also assumed to be familiar with folder operations, setting System variables, and executing commands at the standard Windows Command Prompt and at the Anaconda3 prompt.

ASSUMPTION: Assumes that Anaconda3 and R-4.1.0 (both 64-bit) have been installed on your Windows 10 laptop.

ASSUMPTION: If you follow the instructions precisely, they should most likely work. However, please note that with open-source projects and tools there is a known risk that when one tool works, another might break (this happens with the “rstan” package, where setting the optimization level for C++ code can hinder the compiler when it comes to other packages that do not use the more complex optimization that “rstan” uses). This posting is an attempt to help other users with what I have learned. However, you assume all the risk associated with anything that functions differently or breaks after going through these steps. DO NOT ever install such exploratory tools on laptops and servers that are mission critical for your day-to-day job.

Follow the steps below exactly: create a conda environment in Anaconda3 (64-bit), then install reticulate, greta, tensorflow, causact, DiagrammeR, and bayesplot in R, and finally configure reticulate so that R can access the conda environment created for greta and run greta.

STEP1: Launch the Base Anaconda3 Command prompt from the Windows Start Menu

```
(base) C:\Users\your_user_name>  # Base Anaconda3 Command Prompt on my laptop
```

STEP2: Type the following command exactly as shown below, on a single line (include the **-y** at the end):

```
(base) C:\Users\your_user_name>conda create -n gretf -c conda-forge "python=3.7" libpython mkl-service m2w64-toolchain tensorflow=1.14 numpy=1.21 pyyaml requests Pillow pip h5py=2.8 tensorflow-probability=0.7 -y
```

**Note** - The instructions given by @ajf specify “numpy=1.16”. In my first installation I got a message that the installed version, numpy=1.16.6, was broken. In my second attempt I let conda choose the version. That turned out to be numpy=1.21.1, which worked correctly, hence numpy=1.21 in the command above.

**Note** - I chose to name the environment **gretf** as a combination of **gre**ta & **tf** for TensorFlow. You can choose any reasonable name that you want (see the Anaconda3 documentation for naming conventions). If you want a no-hassle setup, simply stick to “gretf”. Later commands use this name.

If you have typed the command correctly, conda lists the Python packages it will install (more than you specified, because many dependencies are pulled in). With -y at the end, conda proceeds without asking. If you leave -y out, conda asks “Proceed ([y]/n)?” and you type “y” (without the quotes).

After all the packages are downloaded and installed, you should see something like this:

```
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate gretf
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) C:\Users\your_user_name>
```

**Note** - Do not activate your new environment. Close the Anaconda3 prompt and, if you have R open, close it and launch R again.

STEP3: Launch R or RStudio (in either tool, make sure you save your commands in a script so you have a record). I’m using the R GUI, and my version of R is 4.1.0, 64-bit. Follow the sequence exactly as I have noted below:

a.	In the File menu, click on the “New Script” and save the script as “Install_and_Configure_greta.R”
b.	First type the following commands in the exact same order in new script window:

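# reticulate lets R talk to Python
install.packages("reticulate")
# point reticulate at the conda environment created in STEP2 (current session only)
Sys.setenv(RETICULATE_PYTHON = 'C:/ProgramData/Anaconda3/envs/gretf')
library(reticulate)
# confirm which Python reticulate has picked up
reticulate::py_config()
# the R package is called "greta"; "gretf" is only the conda environment name
install.packages("greta")
install.packages("tensorflow")
install.packages("DiagrammeR")
install.packages("bayesplot")
install.packages("causact")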
	
c.	Highlight the first line, “install.packages("reticulate")”, and click on the 3rd icon with the arrow

d.	You should see a window pop-up asking you to select a repository to download the packages from (Select whatever you usually choose – I usually work with the CRAN Repository and my mirror is set to 0-Cloud[https])

e.	Ensure that “reticulate” is installed without any error

f.	Next, highlight the entire line starting with “Sys.setenv(” and PLEASE MAKE SURE the environment name is right: “gretf” is what I use on my machine, so if you chose a different name, put that name in place of “gretf”. Click the 3rd icon with the arrow and ensure no error is shown in the R Console. This command tells the “reticulate” package which Python to use, including the environment it should access (in this case “gretf”). NOTE: the change made by Sys.setenv() is effective only for the CURRENT R session. If you close R, launch it again, and type “Sys.getenv()”, you will see that there is no variable named RETICULATE_PYTHON. I describe how to set it permanently later in the post.

g.	If there is an error, please check that the text you typed matches what I have provided

h.	Next, highlight the entire line starting with “reticulate::py_config()”. py_config() is a function in the “reticulate” package. Since Sys.setenv() identified the Python environment that “reticulate” should use, this command tests whether Sys.setenv() worked correctly. If it did, you should see output like this:

python: C:/ProgramData/Anaconda3/envs/gretf/python.exe
libpython: C:/ProgramData/Anaconda3/envs/gretf/python37.dll
pythonhome: C:/ProgramData/Anaconda3/envs/gretf
version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
Architecture: 64bit
numpy: C:/ProgramData/Anaconda3/envs/gretf/Lib/site-packages/numpy
numpy_version: 1.21.1

NOTE: Python version was forced by RETICULATE_PYTHON


If any of those are empty, there is a problem with the environment you created or with the commands you ran in R. The error can be in either place, so go back to STEP1.
Notice the “NOTE: Python version was forced by RETICULATE_PYTHON” at the end: this is the result of running the Sys.setenv command (sub-step f).
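If py_config() comes back empty, one quick thing to rule out is a mistyped environment path. The snippet below is only a sketch using base R: `env_dir` is the path from my machine, and `has_python()` is a hypothetical helper I made up for illustration (it is not part of reticulate). It is meant to be run in a fresh R session, before library(reticulate) is loaded.

```r
# Path from this post; adjust it to wherever your conda environment lives.
env_dir <- "C:/ProgramData/Anaconda3/envs/gretf"

# Hypothetical helper (not part of reticulate): TRUE if the folder
# contains a python.exe, as a conda environment on Windows should.
has_python <- function(dir) {
  file.exists(file.path(dir, "python.exe"))
}

if (has_python(env_dir)) {
  # Only set the variable when the folder looks like a real environment.
  Sys.setenv(RETICULATE_PYTHON = env_dir)
} else {
  message("No python.exe found in ", env_dir, "; check the environment name.")
}

# An empty string here means the variable is not set in this session.
Sys.getenv("RETICULATE_PYTHON")
```

If the message appears, the folder name in the path most likely does not match the name you gave the environment in STEP2.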

i.	Assuming everything so far has run correctly, you can run the remaining five install.packages() commands in one go. Highlight the five lines and click on the 3rd icon with the arrow.
j.	If the installation proceeds without errors, Congratulations!!! You are now ready to test the “greta” installation with the code below:

# Testing the installation of "greta" in R: copy and paste the code from # START to # END below
# into a new R script called “Testing_greta.R”

# START
# Load the libraries. “reticulate” is already loaded
library(greta)
library(tensorflow)
library(causact)
library(DiagrammeR)
library(bayesplot)

# “iris” is a dataset that comes with every installation of Base R
data(iris)

# data – in this step we are identifying the Independent (x) and the Dependent (y) variables
# the as_data() function converts variable(s) into “greta” arrays that can be read by functions within
# the “greta” library
x <- as_data(iris$Petal.Length)
y <- as_data(iris$Sepal.Length)

# setting the model variables and the priors for each model component
# int = intercept; coef = beta coefficient; sd = standard deviation (note that the “sd” declaration has
# truncation = c(0, Inf), recognizing the fact that a standard deviation or variance CANNOT be NEGATIVE)
int <- normal(0, 1)
coef <- normal(0, 3)
sd <- student(3, 0, 1, truncation = c(0, Inf))

# operations to calculate the “predicted mean” of the Dependent variable (y = Sepal.Length)
mean <- int + coef * x

# likelihood – “y” is modelled as normally distributed around the “predicted mean” with standard deviation “sd”
distribution(y) <- normal(mean, sd)

# defining the model
m <- model(int, coef, sd)

# plotting the model: the DAG (Directed Acyclic Graph) of the specified model, drawn using the “greta” and
# “DiagrammeR” packages. This is a plot of the model structure, NOT the posterior.
plot(m)

# sampling with 1000 iterations
draws <- mcmc(m, n_samples = 1000)
# summarises the draws: posterior means, standard deviations, and quantiles
summary(draws)
# draws the trace of how the values of the intercept, coefficient, and standard deviation changed;
# we want trace plots that look like “caterpillars”, indicating good exploration of the sample space
mcmc_trace(draws)
# plots interval estimates for “int”, “coef”, and “sd”
mcmc_intervals(draws)

# END

k.	Run the commands one block at a time, where blocks are separated by blank lines. For example, run all the commands beginning with “library(…)” as one set. I recommend this so that it is easy to identify where an error occurs.

Your overall output without the graphs should look as follows in the R Console:

data(iris)

# data

x <- as_data(iris$Petal.Length)
y <- as_data(iris$Sepal.Length)

# variables and priors

int <- normal(0, 1)
coef <- normal(0, 3)
sd <- student(3, 0, 1, truncation = c(0, Inf))

# operations

mean <- int + coef * x

# likelihood

distribution(y) <- normal(mean, sd)

# defining the model

m <- model(int, coef, sd)

# plotting

plot(m)

# sampling

draws <- mcmc(m, n_samples = 1000)
WARNING:tensorflow:From C:\PROGRA~3\ANACON~1\envs\gretf\lib\site-packages\tensorflow_probability\python\distributions\student_t.py:272: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

running 4 chains simultaneously on up to 8 cores

warmup ====================================== 1000/1000 | eta:  0s

sampling ====================================== 1000/1000 | eta:  0s

summary(draws)

```
Iterations = 1:1000
Thinning interval = 1
Number of chains = 4
Sample size per chain = 1000

Empirical mean and standard deviation for each variable,
plus standard error of the mean:

       Mean      SD  Naive SE Time-series SE
int  4.2804 0.07831 0.0012382      0.0014269
coef 0.4145 0.01882 0.0002975      0.0003475
sd   0.4110 0.02471 0.0003908      0.0004468

Quantiles for each variable:

       2.5%    25%    50%    75%  97.5%
int  4.1307 4.2267 4.2803 4.3348 4.4334
coef 0.3776 0.4021 0.4141 0.4271 0.4510
sd   0.3666 0.3938 0.4100 0.4272 0.4626
```
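As a rough sanity check on the numbers above, you can fit the same straight line with ordinary least squares in base R. This is not part of the greta workflow, just a sketch for comparison; the names `fit`, `ols_estimates`, and `ols_sigma` are mine. Because the priors here are fairly weak relative to 150 observations, I would expect the estimates to land close to the posterior means of “int”, “coef”, and “sd”, though not to match them exactly.

```r
# Ordinary least squares fit of the same model, using only base R.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Intercept and slope; compare with the posterior means of int and coef.
# stats::coef is spelled out because the greta script reuses the name "coef".
ols_estimates <- stats::coef(fit)
print(round(ols_estimates, 3))

# Residual standard error; compare with the posterior mean of sd.
ols_sigma <- summary(fit)$sigma
print(round(ols_sigma, 3))
```

If the greta summary is far away from these values, something other than the installation (for example the model specification) is probably worth a second look.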

STEP4: The next step makes the setting permanent, so that you do not have to tell “reticulate” which Python to use in every session. We do this by editing the file “Rprofile.site” in R itself. Do not use NOTEPAD (the app that comes with Windows) unless you know how to use it correctly. It can be done with a little care.
a.	Open the file in R. For me it is located in “C:\Users\sreedatta\Documents\R-4.1.0\etc”. Even though I have Admin rights on my machine, I prefer to install tools such as R in my User folders since data, files and folders are automatically backed-up on my laptop.
b.	Add the following two lines at the end of the “Rprofile.site” document and save the file.

Sys.setenv(RETICULATE_AUTOCONFIGURE = FALSE)
Sys.setenv(RETICULATE_PYTHON = 'C:/ProgramData/Anaconda3/envs/gretf')
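If you are not sure where “Rprofile.site” lives on your machine, base R can tell you. The snippet below is a small sketch, not a required step: it builds the path from R.home("etc") for the R installation that is currently running and checks whether the file already mentions RETICULATE_PYTHON. The variable names `profile_path`, `profile_lines`, and `already_set` are my own.

```r
# Path of the site-wide profile for the R installation that is running.
profile_path <- file.path(R.home("etc"), "Rprofile.site")
print(profile_path)

# Does the file exist, and does it already mention RETICULATE_PYTHON?
if (file.exists(profile_path)) {
  profile_lines <- readLines(profile_path, warn = FALSE)
  already_set <- any(grepl("RETICULATE_PYTHON", profile_lines, fixed = TRUE))
} else {
  already_set <- FALSE
}
print(already_set)
```

Running it before and after the edit in sub-step b is an easy way to confirm that you saved the right file.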

STEP5: Now close R and restart it. In the R script where you saved the Sys.setenv command (“Install_and_Configure_greta.R” in STEP3, if you followed the names above), comment out that command by adding a # (hash) in front of it.
a.	Run the command “library(reticulate)”
b.	Then run the command “reticulate::py_config()”
c.	You should get the same output as before and as shown below:

python: C:/ProgramData/Anaconda3/envs/gretf/python.exe
libpython: C:/ProgramData/Anaconda3/envs/gretf/python37.dll
pythonhome: C:/ProgramData/Anaconda3/envs/gretf
version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
Architecture: 64bit
numpy: C:/ProgramData/Anaconda3/envs/gretf/Lib/site-packages/numpy
numpy_version: 1.21.1

NOTE: Python version was forced by RETICULATE_PYTHON

d.	If you do not get the output then R did not read the “Rprofile.site” correctly. 
e.	I prefer to use the “Rprofile.site” settings so that I do not have to remember to run the Sys.setenv() command for “RETICULATE_PYTHON” every time I want to use “greta”.
f.	If you follow the instructions precisely, they should work. However, please note that with open-source projects and tools there is a known risk that when one tool works, another might break (this happens with the “rstan” package, where setting the optimization level for C++ code can hinder the compiler when it comes to other packages that do not use the more complex optimization that “rstan” uses)
g.	If you post the steps you have used, with a properly documented output and error messages, I can try to help as time permits.

All the best in installing and using “greta”. - Sree

njtierney · September 20, 2021, 7:54am

Hi there @sreedatta - Thank you so much for posting these instructions!

We will be putting a new release of greta out soon, which should hopefully resolve a few installation issues. I’ll post here when we have a soft release ready.

Cheers!

sreedatta · September 27, 2021, 2:24am

@njtierney thank you for the encouragement, but I could only do this first on my own laptop with the help of @ajf. That made all the difference. Thanks to the team who have made greta available for R users. It is a wonderful tool and enriches the Bayesian environment available for R and Python users. Once the new version is ready, I will post the new instructions again.

Would like to help in any way I can by contributing to documentation.

Sree