Hi,
I am a Greta novice, looking for advice on how to use the package to run a non-linear regression model with spatial random effects. I have had some success running my desired model in STAN using ‘brm()’, but am hoping Greta offers a more time effective alternative. In STAN the model took the form:
brm(y_var ~ x_var + I(x_var^2) + sar(Distmat), family = 'gaussian', data = data)
With sar() offering a spatial autoregressive variance structure based on the distance matrix Distmat. For implementing this model using Greta, I am however stumped with how to translate this random effect component, and how to then use this error component in defining the distribution of my dependent variable?
I have included a section of code below, to help clarify my query. Briefly, the ‘data’ is a small sample of a large dataset comparing global patterns in two variables (for brevity referred to here as x and y). Each xy data point is associated with a GPS coordinate (Lat & Lon) from which it is possible to generate a distance matrix, which I would like to be able to build into my model under the assumption that data points clustered closer together are likely to possess similar values for both x and y. From my limited experience with Greta, I believe that I should use this distance matrix to parameterise the variance component of the model (sigma in the code below)? However, this approach results in a matrix array which is then incompatible with the scalar and vector structures of my remaining model parameters, and dependent variable distribution. Is this an issue that can be solved by implementing my spatial random effects using the greta.gp package?
Any advice would be most appreciated. Thanks in advance!
My Code:
#load required packages
library(greta)
library(geodist)
library(sp)
# Load dataset (represents a small sample of much larger dataset)
data <- data.frame(Lon = c(3121951.802,-7201570.19918,-13843810.1918,-698458.161,
875707.809,5932561.80922,-5887708.1907,-6174940.190,
12653341.882,4177753.80082,-3231934.1918,5413075.802,
16971919.833,-6123328.19918,-8164246.1918,4473961.882,
-3521410.118,6771817.80082,-7815304.1918,-3687466.11),
Lat = c(5387011.84712,-4523614.1588,-1278790.1201,5084071.812,
-3382540.15201,-2822662.1201,3451561.8799,-8374318.588,
2605573.84799,-1178932.1501,4534291.841,3715231.847,
-1613146.15201,3175549.8499,562411.8473,-7974886.158,
-4939876.1558,2980321.8479,6331735.8472,-3889684.151),
x_var = c(2.319, 2.091, 2.221, 2.095,
2.150, 2.133, 2.189, 2.291,
2.308, 2.076, 2.067, 2.157,
2.103, 2.181, 2.151, 2.377,
2.148, 2.377, 2.278, 2.349),
y_var = c(-5.336, -2.931, -2.417, -1.705,
-4.435, -2.561, -3.298, -4.443,
-3.611, -4.371, -3.396, -3.053,
-2.567, -2.210, -2.756, -4.717,
-3.141, -4.837, -3.045, -3.068))
# Generate spatial distance matrix
coords <- suppressWarnings(sp::spTransform(SpatialPoints(data[,c('Lon','Lat')], sp::CRS('+proj=moll')), sp::CRS('+proj=longlat'))) # this line of code involves the translation from Mollweide to WGS184 projection.
DistMat <- 1/geodist(coords@coords, measure = 'geodesic') # invert distance matrix - values closer to 1 denote closer proximity.
DistMat[sapply(DistMat, is.infinite)] <- 1 # reassign matrix diagonal.
# set up greta variables
x <- as_data(data$x_var)
y <- as_data(data$y_var)
# Model parameters (no priors)
int <- variable()
beta1 <- variable()
beta2 <- variable() # polynomial slope function
# Spatial random effects.
sigma <- ? # I am pressuming that this would require some sort of Gaussian decay distribution utilising the distance matrix above to outline the neighbouring context of the xy variables e.g. exp((-DistMat^2)/(2*(Lengthscale^2)))
distribution(y) <- ?
# setup non-linear model (a relatively simple polynomial model)
mod <- int + (x * beta1) + ((x^2) * beta2)
# implement greta model
model_greta <- model(int, beta1, beta2, sigma)
draws_greta <- mcmc(model_greta, warmup = 1000, n_samples = 1000)