Right, in those (and other hierarchical) models, you need to somehow marginalise some random variables. You can use MCMC to do that marginalisation, or those methods you suggest, but you don’t have to be Bayesian about it. MCMC isn’t inherently Bayesian, but it is commonly used for Bayesian modelling.
If you create a variable with variable()
it doesn’t specify a prior for that parameter. So you can create all of your model parameters with that and then fit the model by maximising the likelihood (if there are no priors, the density that is defined is just the likelihood). If you do MCMC, and then find the mode of the samples for the parameters you are trying to optimise, that should be the marginal maximum likelihood solution, the same thing you’re trying to get from variational or Laplace approximation.
Having said that, there is a catch. greta transforms constrained parameters to make them easier to sample or optimise. greta defines an unconstrained (can take values between -Inf and Inf) variable which it then looks at when sampling/optimising, and then transforms that unconstrained variable to meet the constraint requested by the user (e.g. exponentiating to get a positive-constrained parameter). When there’s a transformation of parameters like that, it introduces a discrepancy between the frequentist and Bayesian approaches. In the Bayesian setting, you need to account for the change of support in computing the density, in the frequentist setting you don’t. These are the (log) Jacobian adjustments that you see mentioned in parts of this forum, the Stan forum (Stan does the same trick) and the greta and Stan docs.
When using greta::opt()
there’s a flag adjust
which you can toggle to turn off these adjustments. If you flip that switch and use variable()
to define your model parameters, you’ll get a maximum likelihood estimate. there’s no such option for greta::mcmc()
, because doing MCMC for frequentist models is pretty uncommon. However, if none of your parameters are constrained (just variable()
, not variable(lower = <x>, upper = <y>)
etc.), it won’t make a difference.
Ultimately I’m hoping greta will have a series of other options for marginalising variables within a model, e.g. by doing something like Laplace approximation within a maximu likelihood optimisation or Bayesian MCMC. I made some progress on an interface and Laplace approximation in this branch, but I’m not happy that it works yet. Hopefully I’ll be able to pick that up and implement it in greta soon though.