Frequentist implementation


#1

Hello all,

I’m considering using greta in the future to fit Latent Variable Models (LVMs). However, my audience might have some issues understanding a Bayesian approach, so I’m looking for methods that can fit LVMs in the frequentist framework. LVMs are generally difficult to fit since the latent variables need to be integrated out (although MCMC tends to simplify this :) ), but the alternatives for fitting LVMs, including variational approximation, Laplace approximation or adaptive Gauss-Hermite quadrature, are probably less understandable to my audience than the Bayesian framework.

The greta vignette states that by defining parameters using variable() instead of a prior, one will retrieve a frequentist solution. However, it’s unclear to me what happens “under the hood”. How does this relate to maximizing the (correct) likelihood? Does greta still sample from a (the?) posterior? If it does, I wonder about the philosophical implications of this for a frequentist solution.

Thanks!

Bert


#2

Right, in those (and other hierarchical) models, you need to somehow marginalise some random variables. You can use MCMC to do that marginalisation, or those methods you suggest, but you don’t have to be Bayesian about it. MCMC isn’t inherently Bayesian, but it is commonly used for Bayesian modelling.

If you create a variable with variable() it doesn’t specify a prior for that parameter. So you can create all of your model parameters with that and then fit the model by maximising the likelihood (if there are no priors, the density that is defined is just the likelihood). If you do MCMC, and then find the mode of the samples for the parameters you are trying to optimise, that should be the marginal maximum likelihood solution, the same thing you’re trying to get from variational or Laplace approximation.

Having said that, there is a catch. greta transforms constrained parameters to make them easier to sample or optimise. greta defines an unconstrained variable (one that can take any value between -Inf and Inf), which is what it works with when sampling/optimising, and then transforms that unconstrained variable to meet the constraint requested by the user (e.g. exponentiating to get a positive-constrained parameter). A transformation of parameters like that introduces a discrepancy between the frequentist and Bayesian approaches. In the Bayesian setting, you need to account for the change of support in computing the density; in the frequentist setting you don’t. These are the (log) Jacobian adjustments that you see mentioned in parts of this forum, the Stan forum (Stan does the same trick) and the greta and Stan docs.
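
To make the Jacobian adjustment concrete, here is a small worked example for the exponentiation case. The notation is assumed for illustration: a positive parameter \sigma, its unconstrained counterpart \theta = \log\sigma, and data y.

```latex
% Assumed notation: \sigma > 0 is the constrained parameter,
% \theta = \log\sigma is the unconstrained variable.
\begin{align*}
\text{Bayesian (density over } \theta\text{):}\quad
  \log p_\theta(\theta \mid y)
    &= \log p_\sigma(e^{\theta} \mid y) + \log\left|\frac{d\sigma}{d\theta}\right|
     = \log p_\sigma(e^{\theta} \mid y) + \theta \\
\text{Frequentist (likelihood):}\quad
  \ell_\theta(\theta) &= \ell_\sigma(e^{\theta}),
  \qquad \hat{\sigma} = e^{\hat{\theta}}
\end{align*}
```

The extra \theta term appears only when the function is treated as a probability density over the parameter. A likelihood is not a density over parameters, so its maximiser simply maps through the transformation.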

When using greta::opt() there’s a flag, adjust, which you can toggle to turn off these adjustments. If you set adjust = FALSE and use variable() to define your model parameters, you’ll get a maximum likelihood estimate. There’s no such option for greta::mcmc(), because doing MCMC for frequentist models is pretty uncommon. However, if none of your parameters are constrained (just variable(), not variable(lower = <x>, upper = <y>) etc.), it won’t make a difference.
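
As a rough numerical illustration of what switching the adjustment off changes, here is a minimal sketch. It is plain Python, not greta code: the toy model (y ~ Normal(0, sigma) with sigma = exp(theta)), the data, and the helper names log_lik, log_jacobian and maximise are all made up for this example.

```python
import math


def log_lik(theta, y):
    """Normal(0, sigma) log-likelihood, parameterised by theta = log(sigma)."""
    sigma = math.exp(theta)
    n = len(y)
    sum_sq = sum(v * v for v in y)
    return (-n * math.log(sigma)
            - sum_sq / (2.0 * sigma ** 2)
            - 0.5 * n * math.log(2.0 * math.pi))


def log_jacobian(theta):
    """log |d sigma / d theta| for the transform sigma = exp(theta)."""
    return theta


def maximise(f, lo=-10.0, hi=10.0, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


# Hypothetical data, for illustration only.
y = [1.2, -0.7, 0.3, 2.1, -1.5]

# Without the adjustment: maximise the likelihood itself.
theta_unadjusted = maximise(lambda t: log_lik(t, y))
# With the adjustment: maximise likelihood times Jacobian in theta-space.
theta_adjusted = maximise(lambda t: log_lik(t, y) + log_jacobian(t))

sigma_mle = math.exp(theta_unadjusted)
sigma_adjusted = math.exp(theta_adjusted)
print(sigma_mle, sigma_adjusted)
```

In this toy model the unadjusted optimum is the maximum likelihood estimate, sqrt(sum(y^2) / n), while the adjusted optimum works out to sqrt(sum(y^2) / (n - 1)). The two differ only because sigma is constrained; for an unconstrained parameter there is no transform and hence no Jacobian term.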


Ultimately I’m hoping greta will have a series of other options for marginalising variables within a model, e.g. by doing something like Laplace approximation within a maximum likelihood optimisation or Bayesian MCMC. I made some progress on an interface and Laplace approximation in this branch, but I’m not happy that it works yet. Hopefully I’ll be able to pick that up and implement it in greta soon though.


#3

Great answer Nick, thank you. It confirmed some of the suspicions I had. I’ll think about this and get back to you if I have any other questions!

Out of curiosity, why do you think MCMC is not more commonly applied in frequentist stats? Working out Laplace/variational approximations on a case-by-case basis can be a lot of work. It seems to me that MCMC could simplify that.


#4

An eminent (frequentist) statistician once told me that there is Bayesian philosophy and Bayesian technology and sometimes, even if you are a frequentist philosophically, you should use Bayesian technology.

I can’t speak to why Laplace/variational techniques are still used, but I do know that MCMC is great technology!