Hmm… interesting … i might really wanna look into a scala version…
otoh… i wonder if greta uses some sort of symbolic differentiation…
also, thinking about discrete variables, and hmc
discrete variables can have a deterministic equation of motion associated with them such that the total energy stays unchanged, so that would be the non-sigmoid way of doing things
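roughly what i have in mind, as a toy sketch (the setup and all names are mine: a 1-d particle with unit mass, and a potential that steps up by deltaU at a boundary)… when the trajectory hits the jump, the momentum gets refracted or reflected so that H = U(x) + p*p/2 stays exactly constant:

```scala
// toy sketch: energy-conserving momentum update at a potential jump
// (assumption: 1-d particle, unit mass, potential steps up by deltaU at a boundary)
object JumpDynamics {
  // new momentum after hitting a step of height deltaU,
  // chosen so that H = U(x) + p*p/2 stays exactly the same
  def crossStep(p: Double, deltaU: Double): Double = {
    val kinetic = p * p / 2
    if (kinetic > deltaU)
      math.signum(p) * math.sqrt(2 * (kinetic - deltaU)) // refract: pay deltaU out of kinetic energy
    else
      -p // reflect: not enough energy to climb the step, bounce back
  }

  def main(args: Array[String]): Unit = {
    val p0 = 2.0
    val p1 = crossStep(p0, deltaU = 1.5)
    // total energy before: 0 + p0*p0/2 = 2.0; after: 1.5 + p1*p1/2 = 2.0
    println(s"p before = $p0, p after = $p1, energy after = ${1.5 + p1 * p1 / 2}")
  }
}
```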
i think this discrete variable problem is a very strange problem
neural networks have sigmoids to be able to learn: the smooth relaxation is what gradients can flow through
you say that states which are “in between” the discrete values are forbidden: you assign zero probability to them, you throw them away. as long as detailed balance holds between the discrete states, the markov chain will generate the right distribution, and since the equations of motion are symmetric, you can do this. but there might be some useful information if there is a linear or even non-linear transformation between the two discrete states. it's a different model for sure, but still, if the total energy of the system is not changing, then that might mean that you are extending your model “in the right way”: you include the only single extra assumption into your model that is allowed to be included
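to make the detailed balance point concrete, a toy sketch (not any real hmc library, everything here is made up): a metropolis update on a purely discrete variable with a symmetric proposal satisfies detailed balance, so interleaving it with the dynamics on the continuous coordinates still targets the right joint distribution:

```scala
import scala.util.Random

// toy sketch: metropolis update on a discrete variable k with target weights pi(k);
// acceptance min(1, pi(k')/pi(k)) under a symmetric proposal satisfies detailed
// balance, so the chain converges to pi on the discrete states
object DiscreteMetropolis {
  val rng = new Random(42)

  def step(k: Int, pi: Int => Double, numStates: Int): Int = {
    val proposal = rng.nextInt(numStates)   // symmetric (uniform) proposal
    val accept   = pi(proposal) / pi(k)     // metropolis ratio
    if (rng.nextDouble() < accept) proposal else k
  }

  def main(args: Array[String]): Unit = {
    val pi: Int => Double = Vector(0.1, 0.3, 0.6) // unnormalized weights are fine too
    val samples = Iterator.iterate(0)(step(_, pi, 3)).take(100000).toList
    // empirical frequencies should approach (0.1, 0.3, 0.6)
    (0 until 3).foreach(k => println(s"$k: ${samples.count(_ == k) / 100000.0}"))
  }
}
```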
i wonder why this energy-conserving discrete-jump technique is not already widely used in hmc?
is there a fundamental reason why?
i am also wondering about tf
it spends most of its time calculating derivatives, which might or might not be so easily parallelizable…
but for example, if i were to use monix, a distributed async reactive framework for scala, then i could distribute the calculations to 10000 nodes or so, according to the differentiation rules, and then combine them in some optimal fashion
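a minimal sketch of what i mean, assuming monix 3.x; partialDerivative and the sum-combine are made-up placeholders for whatever the differentiation rules would actually produce:

```scala
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.duration._

// sketch: treat each partial derivative as an independent Task, fan them out
// across the scheduler (or a cluster sitting behind it), then combine the results
object ParallelGradients {
  def partialDerivative(i: Int): Task[Double] =
    Task { math.sin(i.toDouble) } // placeholder for one node's share of the work

  def main(args: Array[String]): Unit = {
    val partials: List[Task[Double]] = (0 until 10000).toList.map(partialDerivative)
    // parSequence runs the tasks in parallel and gathers the results in order
    val gradient: Task[List[Double]] = Task.parSequence(partials)
    val combined: Task[Double] = gradient.map(_.sum) // stand-in for "combine in some optimal fashion"
    println(combined.runSyncUnsafe(1.minute))
  }
}
```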
since the graph which describes the computation is first class, dynamic and type safe, i am really wondering why scala is not the choice for such calculations?
when that's pretty much the only place where a usable reactive, async, massively parallel solution can be implemented…
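a tiny sketch of such a first-class, type-safe computation graph with symbolic differentiation; the ADT and all names are mine, not from any library:

```scala
// sketch: the computation graph as a plain scala ADT, differentiated structurally
sealed trait Expr
case class Const(v: Double)      extends Expr
case object X                    extends Expr
case class Add(a: Expr, b: Expr) extends Expr
case class Mul(a: Expr, b: Expr) extends Expr

object Symbolic {
  // the usual differentiation rules as a structural recursion over the graph;
  // each recursive call is independent, which is what makes fanning them out plausible
  def d(e: Expr): Expr = e match {
    case Const(_)  => Const(0)
    case X         => Const(1)
    case Add(a, b) => Add(d(a), d(b))
    case Mul(a, b) => Add(Mul(d(a), b), Mul(a, d(b))) // product rule
  }

  def eval(e: Expr, x: Double): Double = e match {
    case Const(v)  => v
    case X         => x
    case Add(a, b) => eval(a, x) + eval(b, x)
    case Mul(a, b) => eval(a, x) * eval(b, x)
  }

  def main(args: Array[String]): Unit = {
    val f = Mul(X, Mul(X, X)) // f(x) = x^3
    println(eval(d(f), 2.0))  // d/dx x^3 = 3x^2, so 12.0 at x = 2
  }
}
```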
i am even wondering if it would be possible to outdo tf with a few lines of scala code, when it comes to scaling to massive datasets…
symbolic differentiation is pretty easy to handle with such reactive streams
caching is taken care of out of the box…
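for instance, with monix (assuming 3.x again), memoize gives you that caching almost for free… a sketch:

```scala
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.duration._

// sketch: a shared sub-derivative computed once and reused by every consumer
object CachedSubgraph {
  val sharedDerivative: Task[Double] =
    Task { println("computing shared subexpression once"); 42.0 }.memoize

  def main(args: Array[String]): Unit = {
    // both consumers reuse the cached result; the println fires only once
    val total = Task.parZip2(sharedDerivative, sharedDerivative).map { case (a, b) => a + b }
    println(total.runSyncUnsafe(1.minute))
  }
}
```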
i mean… yes, for a few gpus, etc… custom-built c++ code might be enough, but what if you want to use 1 million cores?
how do you write code for that?
i could guess that in scala it's not more than a few k lines, but in c++, where you don't have reactive systems or first-class support for them… things are not composable… i would not stand a chance…
i don't know anything about this topic, but i have recently seen massive development in this area, and i have a gut feeling that even i might be able to write code that beats tf on one million cores…