Hmm… interesting… i might really wanna look into a scala version…

otoh… i wonder if greta uses some sort of symbolic differentiation…

also, thinking about discrete variables, and hmc

discrete variables can have a deterministic equation of motion associated with them such that the total energy stays unchanged, so that would be the non-sigmoid way of doing things

i think this discrete variable problem is a very strange problem

neural networks have sigmoids to be able to learn

you say that states which are “in between” the discrete values are forbidden: you assign zero probability to them, you throw them away. as long as detailed balance holds between the discrete states, the markov chain will generate the right distribution, and since the equations of motion are symmetric, you can do this. but there might be some useful information if there is a linear or even non-linear transformation between the two discrete states. it's a different model for sure, but still, if the total energy of the system is not changing, then that might mean that you are extending your model “in the right way”: you include the one single extra assumption in your model that is allowed to be included
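to make the detailed balance part concrete, here is a minimal python sketch. it is not hmc and not what greta does: it just builds a plain metropolis chain over three discrete states and checks that detailed balance, and therefore stationarity, holds for the target distribution. the probabilities in `target` and the helper names are made up for illustration.

```python
# Hypothetical target distribution over three discrete states (illustrative values).
target = [0.2, 0.5, 0.3]


def metropolis_matrix(pi):
    """Transition matrix of a Metropolis chain with a uniform symmetric proposal."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Propose j with probability 1/(n-1), accept with min(1, pi_j / pi_i).
                P[i][j] = min(1.0, pi[j] / pi[i]) / (n - 1)
        # Rejected proposals keep the chain in state i.
        P[i][i] = 1.0 - sum(P[i])
    return P


def detailed_balance_holds(pi, P, tol=1e-12):
    """Check pi_i * P_ij == pi_j * P_ji for every pair of states."""
    n = len(pi)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < tol
        for i in range(n)
        for j in range(n)
    )


def is_stationary(pi, P, tol=1e-12):
    """Check that pi is left unchanged by one step of the chain (pi P == pi)."""
    n = len(pi)
    return all(
        abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) < tol
        for j in range(n)
    )


P = metropolis_matrix(target)
balanced = detailed_balance_holds(target, P)
stationary = is_stationary(target, P)
```

the in-between states never show up here at all: the chain only ever needs the ratio of probabilities between allowed states.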

i wonder why this technique is not already widely used in hmc?

is there a fundamental reason why?

i am also wondering about tf

it spends most of its time calculating differentials, derivatives, which might or might not be so easily parallelizable…

but for example, if i were to use monix, an async reactive library for scala, then i could distribute the calculations to 10000 nodes or so, according to the differentiation rules, and then combine them in some optimal fashion

since the graph which describes the computation is first class and dynamic and type safe, i am really wondering why scala is not the choice for such calculations?

when that's pretty much the only place where a usable reactive, async, massively parallel solution can be implemented…
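a rough sketch of the "split according to the differentiation rules, then combine" idea, in python rather than scala/monix. it only uses the sum rule: the derivative of a sum of per-datapoint losses is the sum of the per-shard derivatives, so each shard can be handled independently. the threads are only a stand-in for nodes, and the toy loss `(w*x - y)^2`, the data and the function names are all assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_gradient(w, shard):
    """d/dw of sum((w*x - y)^2) over one shard of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)


def parallel_gradient(w, data, n_workers=4):
    """Split the data into shards, differentiate each one, then add the pieces."""
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda shard: partial_gradient(w, shard), shards)
        # Sum rule: the total derivative is the sum of the shard derivatives.
        return sum(parts)


# Toy data generated from y = 3x; the gradient is evaluated at w = 2.
data = [(float(x), 3.0 * x) for x in range(1, 101)]
g_parallel = parallel_gradient(2.0, data)
g_serial = partial_gradient(2.0, data)
```

the combine step here is a plain sum; whether there is an "optimal fashion" of combining on a real cluster is a separate question this sketch does not touch.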

i am even wondering if it would be possible to outdo tf with a few lines of scala code, when it comes to scaling to massive datasets…

symbolic differentiation is pretty easy to handle with such reactive streams

caching is taken care of out of the box…

i mean… yes, for a few gpus, etc… custom built c++ code might be enough, but what if you want to use 1 million cores?
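for what it's worth, the symbolic differentiation plus caching part really is small. below is a minimal python sketch (not reactive streams, and not how tf or greta do it): expressions are nested tuples, `diff` applies the sum and product rules, and `functools.lru_cache` memoizes repeated sub-derivatives. the tuple encoding and the function names are hypothetical, chosen just for this example.

```python
from functools import lru_cache

# Expressions are nested tuples (an encoding assumed for this sketch):
#   ("const", c), ("var", name), ("add", a, b), ("mul", a, b)


@lru_cache(maxsize=None)
def diff(expr, var):
    """Symbolic derivative of expr with respect to var; repeated subterms hit the cache."""
    kind = expr[0]
    if kind == "const":
        return ("const", 0.0)
    if kind == "var":
        return ("const", 1.0 if expr[1] == var else 0.0)
    if kind == "add":
        # Sum rule: (a + b)' = a' + b'
        return ("add", diff(expr[1], var), diff(expr[2], var))
    if kind == "mul":
        # Product rule: (a * b)' = a' * b + a * b'
        a, b = expr[1], expr[2]
        return ("add", ("mul", diff(a, var), b), ("mul", a, diff(b, var)))
    raise ValueError(f"unknown node: {kind}")


def evaluate(expr, env):
    """Numerically evaluate an expression given variable values in env."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if kind == "mul":
        return evaluate(expr[1], env) * evaluate(expr[2], env)
    raise ValueError(f"unknown node: {kind}")


x = ("var", "x")
f = ("add", ("mul", x, x), ("mul", ("const", 3.0), x))  # f(x) = x*x + 3*x
df = diff(f, "x")
value = evaluate(df, {"x": 2.0})  # f'(x) = 2*x + 3, evaluated at x = 2
```

the result is not simplified (it still contains terms like `0 * x`), so this only shows the rules and the memoization, nothing about scaling.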

how do you write code for that?

i would guess that in scala it's not more than a few k lines, but in c++, where you don't have reactive systems or first class support for them… things are not composable… i would not stand a chance…

i don't know anything about this topic, but i have recently seen massive development in this area and i have a gut feeling that even i might be able to write code that beats tf on one million cores…