
Variance Components Beyond Genetics

Variance components analyses estimate the net contribution of an entire group of variables to an outcome, without requiring an estimate of each variable individually; this is critical for learning whether the haystack of variables contains a needle at all, and yet this approach is hardly used outside behavioral genetics. That should change.

Where else besides genetics can we use behavioral genetics’s workhorse of variance components analysis to nail down the net contribution of entire classes of effects rather than the usual (and usually futile) approach of attempting to exactly estimate one or a handful of said effects? If power analysis tells you whether you have enough light to find the needles in the haystack, variance components can tell you whether there are even any needles to look for.

This requires some form of ‘distance’ equivalent to genetic relatedness for doing the clustering, which typically doesn’t exist; but how much of that is because practitioners in other areas simply don’t think about this at all? And where there is no natural distance, it may be possible to synthesize a proxy one out of a lot of raw data and, using that as a ‘bar code’ or ‘fingerprint’, cluster individuals that way (cf. hash trick, k-NN/nearest-neighbor interpolation, compressed sensing). We have already seen imaginative applications of it in high-dimensional data like brain imaging or leaf spectral imaging, so perhaps there is far more that can be done:

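To make the proxy-distance idea concrete, here is a minimal sketch on simulated data. It treats each individual's raw high-dimensional measurements as a ‘fingerprint’, turns them into a pairwise similarity matrix standing in for genetic relatedness, and asks how much outcome variance tracks that similarity. The estimator is Haseman-Elston-style regression, chosen here only because it fits in a few lines; it is not necessarily the method any particular study used. The function names `he_regression` and `simulate`, the sample sizes, and the simulated effect structure are all illustrative assumptions.

```python
import numpy as np

def he_regression(features, y):
    """Haseman-Elston-style estimate of the share of outcome variance
    that tracks a similarity matrix built from `features` (hypothetical helper)."""
    # Standardize each feature column, then build an n x n similarity matrix,
    # the stand-in for a genetic relatedness matrix.
    Z = (features - features.mean(axis=0)) / features.std(axis=0)
    K = Z @ Z.T / Z.shape[1]
    # Standardize the outcome so the slope is on a variance-share scale.
    y = (y - y.mean()) / y.std()
    # Use each unordered pair of distinct individuals exactly once.
    i, j = np.triu_indices(len(y), k=1)
    k_ij = K[i, j]
    yy_ij = y[i] * y[j]
    # OLS slope of outcome cross-products on pairwise similarity.
    k_c = k_ij - k_ij.mean()
    return float(k_c @ (yy_ij - yy_ij.mean()) / (k_c @ k_c))

def simulate(n, p, share, rng):
    """Simulated data (an assumption): p tiny effects jointly explain
    `share` of the outcome variance; the rest is independent noise."""
    X = rng.normal(size=(n, p))
    b = rng.normal(scale=np.sqrt(share / p), size=p)
    e = rng.normal(scale=np.sqrt(1.0 - share), size=n)
    return X, X @ b + e

rng = np.random.default_rng(0)

# Haystack with needles: many small effects summing to half the variance.
X, y = simulate(1000, 500, 0.5, rng)
est_signal = he_regression(X, y)

# Haystack without needles: the features are pure noise for the outcome.
X0, y0 = simulate(1000, 500, 0.0, rng)
est_null = he_regression(X0, y0)

print(f"estimated share with signal: {est_signal:.2f}")
print(f"estimated share under null:  {est_null:.2f}")
```

No individual effect in the simulated ‘signal’ case is large enough to detect on its own at this sample size, yet the aggregate estimate separates the two haystacks, which is the point of working with the class of effects rather than its members.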


  1. One could use Herculano-Houzel’s trick to easily turn ‘diet’ into a single homogeneous sample: blenderize it! One could also try to reuse the Rincent trick of infrared photography. If those don’t work, feces may be acceptable individual-level samples, and if that doesn’t work, perhaps sewage samples?