Panties and Bayes Theorem
Nate Silver, author of “The Signal and the Noise”, is a Bayesian. This requires some explanation.
Silver is a great writer (despite persistently referring to “data” in the singular). He explains Bayes Theorem (and Bayesian thinking) with a startling example: one morning you open your dresser drawer and find some strange underwear there. The immediate train of thought runs: is my partner cheating on me? Or maybe he just likes wearing panties! Figuring this out accurately rather than emotionally leads straight to Bayes Theorem.
Thomas Bayes was born around 1701. He was a modest clergyman and is often overshadowed by R.A. Fisher, whose statistical thinking took over in the early 20th Century and now dominates modern statistics. The Fisherian view, in simplest terms, is that the data drive the interpretation. The flaw here, according to Silver, is that all data - any data, including noise - get twiddled with, and the result is then used to drive interpretation and arrive at conclusions. If you wonder whether this really works, just read all the articles you can find in the next month on drug development and testing. It’s truly frightening, and goes a long way to explain some of my family’s aversion to accepting new drugs to deal with a minor problem. No heuristics (modeling) are involved in this sort of analysis: one is just seeking correlations, on the assumption that from these alone we can discern the true state of something. A metaphor: one can do the same thing by looking for animals or faces in clouds.
Silver argues that statistical significance tests (the t-test, etc.) implicitly acknowledge that noise is a problem, but that they are essentially worthless in isolation and should never be taught to impressionable students. Interestingly, they are still the core of the teaching in most biology departments in the United States here in the 21st Century, although this is changing. Fisher himself had increasing qualms about his approach towards the end of his life (he died in 1962). Silver considers the teaching of Fisherian statistical significance to be abuse verging on the criminal.
By contrast, Bayes Theorem is a simple alternative (with a simple equation) that requires a previous assessment called a prior probability. The equation is simple algebra:
xy / (xy + z(1 - x))
This gives the posterior probability: your revised estimate (in this case of cheating, but it could be of anything whose likelihood you are trying to assess). In this equation x is the prior probability: your initial estimate of how likely it is that your partner is cheating on you, before you found the strange underwear. The variable y is the probability of the underwear appearing in your dresser drawer if he IS cheating on you. The variable z is the probability of the underwear appearing in the drawer if he is NOT cheating on you (in other words, there is another explanation - perhaps he likes wearing panties). This equation can be continually updated as you gather more information.
In other words, one way to NOT make something huge out of a lot of noise (or a single questionable data-point) is to fit previous understanding into the assessment. This implicitly involves incorporating an understanding of the system - how things work in the real world - into the assessment process. Fisher’s contribution was well-intended (to make analysis as objective as possible), but the effect was to divorce system understanding from the analysis, leaving only the raw data. Implicitly, this means accepting the idea that the data tell us everything there is to know. This might work if there were no noise in the system, but then the answer would be obvious, right?
This kind of thinking lies behind the vast NSA data-gathering exercise exposed by Edward Snowden in June 2013. It has had its successes, but these come at a significant cost, both financially (a brand-new, $5 billion data center in the Utah desert) and in terms of privacy.
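To make the underwear equation concrete, here is a minimal sketch in Python. In full, Bayes Theorem divides xy by xy + z(1 - x), and that is what the function computes. The function name and the three input numbers are my own assumptions for illustration (a 4% prior, a 50% chance of the underwear turning up if he is cheating, a 5% chance if he is not); plug in your own.

```python
def posterior(x: float, y: float, z: float) -> float:
    """Bayes Theorem in the notation used above.

    x: prior probability that the hypothesis is true
    y: probability of seeing the evidence if the hypothesis is true
    z: probability of seeing the evidence if the hypothesis is false
    """
    return (x * y) / (x * y + z * (1 - x))


# Assumed, illustrative inputs (not taken from the text above).
prior = 0.04                    # x: belief in cheating before the discovery
p_underwear_if_cheating = 0.50  # y
p_underwear_if_faithful = 0.05  # z

result = posterior(prior, p_underwear_if_cheating, p_underwear_if_faithful)
print(f"Probability of cheating after the discovery: {result:.0%}")
```

With these assumed inputs the estimate rises from 4% to roughly 29%: a real jump, but far from the certainty that the emotional reaction suggests, because the low prior still carries weight.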
Another crucial advantage of Bayesian thinking is that as you gather additional data, you can feed your latest estimate back into the Bayes Theorem equation as the new prior, and your assessment will improve. Do this iteratively, and you can in theory approach a perfect, or close to perfect, assessment or forecast.
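The iterative updating described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the evidence values are invented, and the loop treats each new piece of evidence as independent of the others given the hypothesis, which real-world evidence often is not.

```python
def update(prior: float, p_if_true: float, p_if_false: float) -> float:
    """One Bayes Theorem update: returns the posterior probability."""
    numerator = prior * p_if_true
    return numerator / (numerator + p_if_false * (1 - prior))


def update_all(prior: float, evidence: list[tuple[float, float]]) -> list[float]:
    """Apply each piece of evidence in turn; each posterior becomes the next prior."""
    estimates = []
    for p_if_true, p_if_false in evidence:
        prior = update(prior, p_if_true, p_if_false)
        estimates.append(prior)
    return estimates


# Assumed, illustrative evidence: three separate discoveries, each ten times
# more likely if the hypothesis is true than if it is false.
evidence = [(0.50, 0.05), (0.50, 0.05), (0.50, 0.05)]
estimates = update_all(0.04, evidence)

for round_number, estimate in enumerate(estimates, start=1):
    print(f"After discovery {round_number}: {estimate:.1%}")
```

Each pass moves the estimate further from the 4% starting point, and evidence that pointed the other way (more likely if the hypothesis is false) would pull it back down just as mechanically.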
Does this actually work? Silver walked the walk in the most amazingly effective way: he predicted the outcome of two presidential elections to within a fraction of a percent (Gallup and Rasmussen polls were off by as much as 8 percent from what actually happened - the hard proof of what the “truth” was). One success of that order might have been just extraordinarily good luck, but Silver did this many times, including Senate and governors’ races - and he got virtually all of them correct.
Bayes Theorem works. Statistical significance tests on data alone are so 20th Century.