Do You Really Have That Disease?
You're sitting in the doctor's office waiting for the result of a test. The test will tell you if you have a disease you really don't want to have.
As you wait, it seems as if the whole world is poised like a pencil balancing on its tip. In a moment, the doctor will come through that door and, based on the test, your world will fall one way or another.
Or will it?
I hope you never face a moment like this or, if you have faced one, it's now far behind you. But as scary as these moments can be, they also illustrate a more subtle way of thinking about a field most people (including me) aren't very good at thinking about: statistics.
As part of my sabbatical season, I've been teaching myself probability and statistics (I'm training for work I want to do in Network Theory). Statistics has never been a field that's come easily to me. But the deeper I dig into it, the more I'm falling in love with its ideas. Today, I want to use the example of the doctor and the test to introduce something remarkable in statistics that's been affecting us all. It's called Bayes' Theorem. And because of its use in Big Data and all those algorithms ruling our world, it's worth taking a moment to consider.
So let's get back to the doc's office. Let's say she comes in and tells you the test was positive. She also tells you the test is 80 percent accurate.
Does that mean there's an 80 percent probability that you have the disease?
While that might seem to be the intuitive conclusion, it doesn't tell the whole story. But to tell the whole story, you'll need a subtler kind of statistical reasoning. That's where Bayes' Theorem comes in.
The usual way of thinking about statistics is what's called the frequentist interpretation. Simple frequentist reasoning goes like this: Let's say there are 1,000 people living in a small town and 10 of them have an illness. If I pick a citizen at random, what is the probability they have the illness? A simple frequentist calculation says the probability would be 1 percent (i.e. 10/1000 = 0.01).
The Bayesian interpretation of statistics looks at things much differently. It doesn't just see probabilities as an expression of how frequently something happens. Instead, Bayesians take statistical inference as a way of stating our confidence about unknown states of the world based on knowledge of the world we already have. With each new piece of knowledge we can update how confident we are about something being true or not true. In that way, the Bayesian view often focuses on what's called conditional probability. What is the probability of this happening if I already know that has happened? (It's actually much richer than this but I'm simplifying here for brevity.)
To see how it all works, let's go back to our test. We already know the test is 80 percent accurate, meaning it correctly comes back positive for 80 percent of the people who actually have the disease. But now we also want to consider the times when the test comes back positive even though you don't have the disease. These are known as false positives. Let's assume there is a 9.6 percent probability of the test coming back with a false positive. Finally, we should also ask: What's the raw probability of getting the disease? Is it rare or common? Let's say it's pretty rare, like in our small town example, with the probability of a random person having the disease being just 1 percent.
The power of the Bayes perspective is the way it quickly works with all this information rather than just looking at the positive test result. It considers how rare the disease is and the possibility of false positives, too. In this way the famous Bayes' Theorem (whose equation and explanations can be found here) allows you to quickly get the all-important conditional probability you're really interested in. And what do we mean by conditional probability? We mean the probability that you actually have the disease given that the test came back positive. That's the real question you want to answer.
Remember, if you had only considered the test, you'd think the probability you have the disease was 80 percent. With Bayes' Theorem, however, it turns out that you really only have about an 8 percent chance of having the disease. To see why, picture 1,000 people taking the test: of the 10 who have the disease, 8 test positive, while about 95 of the 990 healthy people also test positive, so only 8 of roughly 103 positive results belong to someone who is actually sick. There is a big difference between 80 percent and 8 percent, and that difference comes from two facts: the disease is actually rare, and there is a non-zero rate of false positives. By considering these other pieces of information in a clear and quick way, Bayes' Theorem upends our usual intuitive statistical sensibilities.
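If you'd like to check that number yourself, here is a minimal Python sketch of the calculation. It assumes that "80 percent accurate" means the test comes back positive for 80 percent of the people who really have the disease. The function and variable names are my own, chosen only for illustration.

```python
def probability_of_disease_given_positive(prior, true_positive_rate, false_positive_rate):
    """Bayes' Theorem: the probability of disease given a positive test."""
    # Chance of having the disease AND testing positive.
    sick_and_positive = prior * true_positive_rate
    # Chance of being healthy AND testing positive anyway (a false positive).
    healthy_and_positive = (1 - prior) * false_positive_rate
    # Of all the positive results, what fraction come from people who are sick?
    return sick_and_positive / (sick_and_positive + healthy_and_positive)


# Numbers from the example in this post.
prior = 0.01                 # 1 percent of people have the disease
true_positive_rate = 0.80    # assumption: this is what "80 percent accurate" means
false_positive_rate = 0.096  # 9.6 percent of healthy people test positive

posterior = probability_of_disease_given_positive(
    prior, true_positive_rate, false_positive_rate
)
print(f"Chance of disease after a positive test: {posterior:.1%}")
```

Run as written, this gives a little under 8 percent, which is the figure quoted above.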
What our little parable of the doctor's visit and test really tells us is the enormous power of the Bayesian perspective. Even though it's been known for some time (Thomas Bayes lived in the 18th century), its study and use have been growing like wildfire over the last few decades. In particular, the world of Big Data is powered by statistical reasoning, and much of that reasoning is powered by Bayes. So whether it's a spam filter or the language recognition going into your conversations with Siri, you've already met Bayes a lot of times.
(*Please note: the example in this post DOES NOT tell you to ignore any test you get. It tells you to get the test done again or have other tests to confirm the initial result. Please! Talk with your doctor.)
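To see why a repeat test is worth it, here is a rough Python sketch that applies the same Bayesian update twice, using the numbers from this post. It rests on two assumptions of mine: that "80 percent accurate" means the test catches 80 percent of people who have the disease, and that a second test's errors are independent of the first's, which real medical tests may not satisfy. The function name is hypothetical.

```python
def update_on_positive(prior, true_positive_rate=0.80, false_positive_rate=0.096):
    """Return the probability of disease after one more positive test."""
    # Chance of having the disease AND testing positive.
    sick_and_positive = prior * true_positive_rate
    # Chance of being healthy AND testing positive anyway.
    healthy_and_positive = (1 - prior) * false_positive_rate
    return sick_and_positive / (sick_and_positive + healthy_and_positive)


# Start from the 1 percent base rate used in this post.
after_first = update_on_positive(0.01)
# Assumption: the second test is independent of the first, so the
# result of the first update simply becomes the new starting point.
after_second = update_on_positive(after_first)

print(f"After one positive test:  {after_first:.0%}")
print(f"After two positive tests: {after_second:.0%}")
```

Under those assumptions, a second positive result lifts the probability from about 8 percent to about 41 percent. Each new piece of evidence moves your confidence, which is the Bayesian point of view in action.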
Adam Frank is a co-founder of the 13.7 blog, an astrophysics professor at the University of Rochester, a book author and a self-described "evangelist of science." You can keep up with more of what Adam is thinking on Facebook and Twitter: @adamfrank4