In my previous post in this series I discussed some of the pitfalls that can lure you into drawing false conclusions from experimental data, including confounding factors, mistakes, bad luck, and pure random chance. All those factors arise in the context of controlled experiments. In a controlled experiment there are two groups of test subjects which are made as much alike as they can be. (There is an entire industry devoted to breeding genetically identical rats for use in laboratory experiments.) These two groups are subjected to conditions that are as much alike as they can be made, save for one factor, which is the object of interest (usually a drug or some other chemical). All this effort is made in an attempt to eliminate confounding factors. It doesn't always work, and even when it does pure random chance can produce false results in a surprisingly large number of cases.
But there are many cases where a controlled experiment is not possible. We can do pretty well with animal models, but when it comes time to run experiments on humans we don't usually have access to large numbers of genetically identical test subjects. Instead, a technique called randomization is used, so that the two groups, while not identical, are unlikely to be biased in any particular direction by any confounding factor. Part of the process of designing a study is (or at least ought to be) doing the math to figure out how many test subjects you need so that randomization gives you the desired low probability of confounds.
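To give a feel for what "doing the math" looks like, here is a minimal sketch (plain Python, standard library only) of the textbook two-proportion sample-size formula. The incidence rates in the example call are hypothetical numbers chosen for illustration, not from any real study.

```python
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Approximate subjects needed PER GROUP to detect a difference
    between incidence rates p1 and p2 with a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p1 - p2) ** 2

# Hypothetical: to detect a drop in incidence from 10% to 8%
# you need on the order of 3000+ subjects in each group.
n = sample_size_two_proportions(0.10, 0.08)
```

The striking (and sobering) feature of this formula is the squared difference in the denominator: halve the effect size you want to detect and you roughly quadruple the number of subjects you need.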
But sometimes it is not possible to do a controlled study because controls are just too difficult to enforce. Suppose you want to know, say, if eating carrots reduces the risk of cancer. Cancer is a very slow disease, taking years to manifest itself. It would be all but impossible to rigorously enforce a protocol where one group of test subjects consumed a known quantity of carrots while a control group ate none over a period of years.
In situations like this scientists fall back on what are known as epidemiological studies. These are named after the science of epidemiology, which is, naturally, the study of epidemics (where, for obvious reasons, it is often impossible to do controlled studies). But the methodology of epidemiology can be -- and is -- applied far more broadly.
The basic idea behind epidemiology is that when you can't go through the usual process of assembling treatment and control groups, sometimes you can go back and look at people's history and assign them to the proper category retroactively. For example, we might take 1000 people and ask them if they eat carrots regularly, and then see if the ones that say they do get less cancer than the ones that say they don't.
The problem with this approach is that you don't get to randomize the two groups, and so the possibility of confounding factors is much higher. Suppose we find 500 carrot-eaters and 500 non-carrot-eaters and discover that the non-carrot-eaters had 50 cancers among them while the carrot-eaters had only 40. Would we be justified in concluding that eating carrots reduces the risk of cancer by 20%?
No, we would not. For one thing, these results might not be statistically significant even in a controlled study! (Whether they would be or not depends on a lot of factors, a discussion of which would take us far afield.) But let us put that aside and assume that this is a statistically significant result. How can we be sure that carrots are the cause of the reduction in the incidence of cancer? It is possible that eating carrots coincides with other healthy lifestyle choices -- like exercising regularly for example -- and that it is exercise, not carrots, that produces the beneficial effect. In a controlled study this wouldn't be a problem because the subjects would be randomized, and presumably you'd have the same number of exercisers and non-exercisers in each group. But in an epidemiological study we do not have that luxury.
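In case you're curious about that first point, here is a quick sketch (plain Python, standard library only) of the usual Pearson chi-squared test applied to the hypothetical counts above. As it happens, 40 versus 50 cancers out of 500 each is well within the range of pure random chance:

```python
from statistics import NormalDist

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for a 2x2 table:  exposed: a cases, b non-cases
                      unexposed: c cases, d non-cases"""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) equals the two-sided normal tail at sqrt(chi2)
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p

# 40/500 cancers among carrot-eaters vs 50/500 among non-carrot-eaters
chi2, p = chi2_2x2(40, 460, 50, 450)
# p comes out around 0.27 -- nowhere near the conventional 0.05 threshold
```

In other words, if carrots did nothing at all, you'd see a gap at least this big between the two groups roughly one time in four just by luck.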
There are statistical techniques to get around this problem. Basically, the idea is to divide up a large group of people in various ways so that you essentially make "virtually randomized" treatment and control groups out of them retroactively. I don't have time to go into details, but the point is that it can be done. So suppose we ask all our test subjects: do they exercise? Do they smoke? Do they live at high altitudes? Near nuclear power plants? Near power lines? How often do they talk on their cell phones? We collect all this data and we do the statistical slice-and-dice and lo-and-behold a signal arises from all the noise that indicates with 95% confidence that indeed eating carrots does reduce the risk of cancer by 20%. Are we now justified in concluding that this is actually true?
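To make the slice-and-dice a little less hand-wavy, here is a sketch of one of the standard techniques, the Mantel-Haenszel pooled risk ratio, run on hypothetical numbers I made up to illustrate the exercise confound. In these invented data, carrot-eating looks protective overall, but only because carrot-eaters happen to exercise more; within each exercise stratum the carrots do nothing.

```python
def mh_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata.
    Each stratum: (exposed_cases, exposed_total,
                   unexposed_cases, unexposed_total)."""
    num = den = 0.0
    for a, n1, c, n0 in strata:
        N = n1 + n0
        num += a * n0 / N
        den += c * n1 / N
    return num / den

# Hypothetical data: within each stratum carrots make no difference
# (5% vs 5% among exercisers, 20% vs 20% among non-exercisers),
# but carrot-eaters are four times as likely to exercise.
strata = [(20, 400, 5, 100),     # exercisers
          (20, 100, 80, 400)]    # non-exercisers

crude = (40 / 500) / (85 / 500)    # ~0.47: carrots look protective
adjusted = mh_risk_ratio(strata)   # 1.0: the "effect" vanishes
```

Notice the whole trick hinges on having thought to ask about exercise in the first place, which is exactly the weakness discussed next.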
It should come as no surprise by now that the answer is still no. The reason is that there is always the possibility that there is a confound that we might not have considered and hence forgotten to put into our questionnaire. How likely is that? The only way to be sure that it's really the carrots and not something else is to follow up with a controlled study. Let's stipulate that this is too difficult. Are we screwed?
Not completely. There are two other things we can do. One is to look at the questionnaire and see how thorough it is. If we submit the study to peer review and no one can think of anything that we should have asked about and didn't then that's a pretty good indication (though far from foolproof) that we're on the right track with the carrots. But there is another thing we can do, and this really gets at the heart of science: we can ask why eating carrots should reduce the risk of cancer.
Science is not just about doing experiments and crunching numbers. Science is really about explaining things. Experiments are not a tool for directly getting at the truth, they are a tool for helping decide between alternative explanations.
So one possible explanation (in science we call these hypotheses) of why carrots might reduce the risk of cancer is that they contain chemical substances which neutralize the effects of various carcinogens that everyone is exposed to in the course of day-to-day life. The reason that this is progress is that this explanation can be tested in ways other than feeding people carrots. For example, we can try to identify these chemicals and see if they occur in other foods, and see if eating those foods also reduces the risk of cancer. We can also try to extract or synthesize those chemicals and see if consuming them as dietary supplements makes a difference. (Turns out that often they don't.)
Have you ever wondered why you seem to hear different advice about what to eat to reduce your risk of cancer every time you turn around? It's because most of the results of epidemiological studies are wrong!
Which brings me to flamingos. As everyone knows, flamingos are pink. Famously so. The statistical correlation between being a flamingo and being pink is really off the charts. And yet a flamingo's pinkness is not genetic, except in a very roundabout sense. Flamingos are pink because their natural diet consists mainly of shrimp, which are high in beta-carotene, which has a reddish color. It's the same chemical that makes carrots orange. (Ironically, beta-carotene supplements appear to increase the risk of cancer!) The beta-carotene turns their feathers pink. Feed a flamingo something other than shrimp and its feathers revert to white, their "natural" color. (The same mechanism is what makes wild salmon pink. Farm-raised salmon are white, which is why they have artificial color added to make their flesh look more "natural". Gotta love the irony.)
There's more to say on this but I have to stop now. I guess there will be a third installment of this series. If you want a sneak preview, go buy a copy of David Deutsch's book "The Fabric of Reality" and read chapters 3, 4 and 7.
Here endeth the lesson. :-)