Friday, November 30, 2007

Today's immigrant bashers are the children of illegal immigrants

Kelly, a Native American reader of Glen Sacks's blog, tells it like it is:

Let me tell you what I think of you pathetic immigrant bashers. You and your families have no right to be here. You are the descendents of liars, thieves, and genocidal murderers. Your ancestors have no honor. We gave you help, food and shelter when you needed it, and guided Lewis and Clark across the continent. In return, you broke every promise you ever made, shot us in back whenever you could, cut down the forests, killed the wildlife, and stole everything that was not nailed down.

Laws, treaties, boundaries, borders, and promises meant nothing to you if you thought that, as a white man, you deserved to have it. Gold on our sovereign land, here comes the white man! Shortcut to the West, we don't need to pay any damn Indians no damn toll fees. There is very little moral difference between your ancestors' actions and some gang member who is helping himself to your grandma's wallet. A squatter is a squatter. So I tell you squatters to get off your high horse.


I think it is worth noting that Kelly admonishes the immigrant bashers to get off their high horse, but not to go back where they came from.

Barbarians

There is no other word for this:

"Thousands of protesters, many brandishing clubs and swords, took to the streets of Sudan’s capital Friday, demanding the execution of a British teacher who let her students name a teddy bear Muhammad."

You need to bury your head pretty deep in the sand to say today that Islam is a religion of peace.

It may be a good thing that no one reads my blog, or this post might actually be putting my life at risk.

Too little too late

Senator Joe Biden says:

"The President has no authority to unilaterally attack Iran and if he does, as foreign relations committee chairman, I will move to impeach."

Terrific. You're going to wait until after we've gotten sucked into yet another quagmire in the Middle East to impeach this bastard? This isn't closing the barn door after the horses have left, this is waiting until they've ridden over the horizon to start walking towards the barn.

Sheesh.

The abstinence-only folks are going to love this

ABC News reports:

"While past research has linked early sexual activity to health problems, a new study suggests that waiting too long to start having sex carries risks of its own. Those who lose their virginity at a later age -- around 21 to 23 years of age -- tend to be more likely to experience sexual dysfunction problems."

Wednesday, November 28, 2007

Can you say "entrapment"?

What happens when the New York police can't catch enough real criminals? They make their own.

A small victory for civil rights

With good news about civil rights becoming increasingly rare nowadays, I am pleased to report a small victory:

U.S. prosecutors have withdrawn a subpoena seeking the identities of thousands of people who bought used books through online retailer Amazon.com Inc., newly unsealed court records show.

The withdrawal came after a judge ruled the customers have a right to keep their reading habits from the government.


In one of the most poetic metaphors I have ever read in a legal ruling, Judge Stephen Crocker wrote:

"The (subpoena's) chilling effect on expressive e-commerce would frost keyboards across America."

Tuesday, November 27, 2007

Here's one we can all agree on

Women with hourglass figures tend to be more intelligent and have smarter kids, a new study says. No, really.

Sunday, November 25, 2007

Neigh!!!

My goodness, I never imagined that posting a pointer to an article would cause such a kerfuffle! Dennis Bider doth protest too much, methinks, but his protestations are a veritable smorgasbord of fallacious argumentation, so they're worth deconstructing. Let's see, we have...

Psychogenetic fallacy:

Flynn is one of those people who helped identify something new and fundamental, and then go on living their lives denying the best explanation because they would like it to be different.

Ad hominem:

I've long since accepted that you won't be converted to the rational outlook. You are a believer; your spiritual well-being rests on whether a certain hypothesis is right.

Straw man:

Stop trying to persuade everyone how unbiased you are

(I have never tried to persuade anyone that I am unbiased. To the contrary, I have been very up-front about my biases.)

I'm not quite sure how to categorize this one:

People form a hypothesis based on what they would like reality to be; not based on what the naked facts tell them; and so they spend decades trying to find out that group selection for lower populations would favor individual restraint in breeding, only to eventually find out that group selection for lower populations leads to cannibalism victimizing especially young females.

How on earth did we get from the genetics of IQ to cannibalism? Argument by gibberish? Non-sequitur? Maybe I'll just call that one the razzle dazzle.

Here's another one that's challenging to categorize:

The Flynn effect is best explained by heterosis

Even if it were true, just because heterosis is the best currently available explanation doesn't mean it is correct. (Ironically, even Bider himself concedes in the post he links to that heterosis is a hypothesis that has not been adequately tested.) The history of science is rife with examples of "best explanations" that turned out to be wrong.

Bider's second comment is the most colossal straw man I have ever encountered (and that's saying something). I think it says something about Bider's insecurity in his position that he feels the need to not only put words in my mouth but thoughts in my brain and feelings in my heart. This one is particularly ironic:

You are a person who has been cursed with an intolerance for other people's suffering.

because people who actually know me consistently tell me that one of my problems is that I don't empathize enough with those around me.

Finally,

Did desegregation of schools significantly narrow the black-white educational gap? [No, therefore] so much for environment.

This is another straw man, and a truly offensive one. I have never argued (and would never argue) that exposure to people with white skin is the aspect of the environment most responsible for anyone's academic success or lack thereof.

Don Geddis writes:

First of all, the article you link to doesn't support your summary.

My summary was that Flynn believes that "the evidence supporting the hypothesis that intelligence is primarily genetic is weak." I suppose I should have been more careful to distinguish, as Flynn does, between direct and indirect genetic effects. For example, there is no question that skin color is genetic, so if skin color interacts with some environmental factor (like societal bias) to produce some effect one could say that's a genetic influence. Technically it would be true, but I think that would be a perversion of what people generally mean when they say a trait is genetic. It's certainly not what I mean when I say that the evidence that IQ is genetic is weak, and I'm pretty sure Flynn would agree.

Once you get to average, US, major metropolitan levels of education, nutrition, etc., then the IQ variation due to genetics approaches 100%. It is not the case that the smarter kids were read to more, or went to the museum more often, or watched TV less. It is the case that the smarter kids typically had smarter parents.

If this were really true then that would convince me. But I doubt it's really true. In particular, as I have pointed out before, it is impossible to do a properly controlled study to test the effect of race on IQ because it is impossible to control for the societal bias to skin color. The only way to test the hypothesis is to use two racial groups whose IQs supposedly differ but whose outward appearance is the same, like Jewish and non-Jewish Caucasians. (You'd need to have both groups raised in similar cultural circumstances, i.e. either all as Jews or all as non-Jews -- and preferably randomized across both cases.) Until you've done a study like that all you have is more pink flamingos.

Here's another Flynn example that is worth pondering: If on the basis of their genetic inheritance, separated-twin pairs are tall, quick, and athletically inclined, both members are likely to be interested in basketball, practice assiduously, play better, and eventually attract the attention of basketball coaches capable of transforming them into world-class competitors. Other twin pairs, in contrast, endowed with shared genes that predispose them to be shorter and stodgier than average will display little aptitude or enthusiasm for playing basketball and will end up as spectators rather than as players.

The trouble with basketball as an analogy to IQ is that height is an easily observable physical trait that obviously has a causal relationship with potential success as a basketball player. Furthermore, height has both genetic and environmental factors. All this is known and uncontroversial. The trouble is, whether or not externally observable and heritable traits are similarly correlated with intelligence is precisely what is at issue here. So there are no lessons to be drawn from the basketball analogy for the matter at hand (except that it is easy to oversimplify).

Just for the record let me make it clear where I stand. I do not dispute that IQ could be genetic. In fact, I think it almost certainly has some genetic component. The question is how much is genetic and how much is environmental, and, importantly, how much is due to complex interplays between genes and environmental and societal factors, and as long as we're being exhaustive, how much is due to the imprecision and multi-dimensionality of intelligence, and how much is just plain random. My position is merely that the currently available data do not justify the conclusion that direct genetic factors (i.e. the direct transcription of DNA into proteins that build bigger or more effective brains) are the dominant factor. Note that I do not say that this is not the case, merely that the data we have don't justify the conclusion. I will also say, as I have said before, that I do think it would be unfortunate if we did have conclusive data to support this position, and that the human condition would on the whole be the worse for it. And people's eagerness to adopt the conclusion that intelligence is genetic, and to vilify anyone who doesn't join them in their prejudice, even in the absence of conclusive evidence does nothing to dissuade me from this belief.

But don't take my word for it

You don't have to believe me that the evidence supporting the hypothesis that intelligence is primarily genetic is weak. James R. Flynn of Flynn effect fame thinks so too.

Friday, November 16, 2007

On the road again

BTW, just in case my loyal readers (both of you) have been wondering why I haven't been blogging or responding to comments lately, I'm on a business trip and won't be back until Saturday. Regularly scheduled ranting and raving should resume shortly thereafter.

Homeopathy: a case study in bad science

Ben Goldacre has an excellent article in the Guardian about how and why homeopathy is bad science.

Tuesday, November 13, 2007

Worse than useless

Referring to the Democrats of course.

UPDATE: I'm writing this entry from an airport where the TSA's latest security measures are also a contender for the worse-than-useless award. (Hm, there's an idea.) There was a kerfuffle a while back about the fact that checking IDs at security but not when you actually board the plane made it trivial to fly using false credentials. To address this (presumably) you now get a nifty little blue stamp on your boarding pass when you go through the ID check. That way (I'm figuring the TSA figures) you can't swap out boarding passes after you clear security.

I guess the folks at the TSA have never heard of scanners and color printers.

Wednesday, November 07, 2007

Science 103: The virtue of simplicity

I've talked a lot so far in this series about how scientific experiments are designed and their results interpreted, about how statistics and controlled studies are used to filter out "real" results. But what does it actually mean to be a "real" result?

Here's a little puzzle to motivate the discussion: how many data points does it take to produce a statistically significant result, that is, a result that is very unlikely to have come about by chance? What is the smallest conceivable number of data points that would be needed under ideal circumstances?

Let's take a brief respite from biology and deal with physics for a moment. It is intuitively obvious that heavier objects should fall faster than lighter ones. Hold a rock and a feather in your hand and you can experience firsthand that gravity pulls harder on the rock than the feather, so it is entirely plausible that the rock should fall faster. And indeed it does (at least near the surface of the earth). This was the prevailing view among learned men (there were precious few learned women in those days) for thousands of years.

Which is interesting because even a moment's reflection will reveal that it is not intuitively obvious that heavier objects should fall faster than lighter ones. For starters, birds are heavier than feathers. Indeed, birds invariably carry a payload of tens of thousands of feathers (to say nothing of muscles and bones and other assorted support equipment) and yet if you drop a feather and a (live) bird the feather will generally fall faster. That should have been a clue even to the ancients that there was something wrong with the theory that heavier objects fall faster than lighter ones. And yet, as far as I know, I am the first person ever to point this out. (One might argue that birds fall more slowly because they do work to stay aloft, but this is not the case either. Hawks can stay aloft for hours without flapping their wings.)

It gets worse. Imagine three identical rocks, two of which are coated with glue. Drop all three. Because they are identical they should fall at the same speed. Now imagine that in mid-flight the two glue-coated rocks come together and stick, making essentially a single rock that is twice as heavy. The heavier-objects-fall-faster theory would predict that this composite rock should now accelerate relative to the unglued control rock. But why should that happen if both of the component rocks were falling at the same speed to begin with? (And if that example doesn't convince you, imagine three identical skydivers. Two of them drift towards each other. Their fingers touch. They hold hands. They pull themselves towards each other and attach their harnesses together. Now they are a "composite" skydiver twice as heavy as before, and should therefore be falling faster than the lone control skydiver. At what point during this process would they start to accelerate?)

As these examples illustrate, it often requires only one data point to produce a statistically significant result. Climb to the top of the Leaning Tower of Pisa, drop two cannon balls, one twice as heavy as the other, and with a single data point you can convincingly disprove the theory that heavy objects fall faster than lighter ones.
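For the record, the modern account of why the cannon balls land together fits in one line. This is only a sketch, assuming Newtonian gravity near the surface of the earth and no air resistance; the symbols m (mass), g (gravitational acceleration) and h (drop height) are my own notation, not anything from the thought experiments above.

```latex
F = m g, \qquad a = \frac{F}{m} = g, \qquad t = \sqrt{\frac{2h}{g}}
```

Gravity really does pull harder on the heavier ball, but the heavier ball also takes proportionally more force to accelerate, so the mass cancels and the fall time t depends only on the height.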

Let's return to biology and our pink flamingos. How many non-pink flamingos would it take to disprove the theory that flamingos are genetically pink? Now it's not quite so clear. If I were to just exhibit a white flamingo one might argue that this particular bird simply has a mutation. Albinism is a well-known phenomenon in other species. But suppose that I showed you a white flamingo and told you that this flamingo had been raised in a zoo and fed something other than shrimp? Does that make the flamingos-are-genetically-pink theory untenable? Well, not entirely. One could still argue that this flamingo is a genetic albino, and it's just a coincidence that it was fed a non-standard diet. So then you could start feeding this flamingo shrimp and watch it turn pink. Does that make for convincing proof? Still no. A die-hard eugenicist could still argue that flamingos are genetically pink, but that the stress of being raised on food other than its natural diet somehow caused the genes for pinkness not to express themselves. Or something like that.

Of course, the heavier-objects-fall-faster theory is salvageable too if you're willing to tie yourself into enough rhetorical knots. You could argue that the heavier cannon ball is also bigger and therefore experiences more drag, and that this extra drag just balances out the extra weight. Of course, this theory can also be disproved by dropping two cannon balls of the same size but made of different materials. But then the die-hard Aristotelian could start spouting something about the particular materials used and how the proportion of earth to fire in their composition affects their falling rates and so on and so on. And if you think I'm belaboring the point beyond all reason, go read this or this or this or this.

There are two points to this story. First, there is no way in science to ever prove anything beyond all doubt. The best we can hope to do is to come up with parsimonious theories that are good fits to the observed data. (The fact that this is possible at all is actually quite remarkable, and is itself an observation that cries out for an explanation. Einstein once famously quipped that "the most incomprehensible thing about the Universe is that it is comprehensible." David Deutsch actually takes a pretty convincing shot at that question in his book.)

Second, the number of data points that it takes to disprove a theory depends on the theory. The theory that heavier objects fall faster than lighter ones, period, end of story, can be disproved as I show above without actually conducting any experiments at all. The theory that heavier objects fall faster than lighter ones except under certain conditions is much harder to disprove, but much easier to dismiss out of hand simply because of how outlandish it seems to be a priori. Science rejects conspiracy theories not because they can be disproven (they can't -- that's why they are called conspiracy theories) but simply because they are not parsimonious. In science, simplicity is axiomatically a virtue.

In that light, Richard Lynn's theory has a lot to recommend it. It is quite parsimonious and plausible a priori. Harsh climates are indeed generally less forgiving of failures to plan ahead than milder ones. That genetics plays a significant role in determining intelligence is clear from the observation that humans are vastly more intelligent than other great apes, and the only possible explanation for that is our genes. And then there are Lynn's mountains of data, all of which seem to support the theory. It's pink flamingos as far as the eye can see.

Or is it?

In fact, there's a white flamingo in Lynn's data. Several of them actually. Some of them I've already pointed out in earlier posts so I won't belabor them here. I want to focus on one particular white flamingo: the average IQ for arctic peoples is lower than that for Europeans.

This is a serious problem for the theory that winter survival is what drives the evolution of intelligence, because if that were the case then one would expect arctic peoples to be the smartest on earth, and yet they are not by a wide margin (a full standard deviation). Lynn acknowledges this problem and dispenses with it by saying:

"The explanation for this must lie in the small numbers of the Arctic Peoples, whose population at the end of the twentieth century was only approximately 56,000 as compared with approximately 1.4 billion East Asians. While it is impossible to make precise estimates of population sizes during the main Wurm glaciation, there can be no doubt that the East Asians were many times more numerous than the Arctic Peoples. The effect of the difference in population size will have been that mutations for higher intelligence occurred and spread in the East Asians that never appeared in the Arctic Peoples."

You might want to see if you can figure out what is wrong with this argument before you proceed. I've told you everything you need to know. (Just for good measure, here's another clue.)

Lynn acknowledges a second problem:

"The Arctic Peoples did, however, evolve a larger brain size, approximately the same size as that of the East Asians, so it is curious that they do not have the same intelligence."

And dispenses with it by suggesting that the Inuit evolved "strong visual memory" that would have helped on hunting expeditions, but "which is not measured in intelligence tests."

Does this not begin to remind you of the Aristotelian trying to salvage the theory that heavier bodies fall faster?

Let us see how many problems with Lynn's little song-and-dance we can enumerate.

1. Lynn's argument that small population leads to low intelligence is circular. His entire thesis is that intelligence is an evolutionary adaptation. Therefore, high intelligence leads to large populations, not the other way around. (Duh!)

2. If one admits that a small population can dominate the evolutionary pressure of a harsh environment and produce low intelligence even in the face of having to survive in winter, that same argument must then be applied to all of the data points for which the populations were small. So bye-bye to the bushmen and aborigines as supporting data points. You can't have it both ways. Either small populations produce reliable data (in which case the Arctic Peoples falsify the theory) or they do not, in which case Lynn's entire argument begins to come apart at the seams.

3. If small populations don't produce enough alleles for the evolutionary pressures of harsh environments to manifest themselves, where do those big brains come from, eh? You can't have it both ways. Either small populations don't manifest evolutionary pressures (in which case the Arctic Peoples' large brains are a mystery) or they do (in which case Lynn's theory is falsified). Isn't it possible that the explanation for this discrepancy is that IQ tests don't accurately measure intelligence after all?

I'll leave it at that for now. There are in fact more holes in Lynn's theory than a Swiss cheese. But there is one gaping hole that dominates all the others: Lynn is postulating a simple theory for a complicated phenomenon, arguably the most complicated phenomenon in the entire Universe. All else being equal, simplicity is a virtue. But in this case all else is not equal. Some things are just complicated, and intelligence is one of them. Einstein once said that scientific theories should be "as simple as possible -- but no simpler." Lynn's theory is simpler, and therefore almost certainly wrong.

Intelligence is complicated. It is complicated to define. It is complicated to measure. It is produced by complicated processes that we are not even close to fully understanding. It is influenced by many disparate factors. Genes are undoubtedly among those factors, and it is a valid question to inquire into the extent to which genes contribute to overall intelligence (whatever that means). But -- and this is the crucial point -- Lynn does not answer that question! The reason he doesn't answer it is that he doesn't ask it. He assumes that the answer is "a lot" and goes on to ask a different question, namely, how much correlation is there between the genes that make us intelligent and the genes that make us members of our respective ethnic groups. Having asked the wrong question, he then goes on to make just about every mistake in the book, including collecting a mountain of data and drawing conclusions from analysis that is both post hoc and ad hoc.

I don't know what prompted James Watson to make the remarks that he did about black people, but by no stretch of the imagination are his remarks defensible as reasonable interpretations of currently available scientific data. At best, the jury is still out.

There is one final item I want to address. I can't find it at the moment, but someone left a comment on one of these posts to the effect that I "want" Lynn's theory to be wrong, that I want it to turn out that there are no racial differences in intelligence. That is true. I do hope it turns out that Lynn is wrong because I have seen the great evils that result when people believe that Lynn is right even in the absence of evidence. I think it would be a great tragedy if science were to give solace to bigots and white supremacists, and it is possible that that desire has colored or biased my evaluation of Lynn's work. I've done my best to be objective, but I am only human.

I will say (or maybe I should say "confess") that I did feel a certain sense of relief when I read Lynn's book and found it fatally flawed. There are certain inquiries for which it is wise, before they are undertaken, to think about what one is going to do with the knowledge once it is acquired, and to consider the possibility that there may be things that we would be better off not knowing.

Sunday, November 04, 2007

Science 102: epidemiology vs. controlled studies

In my previous post in this series I discussed some of the pitfalls that can lure you into drawing false conclusions from experimental data, including confounding factors, mistakes, bad luck, and pure random chance. All those factors arise in the context of controlled experiments. In a controlled experiment there are two groups of test subjects which are made as much alike as they can be. (There is an entire industry devoted to breeding genetically identical rats for use in laboratory experiments.) These two groups are subjected to conditions that are as much alike as they can be made, save for one factor, which is the object of interest (usually a drug or some other chemical). All this effort is made in an attempt to eliminate confounding factors. It doesn't always work, and even when it does pure random chance can produce false results in a surprisingly large number of cases.
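To put a rough number on "surprisingly large", here is a minimal sketch. It assumes the conventional 5% significance threshold and experiments that are fully independent of one another; the function name and the default threshold are my own choices for illustration, not anything taken from a particular study.

```python
def chance_of_false_positive(num_experiments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_experiments` independent
    experiments on a treatment with NO real effect still comes out
    "significant" at threshold `alpha` purely by chance."""
    # Each experiment avoids a false positive with probability (1 - alpha),
    # so all of them avoid it with probability (1 - alpha) ** num_experiments.
    return 1.0 - (1.0 - alpha) ** num_experiments


for k in (1, 5, 20):
    print(f"{k:2d} experiments: {chance_of_false_positive(k):.1%} "
          "chance of at least one false positive")
```

Under those assumptions, one experiment on a useless drug has a 5% chance of looking like a success, and by the time twenty such experiments have been run the odds are better than even that at least one of them does.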

But there are many cases where a controlled experiment is not possible. We can do pretty well with animal models, but when it comes time to run experiments on humans we don't usually have access to large numbers of genetically identical test subjects. Instead, a technique called randomization is used, so that the two groups, while not identical, are unlikely to be biased in any particular direction by any confounding factor. Part of the process of designing a study is (or at least ought to be) doing the math to figure out how many test subjects you need so that randomization gives you the desired low probability of confounds.
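Here is a small simulation of why the number of test subjects matters for randomization. It is only a sketch: the hidden trait (think "exercises regularly"), its 40% prevalence, the group sizes and the number of repetitions are all made-up values for illustration. It measures how unevenly a trait that nobody measured ends up split between the two groups.

```python
import random


def randomize(subjects, rng):
    """Shuffle the subjects and split them into two equal-sized groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fraction_with_trait(group):
    """Share of a group that has the hidden trait (True = has the trait)."""
    return sum(group) / len(group)


def balance_gap(num_subjects, trait_rate, rng):
    """Absolute difference between the two randomized groups in the share
    of subjects who have a hidden trait with prevalence `trait_rate`."""
    subjects = [rng.random() < trait_rate for _ in range(num_subjects)]
    treatment, control = randomize(subjects, rng)
    return abs(fraction_with_trait(treatment) - fraction_with_trait(control))


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (20, 200, 2000):
    gaps = [balance_gap(n, 0.4, rng) for _ in range(200)]
    print(f"{n:5d} subjects: average imbalance {sum(gaps) / len(gaps):.3f}")
```

The average imbalance shrinks as the number of subjects grows, which is the sense in which a big enough randomized study is unlikely to be biased by a confound nobody thought to measure.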

But sometimes it is not possible to do a controlled study because controls are just too difficult to enforce. Suppose you want to know, say, if eating carrots reduces the risk of cancer. Cancer is a very slow disease, taking years to manifest itself. It would be all but impossible to rigorously enforce a protocol where one group of test subjects consumed a known quantity of carrots while a control group ate none over a period of years.

In situations like this scientists fall back on what are known as epidemiological studies. These are named after the science of epidemiology, which is, naturally, the study of epidemics (where, for obvious reasons, it is often impossible to do controlled studies). But the methodology of epidemiology can be -- and is -- applied far more broadly.

The basic idea behind epidemiology is that when you can't go through the usual process of assembling treatment and control groups, sometimes you can go back and look at people's history and assign them to the proper category retroactively. For example, we might take 1000 people and ask them if they eat carrots regularly, and then see if the ones that say they do get less cancer than the ones that say they don't.

The problem with this approach is that you don't get to randomize the two groups, and so the possibility of confounding factors is much higher. Suppose we find 500 carrot-eaters and 500 non-carrot-eaters and discover that the non-carrot-eaters had 50 cancers among them while the carrot-eaters had only 40. Would we be justified in concluding that eating carrots reduces the risk of cancer by 20%?

No, we would not. For one thing, these results might not be statistically significant even in a controlled study! (Whether they would be or not depends on a lot of factors, a discussion of which would take us far afield.) But let us put that aside and assume that this is a statistically significant result. How can we be sure that carrots are the cause of the reduction in the incidence of cancer? It is possible that eating carrots coincides with other healthy lifestyle choices -- like exercising regularly for example -- and that it is exercise, not carrots, that produces the beneficial effect. In a controlled study this wouldn't be a problem because the subjects would be randomized, and presumably you'd have the same number of exercisers and non-exercisers in each group. But in an epidemiological study we do not have that luxury.
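For the curious, here is one way to run the numbers on the 50-versus-40 example. This is a sketch under a strong simplifying assumption: it treats the two groups as independent random samples and uses a plain two-proportion z-test, which is just one of several reasonable tests. The function name is mine.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value) using the pooled-variance normal approximation.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# 50 cancers among 500 non-carrot-eaters vs. 40 among 500 carrot-eaters.
z, p_value = two_proportion_z_test(50, 500, 40, 500)
relative_reduction = 1 - (40 / 500) / (50 / 500)
print(f"relative risk reduction: {relative_reduction:.0%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

With these particular counts the p-value comes out at roughly 0.27, nowhere near the usual 0.05 cutoff, so under these assumptions a gap of this size could easily be luck.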

There are statistical techniques to get around this problem. Basically, the idea is to divide up a large group of people in various ways so that you essentially make "virtually randomized" treatment and control groups out of them retroactively. I don't have time to go into details, but the point is that it can be done. So suppose we ask all our test subjects: do they exercise? Do they smoke? Do they live at high altitudes? Near nuclear power plants? Near power lines? How often do they talk on their cell phones? We collect all this data and we do the statistical slice-and-dice and lo-and-behold a signal arises from all the noise that indicates with 95% confidence that indeed eating carrots does reduce the risk of cancer by 20%. Are we now justified in concluding that this is actually true?
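The simplest version of the slice-and-dice is stratification: compare carrot-eaters with non-carrot-eaters separately within each level of a suspected confound. The sketch below uses counts I invented purely for illustration (they are not data from any study), rigged so that exercise is the only thing that actually matters.

```python
# Hypothetical counts, made up for illustration:
# (exercises, eats_carrots) -> (number of people, number of cancers)
counts = {
    (True, True): (400, 24),
    (True, False): (100, 6),
    (False, True): (100, 12),
    (False, False): (400, 48),
}


def cancer_rate(rows):
    """Cancers divided by people, summed over (people, cancers) rows."""
    people = sum(p for p, _ in rows)
    cancers = sum(c for _, c in rows)
    return cancers / people


def crude_rate(eats_carrots):
    """Cancer rate ignoring exercise altogether."""
    return cancer_rate(
        [row for (_, carrots), row in counts.items() if carrots == eats_carrots]
    )


def stratum_rate(exercises, eats_carrots):
    """Cancer rate within a single exercise stratum."""
    return cancer_rate([counts[(exercises, eats_carrots)]])


print(f"crude: carrots {crude_rate(True):.1%} vs none {crude_rate(False):.1%}")
for exercises in (True, False):
    label = "exercisers" if exercises else "non-exercisers"
    print(f"{label}: carrots {stratum_rate(exercises, True):.1%} "
          f"vs none {stratum_rate(exercises, False):.1%}")
```

In the crude comparison the carrot-eaters look healthier (7.2% versus 10.8%), but within each exercise group the rates are identical, so in this made-up population the carrots are doing nothing. Real analyses use regression and matching to do the same thing across many factors at once, but the idea is the same.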

It should come as no surprise by now that the answer is still no. The reason is that there is always the possibility that there is a confound that we might not have considered and hence forgotten to put into our questionnaire. How likely is that? The only way to be sure that it's really the carrots and not something else is to follow up with a controlled study. Let's stipulate that this is too difficult. Are we screwed?

Not completely. There are two other things we can do. One is to look at the questionnaire and see how thorough it is. If we submit the study to peer review and no one can think of anything that we should have asked about and didn't then that's a pretty good indication (though far from foolproof) that we're on the right track with the carrots. But there is another thing we can do, and this really gets at the heart of science: we can ask why eating carrots should reduce the risk of cancer.

Science is not just about doing experiments and crunching numbers. Science is really about explaining things. Experiments are not a tool for directly getting at the truth, they are a tool for helping decide between alternative explanations.

So one possible explanation (in science we call these hypotheses) of why carrots might reduce the risk of cancer is that they contain chemical substances which neutralize the effects of various carcinogens that everyone is exposed to in the course of day-to-day life. The reason that this is progress is that this explanation can be tested in ways other than feeding people carrots. For example, we can try to identify these chemicals and see if they occur in other foods, and see if eating those foods also reduces the risk of cancer. We can also try to extract or synthesize those chemicals and see if consuming them as dietary supplements makes a difference. (Turns out that often they don't.)

Have you ever wondered why you seem to hear different advice about what to eat to reduce your risk of cancer every time you turn around? It's because most of the results of epidemiological studies are wrong!

Which brings me to flamingos. As everyone knows, flamingos are pink. Famously so. The statistical correlation between being a flamingo and being pink is really off the charts. And yet a flamingo's pinkness is not genetic, except in a very roundabout sense. Flamingos are pink because their natural diet consists mainly of shrimp, which are high in beta-carotene, which has a reddish color. It's the same chemical that makes carrots orange. (Ironically, beta-carotene supplements appear to increase the risk of cancer!) The beta-carotene turns their feathers pink. Feed a flamingo something other than shrimp and its feathers revert to white, their "natural" color. (The same mechanism is what makes wild salmon pink. Farm-raised salmon are white, which is why they have artificial color added to make their flesh look more "natural". Gotta love the irony.)

There's more to say on this but I have to stop now. I guess there will be a third installment of this series. If you want a sneak preview, go buy a copy of David Deutsch's book "The Fabric of Reality" and read chapters 3, 4 and 7.

Here endeth the lesson. :-)

Science 101

Suppose you did the following experiment: You take 20 test subjects and divide them into two groups of ten people each. To the first group you give an experimental drug. To the second group you give a sugar pill. One week later all of the people in the first group are dead and all of the people in the second group are alive and healthy. Is it reasonable to conclude that the experimental drug is dangerous?

Just from the information I have given you, the answer is "no". The reason is that based solely on the information I have given there are many other possible explanations for the results. Here are just a few possibilities:

1. The test subjects died because they were being taken to the test administration area in a bus, and the bus crashed.

2. The groups were not randomly selected to begin with. The test subjects were all terminally ill and the control group was all young and healthy.

3. The batch of drugs being used was accidentally contaminated by a poisonous compound during manufacture.

4. Both the test and control groups were terminally ill, the drug is safe but ineffective, and it was just random chance that all of the test group died before any of the control group.

I'll leave it as an exercise to come up with others. I've chosen these four alternative hypotheses because each one illustrates a different kind of pitfall that you can fall into when trying to apply the scientific method.

The first alternative seems like it would be easy to dispense with. If it were true there would be ample evidence: newspaper stories, photos of mangled bodies, death certificates. And yet, none of this evidence would actually "prove" that this hypothesis is true. It's possible to both fabricate and hide evidence, and so it is possible that the bus crash did occur even though no direct evidence can be found to show that it did. Likewise, it is possible that the accident did not occur even though one can produce evidence to show that it did. Eventually you have to apply Occam's razor and decide how far down the conspiratorial rabbit hole you are willing to go. The point is: nothing in science is ever absolutely proven. The best you can do is get to the point where all but one of the alternative hypotheses seem implausible to you. Ultimately, the threshold of plausibility is an individual decision.

The second alternative is an example of simple procedural error. It is a rather blatant example, but things just like this happen all the time, usually inadvertently. In fact, this kind of mistake is so common it even has a name. It's called a "confounding factor." Sometimes confounding factors can lead to serendipitous discoveries, like when it was found that copper may contribute to Alzheimer's disease. But more often confounding factors are just that: confounding, and you can't tell whether the results you got from the experiment are due to the influence you were trying to test or the confound, and you have to go back and redesign your experiment.

The third alternative hypothesis might seem farfetched, but things like this actually happen all the time too, especially in biology. Accidental contamination is like a confounding factor, except that it arises by a procedural error rather than by a mistake in the experimental design. Nowadays biological experiments rely on dozens or hundreds or in some cases thousands of reagents, and if what's in the bottle isn't what you thought was in the bottle then your results may simply be a reflection of that contamination. (A friend of mine currently working on a biology Ph.D. actually discovered a contaminated reagent in her lab which invalidated many months worth of work, not just hers, but all of her colleagues' as well. She was not a very popular person for a while.)

The fourth alternative seems the most farfetched of all, but it is not impossible. In fact, we can actually calculate the exact probability of getting this result by chance alone. (You might want to make an intuitive guess before I tell you the answer.) In the scenario I described, if each subject independently had a 50-50 chance of dying during the week, it turns out to be quite small indeed: just a little under one in a million (1 in 1048576 to be precise). But it is extremely rare to get results this crisp. Suppose 8 of 10 test subjects had died and 3 of 10 control subjects. The odds of that happening by chance are quite a bit higher. Figuring out what those odds are exactly is complicated (and occasionally controversial). The study of how to compute those kinds of odds is known as statistics, and now you know why statistics are part and parcel of science: "it happened by chance" is always an alternative hypothesis for any result.
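For the curious, here is a rough sketch of both calculations. It rests on assumptions of mine that are not part of the scenario as stated: the crisp case treats every subject as an independent coin flip, and the messy case uses a one-sided Fisher exact test, which is just one of several defensible ways to compute those odds. The function names are made up for illustration.

```python
from fractions import Fraction
from math import comb


def all_or_nothing_by_coin_flip(n_test: int, n_control: int) -> Fraction:
    """Chance that every test subject dies and every control subject lives,
    ASSUMING each subject independently has a 50% chance of dying."""
    return Fraction(1, 2) ** (n_test + n_control)


def fisher_one_sided(test_dead: int, n_test: int,
                     control_dead: int, n_control: int) -> Fraction:
    """One-sided Fisher exact test: given the total number of deaths, the
    chance that at least `test_dead` of them land in the test group if
    group membership has nothing to do with dying."""
    total_dead = test_dead + control_dead
    ways_total = comb(n_test + n_control, total_dead)
    most_possible = min(n_test, total_dead)
    ways_extreme = sum(
        comb(n_test, k) * comb(n_control, total_dead - k)
        for k in range(test_dead, most_possible + 1)
    )
    return Fraction(ways_extreme, ways_total)


crisp = all_or_nothing_by_coin_flip(10, 10)
messy = fisher_one_sided(8, 10, 3, 10)
print(f"10 of 10 vs 0 of 10: 1 in {crisp.denominator}")
print(f"8 of 10 vs 3 of 10: about {float(messy):.3f}")
```

Under these assumptions the 8-versus-3 result comes out to roughly a 3.5% chance, tens of thousands of times more likely than the all-or-nothing result, which is why the messier (and far more typical) outcome is so much harder to interpret.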

The possibility of chance results is also the reason why only predictions are considered scientifically valid, and never postdictions. To be considered a valid test of a scientific hypothesis, you have to state the hypothesis you are purporting to test before you do the experiment, and you are not allowed to change your mind afterwards. The reason for this is that if you do enough experiments you will sooner or later get what look like interesting results purely by chance. If you are allowed to go back and cherry-pick just the experiments that gave you the results you were looking for, it is not just likely that you will fool yourself into accepting false results, it is inevitable. (This is why all day-trading schemes eventually fail. If you take historical stock data and feed it to any model with enough parameters you will inevitably find a strategy that would have made money in the past. But this is almost certain to be pure chance, and the model will almost certainly have no predictive power. This is not to say that it won't make successful predictions -- it might, but the probability that it will is almost certainly 50%. In fact, you can generate any number of models that will be successful on historical data. Half of them will be successful on future data as well. The trick is, there's no way to know which half until it's too late for that information to be useful. Of course, the same is true for models that *weren't* successful on historical data.)
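A toy simulation makes the cherry-picking problem concrete. This is only a sketch under assumptions I am inventing for illustration: every "model" is a pure coin flipper, the market is a coin flip too, and a model counts as "successful on historical data" if it called at least 8 of 10 past days correctly. None of the names or numbers come from real trading data.

```python
import random


def cherry_pick(n_models: int = 20000, past_days: int = 10,
                min_hits: int = 8, seed: int = 0):
    """Generate random models, keep the ones that looked good on past data,
    and report how often the kept ones are right on one future day."""
    rng = random.Random(seed)
    picked = 0
    picked_right_next = 0
    for _ in range(n_models):
        # Each past-day call is right with probability 1/2 (pure chance).
        hits = sum(rng.random() < 0.5 for _ in range(past_days))
        # The future call is an independent coin flip as well.
        right_next = rng.random() < 0.5
        if hits >= min_hits:
            picked += 1
            picked_right_next += right_next
    future_rate = picked_right_next / picked if picked else float("nan")
    return picked, future_rate


picked, future_rate = cherry_pick()
print(f"{picked} models looked good on past data")
print(f"fraction of those right on the next day: {future_rate:.2f}")
```

About one random model in twenty passes the historical filter, so with enough models you always find plenty of "winners", yet their hit rate on the future day hovers around one half.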

This last phenomenon is actually very common, especially in today's pharmaceutical industry. Creating new drugs is so expensive that the temptation to focus on the experiments that show your drug is safe and effective is nearly impossible to resist (especially when your job description is to produce a good return for your shareholders by any means necessary). So debacles like Vioxx are not just unsurprising, they are all but inevitable.

It's even worse than that. The threshold for being considered a publishable result is what statisticians call 95% significance, that is, results for which there is less than a 5% chance that they could have come about by chance. But even if everything is done perfectly -- if there are no mistakes in the experimental design, no procedural errors, no bad luck, and no cherry-picking -- one published result in twenty is still going to be a result of chance alone! There are tens, maybe hundreds of thousands of biology studies published every year. At least one in twenty of them is almost certainly wrong.
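The arithmetic behind this is short enough to sketch. The simplifying assumption here is mine: every experiment is testing an effect that is not really there and the experiments are independent, so each one has a 5% chance of crossing the threshold by luck. Real literature is a mix of true and false hypotheses, so treat the numbers as an illustration rather than an estimate.

```python
def chance_of_false_positive(n_experiments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent experiments on a
    nonexistent effect clears the significance threshold by chance."""
    return 1 - (1 - alpha) ** n_experiments


def expected_false_positives(n_studies: int, alpha: float = 0.05) -> float:
    """Expected number of chance 'discoveries' among n such studies."""
    return n_studies * alpha


for n in (1, 20, 100):
    print(n, round(chance_of_false_positive(n), 3))
print(expected_false_positives(100_000))
```

Run twenty such experiments and the odds are nearly two in three that at least one of them looks publishable.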

So here's your final exam: suppose I give IQ tests to people around the world and consistently (more or less) find that white people score higher than black people. Is it reasonable to conclude that black people are genetically predisposed to have lower intelligence than white people?

A test for Lynn's theory

Fortuitously, Slate just ran a story that suggests a test for Richard Lynn's theory that intelligence differences across races are (mostly) genetic. The focus has been on Africans, which makes it difficult to design an experiment to distinguish between genetic factors and the self-fulfilling prophecy that blacks are inferior to whites, which leads to institutionalized discrimination against black people, which leads to lower performance on standardized tests (among other deleterious effects). But happily there is another "racial" group, Ashkenazi Jews, who are genetically isolated but much less visually distinctive from WASPs than Africans are. Show me a group of descendants of Ashkenazi Jews who were raised as an assimilated population in a non-Jewish community where the average IQ is normal or below, but who still as a group sport IQs one full standard deviation above the mean and I will (tentatively of course, because nothing in science is ever final) accept Lynn's conclusion.

Spineless pussies cave again

It's hardly news any more when Democrats cave to Administration pressure. I propose a new law that anyone who claims that waterboarding is not torture must be subjected to it themselves before they can be appointed to any government job -- including (make that especially) Senator.

Beware the plausible hypothesis

So I read Richard Lynn's book. He makes a rather convincing argument for the position that there are significant genetic differences in intelligence among different races. And yet, his argument is almost certainly wrong, and makes a good case study in how difficult it can be to properly interpret data. I'll start by restating Lynn's argument. In fact, I'll go Lynn one better and make an argument that is even stronger than the one he makes. Then I'll show you why it's (almost certainly) wrong.

Let us begin by observing that it is entirely uncontroversial that intelligence has a significant genetic component. Humans are more intelligent than any animal and that is clearly due to the fact that we're humans. Furthermore there are well understood genetic deficiencies in humans (like Down syndrome) that result in marked mental retardation. Down syndrome is also associated with distinctive and easily recognizable physical characteristics, so we have at least one example of a genetic mutation that causes changes in both intelligence and physical appearance. So the hypothesis that there are differences in intelligence among races and that those differences are genetic cannot be ruled out a priori (the way we can, say, claims of perpetual motion) since we have an uncontroversial existence proof that such phenomena do indeed occur.

Furthermore, it is uncontroversial that humans have migrated over periods of time long enough to induce genetic differences among populations. It is clear that differences in skin color are an evolutionary adaptation that has occurred in response to environmental pressures. Lactose tolerance is another very recent example, having evolved as recently as 3000 years ago, so even creationists can sign on to that one :-)

Lynn begins his argument by observing that, "There is a widespread consensus that intelligence is a unitary construct that determines the efficiency of problem solving, learning, and remembering." He goes on to give a brief history of the definition of intelligence (which is too long for me to quote here), IQ, and the Flynn effect. He then devotes an entire chapter to discussing the concept of race, addressing the modern notion that the concept of race is a "myth" and showing that even those who advance that view acknowledge that races, as is intuitively obvious to everyone but the most doggedly politically correct, do indeed exist.

The rest of the book is a seemingly unassailable mountain of data showing an indisputably clear and consistent correlation between race and intelligence (as measured by IQ scores). He then shows that intelligence is almost entirely (75-80% depending on the study) a genetically inherited trait by citing studies of identical twins reared apart. Finally, he proposes a plausible mechanism (the evolutionary pressure of having to survive winter) for producing the observed differences in intelligence.

Have I convinced you? There is at least one glaring flaw in the argument above which I introduced on purpose in order to make a point. You might want to see if you can spot it before going on.

BTW, if you are convinced you shouldn't feel bad about it. It's a very convincing argument, and it might even be correct. (It might even be correct despite the flaw.) I am not aware of any data that would falsify it. But it is nonetheless almost certainly wrong. Here's why.

Let's start with my motivating example, Down syndrome, which is indeed a genetic disorder that produces both profound mental retardation and easily identifiable physical traits. However, while Down syndrome is genetic, it is not inheritable. Down syndrome results when an individual gets an extra copy of chromosome 21, which happens during meiosis or gestation. In fact, if you trust Wikipedia (which you really shouldn't, but what the hell) the non-inheritability of Down syndrome leads some researchers to use parents of children with Down syndrome as controls for studies of autism which is apparently at least partially heritable.

So the fact of the matter is that there are no precedent phenomena for the claim that there are genetic differences that result in correlated mental and physical differences within a species. (Obviously there are such differences across species.) This is not to say that it doesn't happen, just that if it happens it is not common. The a priori plausibility of the claim is much lower than the above argument would lead one to believe.

I introduced the Down syndrome argument as a deliberate red herring to make the point that even glaring flaws in an argument are not always immediately obvious even when you are primed for them. Lynn's argument has no flaws quite so glaring as that one (he's not a crackpot). Instead, Lynn's argument contains lots and lots of little flaws that together add up to an argument that is almost certainly wrong. Let's start to explore those.

The foundational flaw in Lynn's argument is the claim that intelligence "is a unitary construct", that it can be reduced to a single number (IQ), that that number can be reliably measured, and that the resulting measure has a causal relationship to something of consequence like the ability to survive winters or build industrial economies. If you read closely, he doesn't actually provide any evidence that this is the case (because there isn't any), he only says that "there is a widespread consensus." Now, this is actually true. There is indeed a widespread consensus. But just because there is a widespread consensus does not mean that it is actually true. As recently as a few decades ago there was a widespread consensus that homosexuality was a mental disease. At various times in history one could find a widespread consensus that bleeding, thalidomide, and Vioxx were effective treatments for various ailments.

In fact there is no evidence that IQ tests measure anything beyond a person's ability to do well on an IQ test. And there is a good reason for this. There are some cognitive abilities (like short term memory capacity, visual acuity, and spatial reasoning) that can be measured objectively, just as we can objectively measure certain physical abilities (like raw strength and speed). But the applicability of these abilities to real-world situations is limited. The holy grail of intelligence testing is a person's overall ability to solve novel problems, and therein lies the rub because a novel problem can only be novel once. I do really well on IQ tests but that's because I'm a geek and I spent a lot of time solving puzzles when I was a kid. I'm really good at solving puzzles, and particularly good at solving the kinds of puzzles that people tend to put on IQ tests. But it is far from clear whether my puzzle-solving skill is a result of some innate ability that I have, or whether it's simply because I've had a lot of practice. There is no way to know which way the causality runs.

It's like this for any kind of complex skill. Is Tiger Woods a superior golfer because he has the golfing gene (and does he have it because of or despite the fact that he's black), or is it simply because he's been golfing continuously since he was three? To really find out you'd need to do an experiment with a statistically significant number of subjects all going through the same intensive training that Tiger Woods went through, and even then you wouldn't really know because it's possible that one kid has a lot of innate golfing skill but just doesn't like golfing enough to really apply herself. The number of factors that go into making an effective golfer or an effective computer programmer or an effective businessman or an effective politician or even an effective puzzle solver are so diverse and varied that it would be impossible to design a study that could tease all these factors apart and give you a statistically significant result.

It gets worse. If you're testing the hypothesis that genes cause both dark skin and decreased "intelligence" (whatever that means) it is impossible to create a control group because it is impossible to hide the fact that a person with dark skin has dark skin. It is therefore impossible to eliminate the subtle effects of societal prejudices. Even if we take all of Lynn's data at face value, all we've shown is that people with dark skin don't do as well on IQ tests as people with light skin. But this could simply be due to the fact that people with light skin happen to live under circumstances that are conducive to the development of IQ-test-taking skills. And I don't just mean better economic conditions, I mean a pervasive belief shared by light-skinned and dark-skinned people alike that light-skinned people are somehow superior. Such a belief can become a self-fulfilling prophecy, and the problem with self-fulfilling prophecies is that they are actually fulfilled. Such beliefs can become self-reinforcing because they are actually true, not because of anything genetic but simply because people believe them, rather like a mass (anti-)placebo effect.

None of Lynn's data refutes the self-fulfilling-prophecy hypothesis. Indeed, the widespread acceptance of Lynn's conclusions despite his obvious scientific sloppiness (like mistaking consensus for fact) lends support to it.

The problem is that there is no way to experimentally resolve this. To do so you would have to have a way to make black people look like white people (or vice versa), and that's just not possible. So at this point we have two equally plausible hypotheses to explain the data. How do we decide between them? And in particular, how do I justify my claim that Lynn is "almost certainly wrong"?

Well, let's look at the data. The first thing you notice is that the actual numbers are not nearly as consistent as Lynn would have you believe. To cite just one example that I found leafing through the tables, his data for Zambia are based on two studies, one of which puts the average IQ of Zambians at 77 and the other at 63, very nearly a full standard deviation of difference. And such discrepancies are not unusual.

But let's leave that aside for a moment and take Lynn's data at face value. Consider the unfortunate inhabitants of Cameroon, with an average IQ of 64. Or Equatorial Guinea with an average of 59. These figures are so low as to fall into the range of mild mental retardation. Compared to the Chinese at the opposite extreme with an average IQ of 105 the difference is more than two standard deviations. That seems like an implausibly high variance to be due to genetic factors. To be sure, differences of this magnitude in physical traits are certainly possible -- skin color comes to mind as an example -- but they are rare, and intelligence is much more complex than melanin production.

There's a lot more that could be said about this (and has been) but I don't have time to write a treatise. I'll just point out one more problem with Lynn's methodology, one that is more serious than any I have cited so far.

On page 5 of Lynn's book he writes:


The metric employed for the measurements of intelligence of the races has been to adopt an IQ of 100... for Europeans in Britain, the U.S., Australia and New Zealand as the standard in terms of which the IQ's of other races can be calculated. The mean IQ's of Europeans in these four countries is virtually identical.


Notice anything oddly coincidental about those four countries? In all four, English is the native language. So Lynn's definition of intelligence, his gold standard, is the performance of English-speaking people on IQ tests. Don't you think that this definition might introduce just the teensiest bit of cultural bias into the whole endeavor? Besides the winter-induced-genetic-drift hypothesis and the self-fulfilling-prophecy hypothesis we can introduce a third plausible theory to explain Lynn's results, that "intelligence" is just a measure, at least in part, of the ability to learn to speak English. How well do the data support this theory? Well, the Chinese data seem to refute it, but how to explain e.g. the Lithuanians, who at an average IQ of 90 are significantly below their anglophone cousins? And Lithuania is not exactly known for being a tropical paradise. If having to deal with winter makes you smart, why aren't the Lithuanians geniuses? Why are the Chinese so much smarter than the Inuit? Why are Italians smarter than the Portuguese? In these last two examples the differences are huge: a full standard deviation.

The fact of the matter is that intelligence is a monstrously complex phenomenon governed by both genetics and environmental factors. The hypothesis that genetics are the dominant factor, and that adaptation for survival in winter is the driving force, is not supported by the data. And there is one last point that I think puts the final nail in the coffin of Lynn's argument: if intelligence is so crucial to winter survival, why do we not see the same differences in intelligence in animals? Why is it that the most intelligent land animals (great apes and elephants) evolved in Africa and not in Europe?

I am open to the possibility that the neo-eugenicists might still be correct. But until someone comes up with a much better argument than Lynn's (and Lynn's is the best they have) anyone who advances the position that blacks are genetically inferior to whites is a bigot in my book.