Saturday, February 28, 2009

Fun with numerology and the King James Bible

Not sure what inspired me to do this. Maybe it was the Hand of God. But for some reason I felt moved to see how long it would take to generate a histogram of the words in the King James Bible using a modern computer. The answer turns out to be about ten minutes of coding (in Python) and a second or two of run time.
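For anyone who wants to try this at home, here is a minimal sketch of how such a histogram could be built. It is not the code I actually ran, just one plausible way to do it. The tokenizing rule (lowercase everything, treat a run of letters with an optional apostrophe suffix as one word) and the helper names `word_histogram` and `load_histogram` are assumptions made for illustration, and a different rule will give slightly different totals.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally followed by an
# apostrophe and more letters, so that "barber's" stays a single token.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_histogram(text):
    """Return a Counter mapping each lowercased word to its occurrence count."""
    return Counter(WORD_RE.findall(text.lower()))


def load_histogram(path):
    """Build the histogram from a plain-text file (the path is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return word_histogram(f.read())


def print_top(hist, n=20):
    """Print the n most common words, one per line, count first."""
    for word, count in hist.most_common(n):
        print(f"{count:7d} {word}")
```

Given a plain-text copy of the KJV saved under some name of your choosing, `print_top(load_histogram(your_path))` would list the most frequent words.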

The results have some interesting features.

The King James Bible contains:

789,634 words

31,103 verses

12,808 distinct words, of which 223 are possessives. Of those 223, all but four are also found in their root form. The four possessives that are not found in their root form are "barber's", "nachon's", "stomach's", and "fisher's". (The last one is, of course, found in the plural -- seven times in fact -- but the singular "fisher" appears nowhere in the KJV.)

4046 hapax legomena, including "onions", "cries", and "moist".
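Once you have the word counts in a dictionary, figures like these fall out of a few one-liners. The sketch below is illustrative rather than the original code. It assumes `hist` maps each lowercased word to its count, and that a possessive is simply any word ending in "'s", which is a simplification. The function names are made up for the example.

```python
from collections import Counter


def hapax_legomena(hist):
    """Return the words that occur exactly once, in alphabetical order."""
    return sorted(word for word, count in hist.items() if count == 1)


def possessives(hist):
    """Return every word ending in 's (assumed here to be a possessive)."""
    return sorted(word for word in hist if word.endswith("'s"))


def orphan_possessives(hist):
    """Return possessives whose root form never appears on its own."""
    return [word for word in possessives(hist) if word[:-2] not in hist]


# A toy histogram, not real KJV data, to show the shape of the results.
toy = Counter({"fisher's": 1, "fishers": 7, "god": 3, "god's": 2, "onions": 1})
print(hapax_legomena(toy))
print(orphan_possessives(toy))
```

In the toy data, "fisher's" counts as an orphan because "fisher" itself is absent, even though the plural "fishers" is present.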

The most common word is, naturally, "the", which occurs 63,919 times. The most common word that is not a conjunction, preposition, or pronoun is "shall" with 9837 occurrences.

The word "love" occurs 310 times and "hate" 87. "Good" and "evil" are more evenly matched at 720 and 613 respectively (with "holy" adding to the margin at 611). "Lord" appears 7830 times, "God" 4442, and "Jesus" runs a distant third at 983 plus 555 occurrences of "Christ". (Interestingly for Bible-code enthusiasts, no word appears exactly 666 times. The closest are "like" and "way" flanking the Number of the Beast at 669 and 664.) "Mary" rounds out the Catholic pantheon at 54, but not all those references are to the mother of Jesus. At least twelve of the 54 are to Mary Magdalene (I say "at least" because there are some unadorned references to "Mary" that appear to my unschooled eye to be references to MM, e.g. John 12:3), and some are to an assortment of other Marys, like Mary the sister of Martha and Mary the wife of Cleophas. If God had really intended for us to pay a lot of attention to Mary the mother of Jesus, you wouldn't know it from the histogram.
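The hunt for 666 is easy to reproduce. Here is a hedged sketch, again assuming `hist` is a dictionary of word counts. The helper names `words_with_count` and `closest_to` are invented for this example, and the sample dictionary contains only the handful of counts quoted above, not the full histogram.

```python
def words_with_count(hist, target):
    """Return every word that occurs exactly `target` times."""
    return sorted(word for word, count in hist.items() if count == target)


def closest_to(hist, target, k=2):
    """Return the k (word, count) pairs whose counts are nearest to target.

    Ties in distance are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(hist.items(), key=lambda wc: (abs(wc[1] - target), wc[0]))
    return ranked[:k]


# Only the counts mentioned in the post, for illustration.
sample = {"like": 669, "way": 664, "good": 720, "evil": 613, "holy": 611}
print(words_with_count(sample, 666))
print(closest_to(sample, 666))
```

On the sample, no word hits 666 exactly, and "way" (two short) edges out "like" (three over) as the nearest miss.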

You can explore the complete list yourself here.


Daniel Hill said...

Many thanks for doing this, Ron. Are there more words that appear exactly once than words that appear twice, or three times, or . . . ? Even if proper names are excluded? Do you have a link to the histogram itself, as opposed to the text file? Thanks for all your endeavours here.

Ron said...

I'm not sure what you mean by "the histogram itself." The text file *is* the histogram. Do you mean the code I used to generate it? I don't think I kept that. But it's a pretty elementary exercise.

Publius said...

For an interesting presentation of Zipf's Law, see this Vsauce video: "The Zipf Mystery".