Saturday, March 04, 2023

Uncomputable things: Chaitin's constant, Busy Beavers, and Kolmogorov complexity

1. Introduction

The other day I was watching this Numberphile video about (among other things) uncomputable numbers when I came across this section around the 6:50 mark where Matt Parker talks about Chaitin's constant.  Strictly speaking, this is a whole family of constants, not a single number, but that doesn't really matter.  What matters is that Chaitin's constants are rare examples of numbers which can be unambiguously defined but which are provably uncomputable.  In the video Matt kind of throws up his hands when trying to explain Chaitin's constant(s) and why it/they are uncomputable, but it's really not that hard to understand.  Nonetheless, I haven't found a really accessible explanation anywhere, so I thought I'd take a whack at it.

Chaitin's numbers (usually denoted omega or Ω) are named after Gregory Chaitin, who discovered/invented them in the 1970s (the halting probability appears in his 1975 paper "A Theory of Program Size Formally Identical to Information Theory").  In a nutshell, Ω is defined as the probability that a random computer program will eventually halt.  Obviously, the value of this number depends on what you mean by "a random computer program", and that in turn depends on the kind of computer you are writing the program for, which is the reason that this is a family of numbers, not just a single number.  But Chaitin's omegas are nonetheless invariably referred to in the singular because the basic idea behind all of them is the same, and the details don't really matter all that much.

2.  The halting problem

I'll start with a short review of the famous Halting Problem, because that is the foundation on which  Ω is built.  The Halting Problem is: given a program P and an input I, does P eventually halt when run with I as its input, or does it run forever?

It turns out that we can prove this problem cannot be solved.  But even before seeing a proof, one might be suspicious, because a solution to the halting problem could be leveraged to answer any question we could render as a search carried out by a computer program.  Want to know whether the Goldbach conjecture is true?  Or the Riemann hypothesis?  Or even the infamous P=NP?  (If you want to get an idea of just how thorny a problem that is, read the introduction of this paper.)  All you would need to do is write a computer program that systematically searches for counterexamples and then ask if that program halts or runs forever.  If it halts, then there exists a counterexample and the conjecture is false; otherwise it's true.  A solution to the halting problem would put mathematicians and physicists permanently out of business.
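For concreteness, here is a sketch (in Python) of such a counterexample search for the Goldbach conjecture.  It halts if and only if some even number greater than 2 is not the sum of two primes, so a single call to our hypothetical halting decider would settle the conjecture:

    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    n = 4
    while True:
        # Is n the sum of two primes?
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1)):
            break        # counterexample found: halt
        n += 2           # conjecture holds for n so far: keep searching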

Fortunately for both, you can prove the halting problem can't be solved.  This was first done by Alan Turing in 1936.  Turing's original paper is challenging to read because he had to pretty much invent the entire notion of a "computer program" in a world where programmable computers did not yet exist.  (Back then a "computer" was a human being who did calculations by hand.  The word referred to a profession, not a device.)  But nowadays computer programs are ubiquitous, and we are used to thinking about bits and bytes and whatnot, and that makes the proof a lot easier.

Here is an outline of what we are going to do.  First, we are going to assume that we can solve the halting problem, i.e. that we can write a program, which we will call H, which takes as input another program P and returns TRUE if P halts and FALSE otherwise.  Second, we are going to write another program, which we will call B, which, like H, takes a program as input.  But instead of returning TRUE or FALSE, B is going to call H as a subroutine to determine whether the program P that has been given to it halts or not, and then do the opposite of whatever H says P will do: if H says that P halts, then B is going to enter an infinite loop (and hence run forever), and if H says that P runs forever, then B is going to halt.  Finally, we are going to run B with a copy of itself as its input, and show that this leads to a contradiction, and hence that our assumption that we could write H must be false.

There are really only two details we need to fill in to turn that outline into a full-fledged proof.  The first is that we need to explain what it means to run a program with another program as its input.  Turing's original paper spent many pages on this, but today we can simply point to the familiar fact that programs are just strings of bits, and inputs to programs are just strings of bits.  Running a program with another program as its input is therefore no different from running a program with any other kind of input; it just so happens that the string of bits we supply as input corresponds to a valid program, which we will simply stipulate.  (Or, if you want to be a stickler about it, we can stipulate that invalid programs halt by producing an error.)
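(A trivial illustration in Python, if you want one: a script can read its own source code as input data, exactly the way it would read any other file.

    # A program is just a string of bits like any other input.
    with open(__file__, "rb") as f:
        program_bytes = f.read()
    print(len(program_bytes), "bytes of program, ready to use as input")

Nothing special happens; bits are bits.)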

The second detail is a bit thornier.  As we have described it, H is a program that takes one input, a program P, and likewise B is a program that takes one input, which is also a program.  But what about P?  How many inputs does it take?  We have played a little fast-and-loose with this.  Remember, our description of H was that it "takes as input another program P and returns TRUE if and only if P halts" but we said nothing about the input to P.  Does P even take an input?  If so, where does its input come from when H tries to figure out whether or not it is going to halt?

There are several different ways to address this, but the easiest is to change the definition of H so that it takes two inputs, a program P and an input I, and returns TRUE if and only if P halts when run on input I, and to restrict ourselves to giving H only programs that take one input.

I am also going to introduce a bit of notation here: if P is a program and I is an input, then P(I) is the result of running program P on input I.  In the case where a program takes more than one input, like our redefined H, we separate them with commas, e.g. H(P, I).  So H(P, I) is TRUE if and only if P halts when run on input I, that is, if P(I) halts, otherwise it is FALSE.  (Exercise: what is H(H, H)?)

Now we can define our magic program B.  B is going to take one input, a program P, and it is going to call H as a subroutine with P as both the program to be analyzed and the input to that program.  In other words, B(P) is going to start by computing H(P, P).  If the result is TRUE (i.e. if P halts when run on a copy of itself as input) then B is going to enter an infinite loop, otherwise it will halt.

In other words, we will build B so that B(P) will halt if and only if H(P, P) is false, that is, if P(P) runs forever.  Otherwise, if H(P, P) is true, i.e. P(P) halts, B will run forever.
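In Python terms, and granting for the sake of contradiction a hypothetical halts(P, I), B is just this:

    def halts(P, I):
        # Hypothetical: returns True iff program P halts on input I.
        # The point of the proof is that this cannot actually be written.
        raise NotImplementedError

    def B(P):
        if halts(P, P):      # would P(P) halt?
            while True:      # then do the opposite: run forever
                pass
        # otherwise: halt immediately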

Now, what happens if we run B with a copy of itself as input, i.e. what happens if we try to compute B(B)?  Well, B(B) is going to start by computing H(B, B), which is a special case of H(P, I).  Recall that H(P, I) is true if and only if P(I) halts.  So H(B, B) is true if and only if B(B) halts.  But B(B) halts if and only if H(B, B) is false.  This is a contradiction, so our assumption that it was possible to write H must be false.  QED.

3.  Busy Beavers

So now that we know that the halting problem cannot be solved, we also know that any information that would allow us to solve the halting problem must be impossible to obtain.  As an example of the kind of information that would allow this, consider the so-called busy-beaver numbers, denoted BB(n), which are defined as the largest number of steps that a computer program of length n (measured in bits) can run before halting.  For any n, BB(n) must exist, and it must be a finite integer.  Why?  Because there are only a finite number of programs of length n (at most 2^n of them, since that is how many n-bit strings there are), so there are only a finite number of programs of length n that halt, and one of them must be the one that runs the longest before halting.

And yet, if we knew the value of BB(n) then we could solve the halting problem for programs of length n.  How?  Simply by running any given program of length n for BB(n) steps.  If it has not halted by then, it never will, because by definition no halting program of length n runs longer than BB(n) steps.
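Here is a minimal sketch of that decider in Python, under a toy assumption: a "program" is modeled as a generator that yields once per step, and bb_n stands in for the (uncomputable!) value of BB(n):

    def halts(program, bb_n):
        # Run the program for at most bb_n steps; if it is still going
        # after that, by the definition of BB it can never halt.
        steps = 0
        for _ in program():
            steps += 1
            if steps > bb_n:
                return False
        return True

    def finite_loop():       # halts after 10 steps
        for _ in range(10):
            yield

    def infinite_loop():     # never halts
        while True:
            yield

    print(halts(finite_loop, 100))    # True
    print(halts(infinite_loop, 100))  # False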

So the BB function must be uncomputable.

4.  Chaitin's constant(s)

Another example of information that would allow us to solve the halting problem is the number of programs of length n that halt.  This number doesn't have a common name, so I'm going to call it C(n): the number of programs of length n that halt.  Again, C(n) is a perfectly well-defined number.  Indeed, C(n) must be an integer between 0 and 2^n, so it is not even a mind-bogglingly big number like the BB numbers are.  And yet, if we could compute C(n) we could solve the halting problem: run all 2^n programs of length n in parallel until exactly C(n) of them have halted.  The ones still running at that point can never halt.  So C(n) must not be computable.
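A sketch of that procedure, using the same toy generator model of "programs" as before, with c_n standing in for the uncomputable value of C(n):

    def halts_given_c(programs, c_n, target):
        # Dovetail: run every program one step at a time until exactly
        # c_n of them have halted; the survivors must run forever.
        running = {i: p() for i, p in enumerate(programs)}
        halted = set()
        while len(halted) < c_n:
            for i in list(running):
                try:
                    next(running[i])
                except StopIteration:
                    halted.add(i)
                    del running[i]
        return target in halted   # does program number `target` halt?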

Note that C(n) is not a single number; it's an (uncomputable) function, just like BB is.  Chaitin's constant is a number that is constructed (more or less) by taking all of the C(n)'s, concatenating them together, and interpreting the result as the digit expansion of a real number.  (And both depend on the details of the computing model, so really they are a family of functions or a family of numbers, but that is not what matters.)

If you look up Chaitin's constant you will find it is defined in terms of probabilities, specifically, the probability that something called a "prefix-free universal Turing machine" will halt on a random program, but all of this is just pedantry.  A "prefix-free Turing machine" is just a way of defining a computational model that allows you to formalize the notion of a "random program of length n", and the probability that such a program will halt is just C(n)/2^n.  Then there's some additional fancy-looking math to pack all of these rational numbers into a single real number in such a way that you can extract the C(n)'s with some more fancy math.
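(For the record, the usual definition is

    Ω = Σ 2^(-|p|), summed over every program p that halts,

where |p| is the length of p in bits.  Each halting program contributes exactly its probability of being chosen at random, so this is the same halting probability described above, and prefix-freeness is what guarantees the sum converges to a number between 0 and 1.)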

But all of the fancy math obscures the fact that at the end of the day, Chaitin's constant is just a numerical representation of the sequence of C(n)'s concatenated together.  In fact, if you think in binary, it is literally this.  In binary, dividing an integer by 2^n just shifts that integer to the right by n digits.  Because C(n) is at most 2^n, it fits in at most n+1 binary digits, so if we give each C(n) its own window of 2n bits (that is, shift each C(n) 2n bits further to the right than the previous one), they all line up without overlapping.  Then you can just cram them all together, put a binary point on the left, and bam, you have a number from whose value you could reconstruct the sequence of C(n)'s and hence solve the halting problem.
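Here is a toy illustration of that packing in Python, using made-up stand-in values for the (unknowable) C(n)'s:

    from fractions import Fraction

    def pack(cs):
        # Give C(n) a window of 2n bits: shift it 2n bits further
        # to the right than C(n-1) ended, then add it in.
        x, shift = Fraction(0), 0
        for n, c in enumerate(cs, start=1):
            shift += 2 * n
            x += Fraction(c, 2 ** shift)
        return x

    def unpack(x, k):
        # Read the windows back off, one at a time.
        cs, shift = [], 0
        for n in range(1, k + 1):
            shift += 2 * n
            cs.append(int(x * 2 ** shift) % 2 ** (2 * n))
        return cs

    cs = [1, 3, 6]    # made-up stand-ins for C(1), C(2), C(3)
    assert unpack(pack(cs), 3) == cs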

So all of these uncomputable things (the busy beaver numbers, the C(n) sequence, and Chaitin's constant) are just ways of "repackaging" the uncomputability of the halting problem.

5.  Kolmogorov complexity

Are there uncomputable numbers that can be defined without reference to the halting problem?  Yes.   Consider a computer program that produces some output, and ask: what is the length of the shortest program that produces the same output?  This question was first asked by Andrey Kolmogorov, and so the length of the shortest program that produces a given output is called the "Kolmogorov complexity" of that output, which I will abbreviate KC.  So KC(n) is a function whose value is the length of the shortest computer program that will produce n as its output.
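For intuition, consider this one-liner in Python:

    print("a" * 1000)      # a very short program with a 1000-character output

A string of a thousand a's has tiny Kolmogorov complexity, because the short program above reproduces it.  A typical random thousand-character string does not: with overwhelming probability, nothing much shorter than a program containing the string itself as a literal can reproduce it.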

The proof that KC is uncomputable is also due to Gregory Chaitin, and it looks a lot like the proof that the halting problem is unsolvable.  We're going to assume that KC is computable and show how this leads to a contradiction.

So let us choose a program P that produces an output n, and assume that we can compute KC(n).  Obviously we know how long P is, and so we can tell whether or not its length is equal to KC(n).  If it is, i.e. if P is (one of) the shortest program(s) whose output is n, we will call P an elegant program.  (This is Chaitin's term.  A more neutral term would be "minimal", but I'm going to defer to the master.)

So if we can compute KC, then we can write an elegance tester, i.e. a program E which takes as input a program P and returns TRUE if P is elegant, i.e. if its length is the same as the KC of its output.  It turns out that E is impossible in the same way that H is.  To see this, we construct a new program B which works as follows: B takes as input some integer I, enumerates all programs longer than I, and passes each one to E to see whether it is elegant.  When it finds an elegant program, it runs that program.
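Here is a minimal sketch, in which E is the hypothetical elegance tester and the other helpers are stand-ins (enumerating programs and running them is routine in principle; E is the part that cannot exist):

    def E(P):
        # Hypothetical: True iff P is elegant, i.e. no shorter program
        # produces the same output.  Exists only if KC is computable.
        raise NotImplementedError

    def programs_longer_than(I):
        # Stand-in: enumerate all valid programs of length > I, in order.
        raise NotImplementedError

    def run(P):
        # Stand-in: an ordinary interpreter for program P.
        raise NotImplementedError

    def B(I):
        for P in programs_longer_than(I):
            if E(P):
                return run(P)   # produce the same output as P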

Note that B has to produce some output.  Why?  Because there are an infinite number of elegant programs: at least one for each possible output n.  So sooner or later B has to find one, and it will then produce the same output that program produces.

Now let's run B with I set to the length of B plus one.  (Strictly speaking we have to set it a little longer than that, to the length of B plus the log base 2 of I, to account for the space it takes to write I down, but that's a detail you can safely ignore.)  This means that sooner or later, B will find a program P that E says is elegant, and it will run P, and hence produce the same output n as P.  But because B only tests programs longer than I, and I is greater than the length of B, P must be longer than B, and so P cannot be elegant: B, which is shorter, produces the same output.  (Again, strictly speaking, it's the length of B plus the length of I that matters, but again, this is a detail.)

So again we have a contradiction, and so KC cannot be computable.

Note that this uncomputability stems from a fundamentally different source than Chaitin's constant, which is really just a corollary to the halting problem.  KC has to do with optimization rather than halting, and it has some really profound practical implications.  For example, you can think of mathematical models of physics as computer programs.  The uncomputability of KC implies that we can never know whether we have the optimal theory of physics.  Even if some day we manage to unify general relativity and quantum mechanics and create a theory that accounts for all observations, we cannot possibly know if we have found the simplest such theory.  That is fundamentally unknowable.