Thursday, September 19, 2024

Yes, you can have exactly-once delivery

Introduction

This post is ostensibly about an obscure technical issue in distributed systems, but it's really about human communications, and how disagreements that on the surface appear to be about technical issues can sometimes turn out to actually be disagreements about the meanings of words.  I'm taking the time to write a fairly extensive post about this for two reasons.  First, I'm hoping it will be helpful to provide a case-study about what appears on the surface to be a technical dispute that turns out to be a mere quibble over terminology.  Learning to recognize such situations can help defuse tense situations.  And second, this might provide some cover for someone early in their career who comes to the same conclusion I did about a certain emperor having no clothes.

The dispute

I was reading Hacker News a few days ago and stumbled on a comment posted by /u/solatic in response to an article entitled "Falsehoods programmers believe about TCP".  I quote that comment here in its entirety:

Related: you can get at most once delivery or at least once delivery; you cannot get exactly once delivery. If I had a dollar for every junior who thought that a lack of exactly once delivery guarantees was a bug...

This struck me as odd because I could see a straightforward way to implement exactly-once delivery on top of at-least-once delivery.  But solatic seemed sufficiently confident that I thought I might be missing something, so I posted what I thought was a cautious response:

If you can get at-least-once delivery, why can you not build exactly-once on top of that?

I won't go into detail about what happened after that.  You can go read the resulting thread if you like, but the TL;DR is that the discussion spun wildly out of control, which ultimately motivated me to write this post.

Before I dive into the details, I want to give my credentials because I am about to challenge what is apparently some pretty deeply entrenched conventional wisdom.  Unfortunately, many of the people who hold these mistaken views seem to subscribe to arguments from authority.  I got a lot of admonitions that were tantamount to, "You obviously need some remedial training in basic computer science."  That's possible, but I want to provide some information about my background so you can adjust your Bayesian prior regarding that particular hypothesis.

I have a Ph.D. in Computer Science.  My specialty is (or at least was) AI and not distributed systems, but I nonetheless know my way around a network stack.  My first startup, about 30 years ago now, was an (unsuccessful) attempt to launch a new network architecture.  We built our own hardware.  I wrote the Linux device drivers for our custom NIC.  I currently work part-time as a consultant for a hardware design team that builds network routing chips.  I know how a modern network works literally from the transistors on up.  And I know about the two generals and the muddy children.

So with that, let's look at the claim that I found questionable: "You can get at most once delivery or at least once delivery; you cannot get exactly once delivery."

What does "delivery" mean here?  The entire kerfuffle turns on the ambiguity of the meaning of this word, which none of my interlocutors ever defined.  But whatever it means, my puzzlement stemmed from the fact that I could see a trivial way to build exactly-once delivery from at-least-once delivery, under any reasonable definition of "delivery".  Specifically, if you could get at-least-once delivery, then turning that into exactly-once delivery should be (it seemed to me) a simple matter of keeping track of all the delivered messages and removing duplicates.  Doesn't that seem plausible to you?  If so, congratulations, your intuitions are serving you well because that will in fact work.  It's not a particularly good solution from an engineering point of view, and it is absolutely not how you would do it in practice (at least not without a lot more refinement) but it does refute the claim that you "cannot get exactly-once delivery".  You can.

So why do so many people deny this obvious truth?  Well, that's a good question.  I'm not 100% sure because no one in that HN thread ever presented a cogent argument or even a reference.  But one common theme that ran through the responses was that I needed to brush up on the two-generals problem.  So let's do that.

The two-generals problem

The Two Generals problem is a classic problem in theoretical computer science first studied by Leslie Lamport in 1983.  It was the first time ever someone proved that a particular problem in computer science had no solution.  Some people claim that if you could get exactly-once delivery you could solve the two-generals problem, but this is false.  The two problems have nothing to do with each other, despite having some superficial resemblance.

The two-generals problem is this: there are two generals (imagine that) leading two armies against a common enemy.  They can defeat their enemy only if they successfully join forces and coordinate a simultaneous attack.  But they are geographically separated in an era before modern telecommunications.  The only way they can communicate is by sending messages via courier thorough enemy territory, where they risk being captured and never heard from again.

It turns out that it is impossible for the generals to coordinate their attack under these conditions.  In order to coordinate the attack, each general needs to know that the other general has received and agreed to the same attack plan that they intend to undertake.  The technical term is that they need to achieve common knowledge, which turns out to be impossible using an unreliable communications channel.  The proof is easy to understand: if there were such a protocol, then that protocol must have a termination condition, some point at which consensus has been reached, after which no further messages need to be sent.  But the protocol must still work even if the last message in the protocol fails to arrive, and so sending the last message must not be necessary -- the protocol has to still work without it.  By induction, the protocol must work even if no messages are sent at all.  That is clearly impossible, and so our assumption that there exists such a protocol must be false, QED.

But achieving common knowledge has absolutely nothing to do with the question of whether exactly-once delivery is possible -- it is.  As I've already described above, the easiest way to do it is to keep track of all delivered messages and remove duplicates.  (Exercise: why does this strategy not succumb to the same proof of impossibility as the two-generals problem?)

Exactly-once delivery

Now, this is where things get tricky, because no one ever actually responded to this very straightforward suggestion.  So at this point I need to extrapolate the replies I did receive and guess what the response would be if I really put someone's feet to the fire.  My guess is that they would say that what I've provided is not exactly-once delivery but exactly once processing.  I've made the processing of incoming messages idempotent, so that it doesn't matter if I get duplicate delivery, but I have not actually eliminated duplicate delivery, or something like that.

It is possible to define "delivery" makes the claim that exactly-once delivery is impossible true.  The problem is that such a definition only serves to advance pedantry and ignores important subtleties in human communication.  Let's go back to the original claim: "You can get at most once delivery, or at least once delivery; you cannot get exactly once delivery."  There are some things implied here that are not explicitly mentioned.  Specifically, there is an implication that the two things that you (supposedly) *can* have -- at-most-once and at-least-once -- are actually things that one might choose under realistic circumstances.  Consider:

You can get at most once delivery, or at least once delivery, or you can hit yourself in the head with a hammer.

That statement is true -- hitting yourself in the head with a hammer is an option.  But I hope I don't have to persuade you that this is not very useful, and the only thing you're likely to accomplish by pointing it out is to annoy people.  When you provide a list of options, there is an implied social contract that the items on the list have at least some plausible utility.

In a similar vein, one way to get at-most-once delivery is simply to disconnect the communications channel.  That will absolutely guarantee that no message is ever delivered more than once, because every message will in fact be delivered exactly zero times.  Like hitting yourself in the head with a hammer, this is an option you can undertake, but it's not very useful, and so asking someone to actually spend brain cycles considering it can't be anything more than a deliberate attempt to waste their time or insult their intelligence.

Because of this, someone hearing you say "You can get at most once delivery, or at least once delivery; you cannot get exactly once delivery" is entitled to assume that disconnecting the network is not what you meant with regards to the first option.  That assumption is reinforced by the inclusion of at-least-once delivery as a viable option, because that is actually not possible unless the odds of a message being delivered are greater than zero.  Likewise, your interlocutor is entitled to assume that the reliability of the network is less than 100%, not because you explicitly said so, but because networks are unreliable in point of actual fact, and also because that is the only thing that makes this topic worth discussing at all, and the only thing that makes the impossibility of exactly-once delivery even remotely plausible.  In a 100% reliable network you can insure exactly-once delivery by (wait for it!) ... sending each message exactly once!

Likewise, "you cannot have exactly-once-delivery, but you can have exactly-once processing" implies that there is a meaningful distinction to be drawn between "delivery" and "processing".  There isn't.  Processing is ubiquitous in modern networks.  Every router does it, and obviously every end-point does it.  Any distinction between "delivery" and "processing" that would make exactly-once delivery impossible would necessarily beg the question by stipulating that discarding duplicate messages is "processing".  You can't make this distinction in any other way because discarding duplicates can happen anywhere in the abstraction stack, including the device driver, or even in the NIC hardware.  No one does it that way, but only because it's easier and more convenient to discard duplicates in the network stack or the application, not because it's impossible to do it anywhere else.

So yes, you can have exactly-once delivery.  That may not be what you want.  In fact, it is almost certainly not what you want for anything that is actually mission-critical, because a lot can go wrong downstream of message delivery.  But that is a different question.  If you want it, you can absolutely have it.

Just for the sake of completeness I should point out that removing duplicates at the receiver is a pretty extreme oversimplification of what you would do in practice to provide exactly-once delivery.  A complete solution would almost certainly be an example of Greenspun's Tenth Law applied to the TCP protocol rather than Common Lisp.

Final Thoughts

This post was intended to be about human communication more than distributed systems or network protocols, so I want to address one last question: How did this myth of the impossibility of exactly-once delivery get so firmly established?  This is a genuine mystery to me.  I can find no academic citation anywhere, only a few blog posts, none of which have any references.  So I can only speculate.  My guess is that it stems from the fact that distributed systems are monstrously complicated both in theory and in practice, and there are some things that you really cannot guarantee in the face of certain kinds of failures.  But that is true of non-distributed systems too.  Distributed systems just amplify the problems, they don't introduce them.  Even in a non-distributed system you always have the possibility that a cosmic ray will flip a bit somewhere.  Yes, you can have ECC, but you can also have two cosmic rays, or a manufacturing defect, or sabotage, or a bug.  In the real world you can never eliminate all risks.  The best you can hope for is to reduce them to the point where you like the odds.  That's usually good enough.

That is even the case for human endeavors like this blog post.  It's possible that I've overlooked something and that my argument is wrong.  But I've thought about it, and I've done some research, and I'm able to account for all the data, so I'm casting my lot and publishing this, fully aware of the possibility that someone might point something out in the comments that will make me slap myself on the forehead and go "duh".  But even then I will count that as a good outcome because I will learn something.

That's how the scientific method works.

Wednesday, September 11, 2024

Information, Data, and Knowledge

(This is part 10 in a series on the scientific method.)

In 1966, well within living memory as I write this in 2024, Digital Equipment Corporation released the PDP-10, later rebranded as the DECsystem-10 or, more colloquially, the DEC-10.  The base model cost over $100,000 in 1966 dollars, well over a million dollars today.  For that price you got 8,192 words of memory, each being 36 bits wide, or just a hair over 32 kilobytes by modern reckoning.  A top-of-the-line model gave you about 144 kilobytes (32,768 36-bit words) of memory and cost three times the base model price.  At the time, the DEC-10 was considered a "low cost" computer.  Nowadays a computer with vastly superior specifications fits on a board the size of your little finger and can be purchased at retail for less than less than one 1966 dollar.  If you buy just the chip in modest volume, you can get them for less than one 2024 dollar.

Notice that despite decades of technological improvement, the specifications of a computer's capabilities were given in essentially the same terms in 1966 as they are today, namely, by specifying the amount of memory it contains, and this will always be the case.  Memory has a utility that transcends technology and culture and the whims of fashion: it can store information.

Information is both ubiquitous and ineffable.  We live in an "information age", inundated by information around the clock, but almost no one understands what it actually is.  It's challenging to find a definition that is accessible and not circular.  Wikipedia says, "Information is an abstract concept that refers to something which has the power to inform," which doesn't seem particularly helpful or enlightening to me.

So let's set this as our Problem: there is this thing, information, that everyone talks about, that permeates our modern existence, but that no one seems to actually understand.  Why?  Is information real, an essential part of objective reality, or is it just a social construct like the law or the value of fiat currency?  What is information made of?  Can it be created and destroyed like computers and chairs, or is it conserved like mass and energy?

Let's start by taking stock of some of the (apparent) properties of this thing we're trying to explain.

Information does not exist by itself.  It is always contained in in or bound to something, like a sheet of paper or a computer memory or a DNA molecule or a brain.  This container doesn't necessarily need to be a material object made of atoms, it can be an electromagnetic wave.  But it has to be something.  There is no such thing as information in isolation, not bound to any material thing.

Despite the fact that this containment is essential, the identity of a piece of information is independent of its binding.  The document you are reading right now is in some sense "made of" information.  Most likely, you are reading it on a computer, so that information is contained in a computer memory.  It is transferred into your brain -- i.e. the information is copied -- through the act of reading.  You could likewise copy the information in this document onto a sheet of paper by printing it, or into sound waves in the atmosphere by reading it aloud.  But regardless of whether the information in this document is contained in a computer memory or a sheet of paper or your brain or the air around you, it is, at least in some sense, the same information in all four cases.

How much can information be changed and still be "the same information"?  If this document is translated into a different language, is it still "the same information"?  What about if it is compressed or encrypted?  When information is compressed or encrypted, the result bears virtually no resemblance to the original.  (In the case of encryption, that is the whole point!)  But in each case, there is still some sense in which the result is "the same" as the original.  A compressed or encrypted version of this document is still unambiguously (it would seem) a version of this document and not a version of any other document.

I'm going to put a pin in that for now and invite you to consider a different question, which will eventually lead us to the answers to the preceding ones: what is the difference between information and data?  To persuade you that there is indeed a difference, here is an example of some data:

77 72 17 56 75 22 50 76 20 49 29 16 4 61 33 71 87 65 56 92

The difference between that and all of the surrounding text should be obvious: the text conveys some sort of "meaning" while the data is "just a bunch of numbers".  There might be some meaning there, but it's not readily apparent the way it is with the surrounding text.

So the difference between "information" and "data" is that information has "meaning" and data doesn't.  But what is "meaning"?  That word seems at least as troublesome to define as "information", and so reducing "information" to "meaning" doesn't represent a lot of progress.  But in this case looks are deceiving because the meaning of "meaning" can actually be made precise.  To see how, consider another example:

55 83 56 74 55 70 55 73 56 75 54 77 54 72 55 68 54 66 52 70

At first glance, this too looks like "just a bunch of numbers" but if you look closer you might start to notice some patterns in the second list that weren't there in the first.  For example, if you consider the list as pairs of numbers, the first number is always smaller than the second.  If you look at the first number in the pair, they all fall in a fairly narrow range: 52-56.  The second number in the pairs also fall in a narrow range: 66-83.  That might be enough for you to figure out what these numbers actually are without my telling you: they are the low and high temperature forecasts for a particular location (Redwood City, California) on a particular date (September 8, 2024) according to the Apple Weather app on my iPhone.  So those numbers contain information.

The difference between data and information is that information has a referent -- it is about something.  In the case of the second example, the referent is a weather forecast.  Notice that in order to make sense of that information, to extract its meaning, requires that you know something about what numbers are, how they are represented as marks on a page (or a computer screen), and how those numbers relate to temperatures and geography and time.  But the essential feature, the thing that confers information-ness to those numbers, is that they have a referent, that they correspond to something, in this case, temperatures.

It is this correspondence that is the defining characteristic of information.  Information is a correspondence between two physical things, like marks on a page and current events, or the reading on a thermometer and the temperature of the surrounding material, or the order of base pairs in a DNA molecule and the shape of a protein.

Notice that recognizing this correspondence can require quite a bit of effort.  There is nothing at all obvious about the relationship between the shapes of numerals and temperatures, nothing about the visual appearance of "23" that corresponds to "cold", or the visual appearance of "98" that corresponds to "hot".  To see the correspondence requires going through a lot of intermediate steps, like learning what numbers are, understanding what "hot" and "cold" are, what a temperature is, and so on.  There is nothing at all obvious about the relationship of base pairs in a DNA molecule and the shapes of proteins.  That correspondence goes through a relatively straightforward mapping of base pairs to amino acids, and then a much more complicated mapping of sequences of amino acids to shapes in three-dimensional space.  There are similar processes at work in compressed and encrypted data.  To see the correspondence between compressed and uncompressed data you have to know the compression algorithm.  To see the correspondence between encrypted and unencrypted data you have to know both the algorithm and the encryption key.

There is one additional requirement for data to qualify as information: the correspondence between the information and its referent must be causal.  It can't be simply due to chance.  I could produce a plausible-looking weather forecast simply by picking numbers at random.  That forecast might even be correct if I got lucky, but it would not contain any information.  The correspondence between the forecast and the actual future temperatures has to be the result of some repeatable process, and the correlation between the information and its referent has to be better than what could be produced by picking numbers at random.

How can you be sure that some candidate information is actually information and not just a coincidence?  You can't!  The best you can do is calculate the probability that a correlation is a coincidence and decide if those odds are small enough that you are willing to ignore them.  Calculating those odds can be tricky, and it's actually quite easy to fool yourself into thinking that something is information when it's not.  For example, there is a classic scam where a stock broker will send out market predictions to (say) 1024 people.  512 of the predictions say a a stock will go up, and 512 say it will go down.  For the group that got the correct predictions he sends out a second set of predictions.  256 people get an "up" prediction, and 256 get "down".  This process gets repeated eight more times.  At the end, there will be one person who has seen the broker make correct predictions ten times in a row.  The odds of that happening by chance is 1 in 1024.  Of course it did not happen by chance, but neither did it happen because the broker actually has a way of predicting which way a stock will move.  The existence of information depends on the assumption that the universe is not trying to scam us, which seems like a reasonable assumption, but we can't actually know it for certain.

The existence of information depends on another assumption as well.  Information is a correlation of the states of two (or more) systems, but what counts as a "system"?  There are some choices that seem "natural", but only because of our particular human perspective.  A bit in a computer memory, for example, only looks like a binary system because that is how it is intended to be viewed.  In actual fact, computer memories are analog systems.  When a 0 changes to a 1, that change does not happen instantaneously.  It takes time for the electrons to move around, and while that is happening the state of that bit is neither 0 nor 1 but somewhere in between.  We generally sweep that detail under the rug when we think about computer memory, but that doesn't change the fact that the underlying reality is actually much more complicated.

Some aspects of the apparent behavior of information actually depend on this.  It appears that information can be created and destroyed.  Human creativity appears to create information, and human carelessness or malice appears to destroy it.  There's a reason that backups are a thing.  But if we view the universe as a whole, and if our current understanding of the laws of physics is correct, then information can neither be created nor destroyed.  The currently known laws of physics are deterministic and time-symmetric.  If you knew the current state of the universe, you could in theory project that forward and backward in time with perfect fidelity.  When you "destroy" information, that information isn't actually destroyed, it is merely "swept under the rug" by some simplifying assumption or other.  When you "create" new information, it is not really new, it is pre-existing information being changed into a form that we can recognize under a certain set of simplifying assumptions.

It is tempting then to say that information is in some sense not real but rather a figment of our imagination, an artifact of some arbitrary set of simplifying assumptions that we have chosen for purely economic reasons, or maybe for political or social reasons.  Choosing a different set of simplifying assumptions would lead us to completely different conclusions about what constitutes information.  And that is true.

But our simplifying assumptions are not arbitrary.  The reality of information is grounded in the fact that everything you know is bound to one particular physical system: your brain.  You can imagine taking a "God's-eye view" of the universe, but you cannot actually do it!  In order to take the God's-eye view you would need a brain capable of containing all of the information in the universe, including the information contained in your own brain.  God might be able to pull that trick off, but you can't.

Could we perhaps build a machine that can take the God's-eye view?  At first glance that would seem impossible too for the same reason: that machine would have to contain all of the information in the universe.  But the machine would itself be part of the universe, and so it would have to contain a copy of all of the information contained within the machine, including a copy of that copy, and a copy of the copy of the copy, and so on forever.

There is a clever trick we can do to get around this problem.  See if you can figure out what it is.  Here is a hint: suppose you wanted to make a backup of your hard drive, but you didn't have any extra storage.  If your drive is less than 50% full this is easy: just copy the data on the used part of the drive onto the unused portion.  But can you think of a way to do it if your drive is full?)

Notwithstanding that it might be possible in principle to build a God's-eye-view machine, there are two problems that make it impossible in practice.  First, the engineering challenges of actually collecting all of the information in the universe are pretty daunting.  You would need to know at the very least the position of every subatomic particle, including those inside planets and stars in galaxies billions of light years away.  We can't even access that information for our own planet, let alone the countless trillions of others in the universe.  And second, throughout this entire discussion I've assumed that the universe is classical, that is, that it is made of particles that exist in specific locations at specific times.  It isn't.  At root, our universe is quantum, and that changes everything in profound ways.  Quantum information has a fundamentally different character from classical information, and the biggest difference is that quantum information is impossible to copy.  If you could do it, you could build a time machine.  But that's a story for another day.

The takeaway for today is that ignorance is a fundamental part of the human condition.  Indeed, as we will see when we finally get around to talking about quantum mechanics, ignorance is actually the mechanism by which the classical world emerges from the underlying quantum reality.  (Note that this is a controversial statement, so take it with a big grain of salt for now.)  But even in a classical world, taking the God's-eye view, while possible in principle, is impossible in practice, at least for us mortals.

Can we mere mortals ever actually know anything?  No, we can't, not with absolute certainty.  We are finite beings with finite life spans and finite brains.  We can only ever be in possession of a finite amount of data, and that data will always be consistent with an infinite number of potential theories.   We can never be 100% certain of anything.  It is always possible that tomorrow we will discover that our entire existence is some kind of long con -- a simulation, maybe, running in a tiny corner of a vast data center built by some unfathomably advanced alien civilization, and the only reason that the laws of physics appear to be the same from day to day is that no one has bothered to update our software.  But who knows what tomorrow may bring?  Maybe you or I will live to see the upgrade to laws-of-physics 2.0.

But I'll give long odds against.

Thursday, August 15, 2024

The Trouble With Big Numbers

This is part of my series on the scientific method, but it's a bit of a tangent, an interlude if you will, so I'm not giving it a number.  As you will see, that will turn out to be metaphorically significant.  I'm writing this because my muse Publius raised the problem of infinity in comments on earlier installments in this series, and so I thought it would be worth discussing why these are problematic for mathematics but not for science.

(BTW, the title of this post is an allusion to something I wrote five years ago, which itself was an allusion to something I wrote fifteen years ago.  I guess I'm just good at finding trouble.)

There is an old joke that goes something like this: one cave man says to another, "I'll bet you that I can name a bigger number than you."  The second cave man responds, "You're on.  What's your number?"  The first caveman says triumphantly, "Four!"  The second cave man thinks for a while and finally says, "You win."

The joke is not just that the second cave man couldn't count to five, but that it was a silly game to begin with because second player can (it would seem) always win by simply taking the number that the first player names and adding one.  It seems obvious that you should be able to do that no matter what the first player says because otherwise there would have to exist a counter-example, a number to which it is not possible to add 1, and obviously there is no such counterexample, right?

Well, sort of.  There are systems of arithmetic where four actually is the biggest number.  For example, in modulo-5 arithmetic it is possible to add 1 to 4, but the result is zero.  In modulo-5 arithmetic, four actually is the biggest number.

But this is obviously silly, notwithstanding that modular arithmetic really is a legitimate mathematical thing with lots of practical applications.  There is obviously a number greater than four, namely five, the very number we had to deploy to describe the system in which there is no number greater than four.  In fact, to describe a system of modular arithmetic whose biggest number is N we have to use a number one bigger than N.  So this argument seems self-defeating.

There is another way to construct a system of arithmetic with a biggest number, and that is to simply stipulate that there is a biggest number, and that adding one to this number is just not allowed.  Again, this might feel like cheating, but if we are using numbers to count actual physical objects, then there is already a smallest number: zero.  So why could there not be a biggest one?

But this still feels like cheating, because if we can name the number that we want to serve as the biggest number, we can obviously (it would seem) name a number that is one more than that.  So unlike zero, which is kind of a "natural" choice for a smallest number, there is no apparent "natural" choice for a biggest number.  We can try playing tricks like "one more than the biggest number that we can actually name", but that is simply a fun paradox, not an actual number.

So it would appear that logic leaves us no choice but to accept that there is no biggest number, and so we have to somehow deal with apparently inescapable fact that there are an infinite number of numbers.  But that leads to problems of its own.

Imagine that you have three buckets, each of which is capable of holding an infinite number of balls.  Bucket #1 starts out full of balls while the other two are empty.  You now proceed to execute the following procedure:

1.  Take three balls out of bucket 1 and put them in bucket 2.

2.  Take one ball out of bucket 2 and put it in bucket 3.

3.  Repeat until bucket 1 is empty.

That third step should make you a little suspicious.  I stipulated at the outset that bucket 1 starts out with an infinite number of balls, and so if you try to empty it three balls at a time it will never be empty.  But we can fix that by speeding up the process: every time you go through the loop you have to finish it in half the time you took on the previous step.  That will let you perform an infinite number of iterations in a finite amount of time.  Again, you need to suspend disbelief a little to swallow the idea of doing every step twice as fast as the previous one, but you needed to do that when I asked you to imagine a bucket that contained an infinite number of balls in the first place, so having to deploy your imagination is already part of the game.

The puzzle is: when you finish, how many balls are in bucket #2?

The "obvious" answer is that there are an infinite number of balls in bucket #2.  For every ball that gets removed from B2 and put in B3 there are two balls left behind in B2.  So after every step there must be twice as many balls in B2 as B3.  At the end there are an infinite number of balls in B3, so there must be even more -- twice as many in fact -- left behind in B2.

And this is our first hint of trouble, because there is no such thing as "twice infinity".  If you multiply the number of counting numbers by 2 -- or any other finite number -- the result is equal to (the technical term is "can be put in one-to-one correspondence with") the number of counting numbers.

But now imagine that as we take the balls out of B1 and put them in B2 we mark them to keep track of the order in which we processed them.  The first ball gets numbered 1, the second one gets numbered 2, and so on.  Now when we pull them out in step 2, we pull them out in order: ball number 1 gets pulled out first, ball #2 gets pulled next, and so on.  If we do it this way, then bucket 2 will be EMPTY at the end because every ball will have been pulled out at some point along the way!  (In fact, we can modify the procedure to leave any number of balls in bucket 2 that we want.  Details are left as an exercise.)

So clearly things get weird when we start to think about infinity.  But actually, when dealing with large numbers, things get weird long before we get anywhere close to infinity.

There is a famously large number called a googol (the name of the Google search engine is a play on this).  It is a 1 followed by 100 zeros, i.e.:

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Things are already getting a little unwieldy here.  Is that really 100 zeros?  Did you count them?  Are you sure you didn't miss one or count one twice?  To make things a little more manageable this number is generally written using an exponent: 10^100.  But notice that we had to pay a price for shortening a googol this way: we lost the ability to add one!  In order to write down the result of adding 1 to a googol we need to write 99 zeros:

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

One could argue for writing 10^100+1 instead, but that doesn't work in general.  Consider adding 1 to:

3276520609964131756207215092068230686229472208995125701975064355186510619201918716896974491640125539

which is less than a third of a googol.

But a googol is not even close to the biggest number the human mind can conjure up.  Next up is a googolplex, which is a 1 followed by a googol of zeros, i.e. 10^(10^100).  Adding one to that without cheating and just writing 10^(10^100)+1 is completely hopeless.  There are fewer than a googol elementary particles in our universe (about 10^80 in fact) so it is simply not physically possible to write out all of the digits in a googolplex.  Even if we allowed ourselves to re-use material we couldn't do it.  Our universe is only about 13 billion years old, which is less than 10^18 seconds.  The fastest conceivable physical process operates on a scale called the Planck time, which is the time it takes for light to travel the diameter of a proton, about 10^-43 seconds.  A single photon with a cycle time this short would have an energy of about 6.6 gigajoules, a little under two megawatt-hours, the energy equivalent of about one and a half tons of TNT.  So even if we could build a computer that ran this fast it could not process the digits of a googolplex in the life span of the universe.

An obvious pattern emerges here: after 10^100 (a googol) and 10^(10^100) (a googolplex) comes 10^(10^(10^100)), a one followed by a googolplex of zeros.  That number doesn't have a name, and there's not really any point in giving it one because this is clearly a Sisyphean task.  We can keep carrying on forever creating bigger and bigger "power towers": 10^(10^(10^100)), 10^(10^(10^(10^100))) and so on.

What happens if we start to write power towers with large numbers of terms, like 10^10^10^10... repeated, say, 1000 times?  To keep those from getting out of hand we have invent yet another new notation.  Designing such a notation gets deep into the mathematical weeds which I want to steer clear of, so I'm just going to adopt (a minor variant of) something invented by Donald Knuth called up-arrow notation: A↑B means A^(A^(... <== B times.  So 10↑5 means 10^(10^(10^(10^(10^10)))).  10↑10 is already too unwieldy for me to want to type out.  But even at 5 iterations it is already challenging to communicate just how vast this number is.  I can explicitly expand it out one level (10^(10^(10^(10^10000000000)))) but not two -- that would require about ten gigabytes of storage.  Expanding it out to three levels would require more resources than exist in this universe.  Four is right out.

But we are nowhere near done stretching the limits of our imagination.  Up-arrow notation can be iterated: A↑↑B means A↑(A↑(A... <== B times.  A↑↑↑B means A↑↑(A↑↑(A... <== B times, and so on.  And this allows us to arrive -- but just barely -- at one of the crown jewels of big numbers, the famous Graham's number, which uses a number of up-arrows so big that it itself needs to be described using up-arrow notation.

Graham's number is already mind-bogglingly vast, but it's a piker compared to TREE(3).  I'm not even going to try to explain that one.  If you're interested there's a lot of material about it on the web.  And even that is only just getting started.  Everything I've described so far is still computable in principle (though not in practice).  If we had a big enough computer and enough time then we could in principle actually write out all the digits of Graham's number or even TREE(3), or even TREE(TREE(3)) -- these patterns go on and on and on.  But beyond these lie even bigger numbers which are uncomputable even in principle, though it's easy to show that they must exist.  That will have to wait until I get around to discussing the halting problem.  For now you will just have to take my word for it that there are a series of numbers called busy beavers which are provably larger than any number I've described so far, or even any number that can be constructed by any combination of techniques I've described so far.  TREE(TREE(TREE.... iterated Graham's number of times?  Busy beavers are bigger, but every one of them is nonetheless finite, exactly as far away from infinity as zero is.

And it gets even crazier than that.  Let's suspend disbelief and imagine that we can actually "get to infinity" somehow.  We're still not done, not by a long shot.  It turns there are different kinds of infinity.  The smallest one is the number of counting numbers or, equivalently, the number of ways you can arrange a finite set of symbols into finite-length strings.  If you allow your strings to grow infinitely long then the number of such strings is strictly larger than the number of finite-length strings.  And you can keep playing this game forever.  The number of infinite two-dimensional arrays of symbols is strictly larger than the number of one-dimensional infinite strings of symbols, and so on and so one.

But wait, there's more!  All this is just what we get when we treat numbers as measures of quantity.  Remember the balls-and-buckets puzzle above, and how things got really strange when we allowed ourselves to paint numbers on the balls so we could distinguish one ball from another?  It turns out that if we think about adding one as not just an indicator of quantity but of position that we can squeeze different kinds of infinities in between the ones we just constructed above.  If we think of adding one as producing "the number after" rather than just the number that is "one more than" then we can introduce a number that is "the number after all the regular counting numbers".  Mathematicians call that Ï‰ (the lower-case Greek letter omega).  Then we can introduce the number after that (ω+1) and the number after that (ω+2) and so on until we get to Ï‰+ω=2ω.  And of course we can go on from there to 2ω+1, 2ω+2... 3ω, 4ω... Ï‰*ω, Ï‰^3, Ï‰^4, Ï‰^ω, Ï‰↑ω, Ï‰↑↑ω and so on until we get to Ï‰ with Ï‰ up-arrows, which mathematicians call Îµ0.  And then the whole game begins anew with Îµ0+1, Îµ0+ω, Îµ0+ε0...

These kinds of infinities are called transfinite ordinals, and they have two interesting features.  First, the "size" of each of these numbers, that is, the number of numbers between zero and any one of them, is exactly the same as the number of regular counting numbers.  If we think about numbers as referring to position, then each ordinal is "bigger" than the one before, but if we think about them as referring to quantity then each one is exactly the same "size".  And second, the game of inventing new ordinals does not have a regular pattern to it.  It requires creativity.  The bigger you get, the harder it becomes to define what "add one" means.  It gets so hard, that mathematicians who figure out how to do it have the resulting numbers named after them.

The study of big numbers, both finite and infinite, is a deep, deep rabbit hole, one that ultimately leads to Turing machines and the theory of computation, which is the deepest rabbit hole of all.  It's fascinating stuff, and well worth studying for its own sake.  But is any of this relevant for science or is it just an intellectual curiosity?

Until and unless we develop a "theory of everything" that allows us to predict the result of any experiment, we cannot rule out the possibility that this theory will involve very large numbers (by which I mean numbers that require power towers or beyond to represent), and possibly even infinities.  But so far this has not been the case.  There are only two situations in our current best theories where infinities arise, and in both of those cases there is every reason to believe that this is an indication of a problem with the theory and not a reflection of anything real.

Just in case you were wondering, those two situations are singularities inside black holes and self-interactions in quantum mechanics.  In the case of singularities, general relativity predicts their existence while at the same time predicting that they can never be observed because they always lie inside the event horizon of a black hole.  In the case of self-interactions, it just turns out that you can basically just ignore them.  If you simply throw out the infinities when they pop up, everything works out and the predictions made by the theory just turn out to be 100% accurate.  No one knows why, but that's the way it is.

But there is another situation where a kind of infinity pops up which is not so easily dismissed.

Suppose you travel exactly one mile in a straight line, then turn exactly 90 degrees and travel another mile, again in a straight line.  If you then want to return to your starting point by traveling in a straight line, how far would you have to go?

This innocuous-seeming question is the gateway to a whole series of major mathematical and philosophical problems.  You can get the answer by applying Pythagoras's theorem (which, BTW, was almost certainly not discovered by Pythagoras, but that's another story): it's the square root of 2.  The problem arises when you actually try to write this quantity out as a number.  The answer obviously has to be somewhere between 1 and 2, so it's not a whole number.  It also has to be somewhere between 14/10 and 15/10 because (14/10)^2 is 196/100, which is a little less than 2, and (15/10)^2 is 225/100, which is a little more than 2.  We can keep narrowing down this range to smaller and smaller intervals, but we can never find a ratio of two integers whose square is exactly 2.  The square root of 2 is an irrational number.  If we wanted to write it out exactly, we would need an infinite number of digits.  We haven't even gotten past the number 2 and simple multiplication, and yet somehow infinity has managed to rear its ugly head.

I opened this post with a joke, so I'll close with another: a farmer hired a physicist to design a machine to shear sheep.  After a few weeks the physicist submitted his report.  It began, "Assume a spherical sheep."

It's funny because, of course, sheep aren't spherical.  But there is a metaphorical spherical sheep hiding in our problem statement.  It's in the phrase, "exactly one mile"  (and also "exactly 90 degrees").  In order for the mathematical irrationality of the square root of 2 to matter in the physical situation I have described it really is critical to travel exactly one mile.  If you deviate from this in the slightest then the distance back to your starting point can become a rational number, representable numerically with no error in a finite number of symbols.  For example, if either leg of your journey is exactly one part in 696 longer than a mile then the return trip will be exactly 985/696 miles.

In fact, for any number you care to name, no matter how small, I can give you a smaller number such that adding that much distance to one leg of the trip will make the return distance be a rational number.  That means that if you odometer has any error at all, no matter how small, the return distance could be rational.

Of course, "could be" is not the same as "is".  It's possible that the actual underlying physical (or even metaphysical) reality is truly continuous, and actually does require an infinite number of symbols to describe.  But here is the important question: how could you ever possibly know?  What experiment could possibly demonstrate this?  In order to know whether physical reality is truly continuous you would need to somehow obtain an infinite amount of data!  To be able to tell, for example, whether our three-dimensional space is truly continuous, you would need to be able to measure a length to infinite precision.  How would you do that?  Forget the problems with actually designing such a device and how you would get around (say) thermal fluctuations and the Heisenberg uncertainty principle.  I grant you arbitrarily advanced technology and even new physics, and simply ask: what would the output of such a measuring device look like?  It can't be a digital display; those can only output a finite number of possible outputs.  So it would have to be some kind of analog output, like an old-school volt meter.  But that doesn't help either, because to read an analog gauge you still have to look at it, and your eye doesn't have infinite resolution.  And even if your eye had infinite resolution, you would still have to contend with the fact that the needle of the gauge is made of atoms which are subject to thermal fluctuations.  And if you tried to solve that problem by cooling the gauge down to absolute zero, you still ultimately have to contend with the Heisenberg uncertainty principle.  (Yes, I know I granted you new physics, but you still have to be consistent with quantum mechanics.)

The ultimate fact of the matter is that no matter how hard we try, no matter what technology we invent, no matter how many resources we deploy, we will only ever have a finite amount of data, and so we can always account for that data with finite explanations.  We can imagine infinities.  They might even pop up unbidden in our mathematical models.  But when they do, that will almost certainly be an indication that we've done something wrong because we can know for sure that neither infinite quantities nor infinite precision can ever be necessary to explain our observations.  In fact, we can calculate pretty easily the amount of data our universe can possibly contain, and it's a tiny number compared to what the human imagination is capable of conceiving.

Infinities are like wizards and unicorns.  They're fun to think about, but they aren't real.

Monday, July 29, 2024

Computation: Math and the Church-Turing Thesis

(Part 9 in a series on the scientific method)

In the last installment of this series I addressed a philosophical question: what does it mean for a mathematical statement to be true?  I tackled this in the context of a general definition of scientific truth, namely, that a statement is true if it corresponds on some way to the actual state of affairs in objective reality.

(Note that this definition does not assume that objective reality exists.  It's possible that there is no objective reality, and that therefore there are no true statements under this definition.  But the existence of objective reality is, to put it mildly, a fairly well-established scientific theory, and so we're justified in proceeding on the assumption.  Spoiler alert: when we get to talking about quantum mechanics, we will encounter some data that calls the existence of objective reality into serious question, often leading to much philosophical hand-wringing.  I'm sweeping all that under the rug for now, but I wanted to be up-front about the fact that I am sweeping it under the rug.  I promise to unsweep it later.)

In this installment I want to talk about a different aspect of math, namely, how to explain its ubiquity in science.  In this series I have been trying very hard to avoid math because it tends to scare people away.  I think that's mainly due to very poor math pedagogy.  Math need not be any scarier than science itself.  One of the things I hope to accomplish here is to demystify math a bit and make it a little less intimidating.

So here I am going to approach the phenomenon of mathematics as a scientific Problem (with a capital P), namely: why is math so ubiquitous in science?  And why is it so alien and intimidating?

Let's start by asking: what actually is math?  It's hard to formulate a coherent definition.  Wikipedia says, "Mathematics is a field of study that discovers and organizes methods, theories and theorems that are developed and proved for the needs of empirical sciences and mathematics itself."  That might be technically correct, but it's not very illuminating, so instead of trying to define it, I'm going to sweep that under the rug as well and simply observe that math, kind of like pornography, seems to be an identifiable thing that most people recognize when they see it.

One of the characteristic features of math that makes it recognizable is its use of weird symbology, funny-looking squiggles, Greek letters, non-Greek letters written in distinctive fonts (and, of course, numbers).  Math also attaches meaning to the relative spatial relationship between symbols in ways that no other form of human communication follows.  x² means something very different from x₂ which means something very different from 2x.

But all this weird symbology is just window-dressing.  None of it is essential; it is possible to do math without it.  In fact, for most of human history math was done using plain old language, augmented with an occasional drawing.  The only "weird symbols" were numbers.  You can see this by perusing the original manuscript of Newton's Principia, the book that launched the modern scientific revolution.  It consists almost entirely of (latin) text and some drawings.  There is some math, but none of the notation goes beyond what you would find in an elementary-school math class.  The symbology used today only came into being in the last few hundred years.  It was invented because doing math with regular language was becoming too unwieldy.  The state-of-the-art technology for doing math at the time was pen and paper and chalkboards, and so the notation was optimized to allow a lot of information to be compressed into a minimal amount of space.  But what matters is not the symbology, but rather what the symbols mean.  "√3" means exactly the same thing as "the square root of 3", which means exactly the same thing as, "The number which, when multiplied by itself, gives you three."  The only reason to use "√3" is that it takes up less space and is faster to write by hand.

But there is more to math than just weird symbols.  Not all combinations of symbols make sense.  "√3" makes sense but "3√" doesn't.  It's no different from any other language.  "The square root of three" makes sense, but "three root square the of" does not.  Math notation is just another language, and it has grammatical rules just like any other language.

So what exactly is it about math that makes it special?  If the symbology is just window-dressing, and the grammatical rules are no different from natural language, what exactly is it that makes math such a powerful lever for doing science?  The answer is that in addition to grammatical rules, which tell you how to compose mathematical symbols so that they say something that makes sense (but isn't necessarily true), there is another set of rules in math that don't have a counterpart in natural language.  These rules tell you how to manipulate mathematical symbols to produce new mathematical truths, arrangements of symbols that correspond in some way to objective reality, from old ones.

The most familiar example of these rules is the algorithms we learn in elementary school for doing arithmetic.  Using these rules we can know, for example, that if we take (say) a hundred and ninety-two things and combine them (for some manner of combination) with (say) three hundred and seventeen things that we will end up with five hundred and nine things.  We figure this out not by thinking about the actual words that I wrote out, nor directly about the quantities those words denote, but instead in terms of the symbols "192+317=509".  By using these symbols and the right set of rules for manipulating them, we can figure out that 192 plus 317 is 509 a lot faster than we could if we tried to actually count these things.

The art of mathematics, the thing that keeps mathematicians employed, is figuring out what the right rules are.  That is a topic for another day, with a long and storied history.  The main point here is that the thing that makes math useful in science is not its notation, not the weird symbology, but the rules, the fact that it is possible to create sets of rules for manipulating symbols such that the behavior of the symbols corresponds reliably to the behavior of objective reality.  In other words, it is possible to create sets of rules for manipulating symbols that turn the symbols into a model of objective reality.  This turns out to be much more than just a handy-dandy way of producing models; it turns out to have some truly profound philosophical implications.  To see why, we need to take a brief historical detour.

In 1936, Alan Turing published a paper entitled "On Computable Numbers, with an Application to the Entscheidungsproblem" (which is German for "decision problem").  The title obscures what this paper is really about.  In it, Alan Turing invented software.

In 1936 the words "computer" and "calculator" did not refer to machines as they do today but rather to a profession.  A computer (or calculator) was a human who performed computations (or calculations), which is to say, who manipulated mathematical symbols according to rules.  The useful results of this process were generally numbers, which is why the title of Turing's paper refers to "computable numbers", but numbers are a red herring here.  What matters is not the numbers per se, but the symbols; that the symbols can be taken to stand for numbers is a mostly irrelevant detail.  What really matters is the rules by which the symbols are manipulated.

Turing's paper asks the question: what would happen if we tried to build a machine that does what (human) computers/calculators do?  He then goes on to describe a machine that would do just that.  The machine consists of a control mechanism which can be in any of a finite number of states.  These states encode the rules by which the machine manipulates symbols.  The machine has a sheet of paper divided into a grid.  Each square of the grid contains exactly one of a finite repertoire of symbols.  The symbols can be erased and re-written by the machine.  The supply of paper is endless.  (The technical term is "unbounded", which is not quite the same as "infinite".  It's more like there is a paper factory available that makes more paper as needed, but at any given time the actual amount of paper that the machine has to work with is finite.  The distinction between infinite and unbounded won't matter much here but will turn out to be crucial later on.)

The machine's operation is dead-simple.  At any given time, its control mechanism is in one of the possible states, and is "looking" at one square of the grid on the paper.  The control mechanism is just a big lookup table indexed by state and symbol.  Each entry in the table contains just three pieces of information: a symbol to write onto the paper in the current grid square (possibly the same as the symbol that is already there), a direction in which to move on the paper to arrive at a new grid square, and a new state for the control mechanism to go into.  That's it.

I've described Turing's machine in terms of a "sheet" of paper, but Turing's original paper and the literature in general refers to a one-dimensional "tape" rather than a two-dimensional "sheet".  This turns out not to matter.  You can think of the "tape" as being a "sheet" that has been cut up into strips and glued end-to-end.  The net result is the same in both cases.  But in order to conform to tradition, I'm going to switch my terminology and start using "tape" instead of "sheet".

Note that the control mechanism is part of the machine's hardware, and once built, it cannot be changed.  If you want to make a Turing machine that performs a different task, you need to build a new machine.  Or so it would seem at a casual glance.

The remarkable thing about Turing machines is that it is possible to create a control mechanism that is universal, that is, which allows one machine to perform the same computation as any Turing machine you can imagine without changing the hardware.  How does it do this?  By describing the control mechanism of another Turing machine as symbols on the tape.  The universal machine reads the description on the tape, and then behaves as if its control mechanism were the one described on the tape rather than the (universal) control mechanism that it actually has.

At first glance this seems like an interesting intellectual curiosity, but not one with earth-shattering consequences.  From our current vantage point in a world with ubiquitous computers it might even seem a little banal.  But the computer revolution we are living in is only possible because of the existence of universal Turing machines.  Computing and calculating machines existed before 1936, but they were all custom-made for a particular task.  But if you build a (universal) Turing machine, that is the last machine you will ever have to build.  After that, anything you can imagine that can be done by any Turing machine can be done just by writing symbols to a tape, that is, by writing software.  There are only three reasons to every build new hardware: to make the machine run faster, to make it more energy efficient, or if you want to do something that cannot be done by any Turing machine.

Let's think about that third possibility for a moment.  A universal Turing machine can do anything that any Turing machine can do, but that doesn't mean that it can do anything.  In fact, Turing himself showed that there are certain things that no Turing machine can do, like solve the Entscheidungsproblem or the closely related halting problem.  Might we be able to design a completely different machine that can solve these problems?

There is a word for such a machine: it is called an oracle for the decision problem (or the halting problem, or whatever).  But think about what it would take merely to design or even describe such a machine, let alone build one.  Such a design would have to be rendered into a phyical medium somehow; it would have to be written down, or carved into stone, or spoken aloud as an oral tradition, or something.  And if the design could be rendered into a physical medium, then it could be expressed in terms of symbols.  Maybe part of the design would consist of drawings, but a drawing can also be expressed in terms of symbols, specifically, in the form of a computer program (i.e. a Turing machine) that produces that drawing on a suitable display.  Nowadays just about every image you see is actually produced in this way.

Maybe there is some way of rendering a design that cannot be reduced to symbols?  There is no way to prove that this is impossible, but if you can come up with one, it would be one of the greatest breakthroughs in the history of human intellectual endeavor.  It is hard to imagine even what a description of such a breakthrough would look like.  Think about it: all of the information your brain has access to comes through your senses, primarily seeing and hearing.  But we already know that these can be reduced to symbols because we have computers that can produce any image or sound that we can imagine.  Not only that, but we also have digital cameras and digital microphones, so computers can reproduce not only any image or sound that we can imagine, but also any image or sound that can possibly be produced in this universe.  If it exists in this universe, then we can record it or take a picture of it, and thus reproduce it on a computer.

We can stretch our imaginations even further.  Maybe we cannot design or describe a machine that does something a Turing machine cannot do, but maybe we can produce such a machine by some other means.  Maybe there exists a natural phenomenon in our universe whose behavior cannot be described by symbols.  Again, it is hard to imagine what such a phenomenon would even look like.  It would not be possible to describe such a phenomenon in terms of measurements because all measurements can be rendered as symbols.  It would even be impossible to describe such a phenomenon using words, because words are symbols.  So such a phenomenon would have to be radically different from anything in our day-to-day experience.  It is, of course, impossible to prove that there does not exist something beyond the bounds of our imaginations.  But that is what it would take to produce a machine that transcends the Turing machine.  (I'm going to start referring to such a machine as a Transcendant Turing Mahchine or TTM.)

Now, there is actually a candidate for a natural phenomenon that, at least as if this writing, transcends our current understanding, and that might therefore be a TTM, or at least a signpost on the road to a TTM: the human brain.  In particular, human brains produce the phenomenon of consciousness (at least mine does).  Consciousness is a really weird thing, devilishly difficult to render into symbols or otherwise to get any kind of a handle on.  It's hard even to demonstrate that it actually exists despite the fact that everyone professes to experience it.  Consciousness at first glance has many of the characteristics we would expect from a phenomenon that indicates something happening beyond what can be described by Turing machines.  (There is another phenomenon that some people claim transcends TMs, and that is the phenomenon of life itself.  This is just demonstrably wrong, though demonstrating it is not easy, so that will have to wait for a future installment.)

For now I want to make one more observation: the important thing about Turing machines is not just that it is possible to build them, or even that it is possible to build a universal Turing machine.  The important thing about (universal) Turing machines is that building one is not particularly difficult.  Once you start building systems of any complexity at all, it is actually hard to avoid building a universal Turing machine.  You could make one out of Legos or wood.  The ones we build for actual use are made of sand.

The fact that UTMs are so simple and yet so powerful is a deep and profound observation not just about physical reality, but about what is logically possible.  No one ever builds Turing machines according to Turing's original description except for pedagogy or as a parlor trick.  The nature of Turing machines is as much in their description in terms of symbols as in the reifications of that description we humans have built in physical reality.  The fact that it is hard for us to even imagine what a Transcendent Turing Machine might look like is a deep insight.  It is the reason that math is so ubiquitous and effective in science.

The claim that everything that can be described can be described by a Turing machine is called the Church-Turing thesis.  (The "Church" part refers to Alonzo Church.)  The Thesis is not usually stated in this way; the more common formulation is that any computable function can be computed by a Turing machine.  But if you think about it, this amounts to the same thing.  If there were a function that could be computed (whatever that might mean) but not by a Turing machine, it would mean that there existed in this universe some physical phenomenon that could "compute" (whatever that might mean) some function that a Turing machine could not.  That is to say, this phenomenon would have to exhibit some behavior that a Turing machine could not emulate.

The Church-Turing thesis cannot be proven, but a persuasive counterexample has never been demonstrated.  If it's true (and if it isn't that would be Big News) then it places some hard constraints on what is possible in our universe, and that in turn has significant philosophical consequences.  Either the human brain is a TTM or it is not.  If it is, what exactly is happening in there that enables this transcendence?  Is the same thing happening elsewhere in the universe, like maybe in the brains of dolphins or elephants?  Intelligent aliens?  LLMs?  And if the behavior of our brains can be described by a Turing machine, what does that imply for our subjective experience of consciousness and free will?  Turing machines are, after all, deterministic.  If we're nothing more than machines following a program, how can we have moral agency?

Spoiler alert: these questions have answers.  We humans are not TTMs but we have moral agency nonetheless.  I promise I will get to that.  Stay tuned.

Saturday, July 20, 2024

Sorry about the radio silence

It has been nearly two months since I last posted anything here.  This is far from my longest period of radio silence, but in this case it was right in the middle of a series I had been writing on the scientific method, and I really didn't intend to stop and lose momentum.  But two things happened.  First, I decided to write a chapter about mathematics and that turned out to be a much more difficult subject to tackle than I thought it would be, so I've been having trouble figuring out where to go from there.  And second, on June 5, my mother died.  Unlike my sister's death nearly four years ago, my mom's passing was not unexpected.  She was 88, and had been fighting cancer for 9 years.  I spoke to her two days before she died and one of the last things she said to me was, "I've had a good life."  We had no unfinished business.

However, even though my mom's death didn't come with the emotional toll that my sister's did, there is always a long list of chores that need to get done whenever someone cashed in their chips, and that has been occupying me for longer than I expected.

Anyway, just wanted to put this out there in case anyone was wondering what had become of me.

 

 

Ruth Gat 1936-2024

Friday, May 31, 2024

My (not) Twitter Account Got Hacked

My (not) Twitter (I refuse to refer to it by a single letter) account got hacked and so I got to see first-hand how utterly inadequate (not) Twitter's security measures are.  The hacker immediately changed both my password and email address so I can no longer access the account and I can't do a password reset.  Which is such an incredibly stupid design because of course anyone who breaks into an account is going to do those two things.  It's all the more infuriating because all they would need to do to fix this is, whenever someone changes their email account, to send a message to the old account saying, "Please confirm that you want to change your email account."  Or maybe just put a time delay on the change becoming effective so that when the actual owner of the account gets the notification that they have been hacked there is actually a chance that they could, you know, do something about it rather than just getting locked out immediately.

Not only am I locked out of my account, but I can't even look at my own feed to see what the hacker is posting on my behalf because (not) Twitter requires you to log in to see any of its content.  So to see what the hacker is doing, I would need to create a new account.

Not that any of this matters much.  I hardly ever used my account.  I can't remember the last time I logged in.  I think I had seven followers.  But the utter stupidity of their design still steams my clams.

[UPDATE] I reported the problem to (not) Twitter tech support and it took less than five minutes to receive this reply:

We’re writing to let you know that we’re unable to verify you as the account owner. We know this is disappointing to hear, but we can’t assist you further with accessing your account.

 In other words, if someone hacks your (not) Twitter account, you are just shit out of luck.

Thursday, May 30, 2024

Donald Trump is a Convicted Felon

The question of whether or not Republicans will nominate a convicted felon to be their candidate for President of the United States is no longer a hypothetical.  Unless they can somehow persuade him to withdraw from the race (and good luck with that) they will have no choice.  The primaries are over.  The convention will just be a rubber stamp.  And Trump's conviction will not disqualify him either from the ballot nor from office should he win.  Which is still a very real possibility.

Which is as it should be.  Because if Trump's conviction was, as he claims, a politically motivated railroading, then We The People should be able to overturn it at the ballot box.

But that's a big "if".  Remember, Trump was not convicted by a judge, he was convicted by a jury of twelve ordinary citizens who voted unanimously to convict on every single one of 34 felony counts, and they did it in three days.  It was not an acquittal.  It was not a hung jury.  It was not even close.

Think about what it would take for whoever had it in for Trump to arrange that.  Every single one of the twelve jurors would have had to vote to convict despite the prosecution not having met its considerable burden of proof beyond a reasonable doubt.  How likely is that?

One of the arguments that Trump's defense put forward was that Michael Cohen was lying, and one of the counter-arguments put forward by the prosecution was to ask the rhetorical question, why would Cohen lie?  Sure, Cohen despised Trump, which is not surprising considering that Cohen actually went to prison for the things he did at Trumps behest.  But Cohen was under oath.  Would he really risk going back to prison on a perjury charge just to service a vendetta?

On the other hand, it's pretty easy to see why Trump would lie and claim he's being railroaded.  He is not under oath.  He is not risking perjury charges by lying.  Lying his way to an election victory is pretty clearly to his own personal benefit.  He has literally built a tremendously successful career on lying and covering it up.  Why stop now?  The strategy has never failed him before.

There are only two possibilities here: either Trump was railroaded, or he was not.  If you think he was railroaded, if you really think he was innocent, and if you really think he would do a good job as President, by all means, vote for him.  But if you don't, if you think the jury got it right and Trump is actually guilty, then consider that  Trump is now an unrepentant convicted felon who has been lying about his innocence all along and continues to lie about it.  What else might he be lying about?

I get that you might be frustrated with Joe Biden.  I am too.  I think the Democrats in general have their heads shoved so far up their butts that you can't see the tips of their neckties.  I hate what Israel is doing in Gaza.  Inflation sucks (though I feel the need to point out that the alternative was to allow the economy to collapse during the pandemic).  The pronoun thing is out of control.

But Trump's conviction changes the calculus in a very important way: that he is a convicted felon is now an undeniable objective fact.  There are only two possibilities: either he was railroaded, or he is a liar.  There are no other options.  You have to choose.  If he was not railroaded, then he is a liar.  And if you decide that he is a liar, then regardless of how you feel about anything else, do you really want to let someone like that to put their finger back on the nuclear button?

Monday, May 27, 2024

Truth, Math, and Models

(Part 8 in a series on the scientific method)

In the last installment I advanced a hypothesis about what truth is, which is to say, I suggested a way to explain the broad consensus that appears to exist about truth.  That explanation was: there is an objective reality "out there", and true statements are those that in some sense correspond to the actual state of affairs in that objective reality.  This was problematic for statements like, "Gandalf was a wizard" because Gandalf doesn't actually exist in objective reality, but that was accounted for by observing that the actual meanings of sentences in natural languages often goes beyond the literal.

But there is one aspect of truth that is harder to account for, and which would appear at first glance to be a serious shortcoming of my theory: math.  Most people would consider, for example, "1+1=2" or "7 is prime" to be true despite the fact that it's hard to map those concepts onto anything in objective reality.  I can show you "one apple" or "one sheep", but showing you "one" is harder.  The whole point of numbers, and mathematics and logic in general, is to abstract away from the physical.  Numbers qua numbers do not exist in the real world.  They are pure abstraction, or at least they are supposed to be.  Mathematical truth is specifically intended not to be contingent on anything in the physical world, and so it would seem that my theory of truth fails to capture mathematical truth.

Some philosophers and religious apologists claim that it is therefore impossible to ground mathematical truth in objective reality, that the existence of mathematical truth requires something more, some ethereal realm of Platonic ideals or the Mind of God, to be the source of such truths.  It's a plausible argument, but it's wrong.  Mathematical truth can be understood purely in terms of objective reality.  Specifically, mathematics can be viewed as the study of possible models of objective reality.  In this installment I will explain what I mean by that.

There are a lot of different examples of (what is considered to be) "mathematical truth" but let me start with the most basic: elementary arithmetic.  These include mundane truths about numbers, things like "two plus three equals five" or "seven is prime."  It would seem at first glance that numbers used in this way don't refer to anything in objective reality.  I can show you two of something but I can't show you "two" in isolation.

There is an easy answer to this: numbers in common usage are not nouns, they are adjectives.  The reason I can't show you "two" without showing you two of something is the same reason I can't show you "green" unless I show you a green thing.  Adjectives have to be bound to nouns to be exhibited, but that doesn't mean that "green" does not exist in objective reality.  It does, it's just not a thing.  Green is a color, which is a property of things, but it is not itself a thing.  Likewise, "two" is not thing, it is a quantity, which is a property of (collections of) things.  And the reason that two plus three equals five is that if I have two things and I put them together with three other things the result is a quantity of things to which English speakers attach the label "five".  Likewise "seven is prime" can be understood to mean that if I have a quantity of things to which English speakers attach the label "seven" I cannot arrange those things in a complete, regular rectangular grid in any way other than the degenerate case of putting them all in a line.

But this explanation fails for straightforward extensions of the natural numbers, like negative numbers or irrational numbers or imaginary numbers.  I can show you two apples, and I can explain addition and subtraction in terms of putting groups of apples together and taking apples away, but only for positive numbers.  I cannot explain "three minus five equals negative two" in terms of starting with three apples and taking five away because that is just not physically possible.  Likewise I cannot show you a square with a negative area, and so I cannot explain the square roots of negative numbers in terms of anything physical (at least not easily).

There are two more cases where the numbers-are-adjectives theory fails.  The first is truths that involve generalizations on numbers like "There are an infinite number of primes."  That can't be explained in terms of properties of physical objects because we live in a finite universe.  There are not an infinite number of objects, so if numbers are meant to describe a quantity of a collection of actual physical objects, then there cannot be an infinite number of them either.

Finally, there are a lot of objects of mathematical study beyond numbers: manifolds, tensors, vectors, functions, groups, to name just a few.  Some of these areas of study produce mathematical "truths" that are deeply weird and unintuitive.  The best example I know of is the Banach-Tarski "paradox".  I put "paradox" in quotes because it's not really a paradox, just deeply weird and unintuitive: it is possible to decompose a sphere into a finite number of parts that can be reassembled to produce two spheres, each the same size as the original.  That "truth" cannot be explained in terms of anything that happens in objective reality.  Indeed, the reason this result seems deeply weird and unintuitive is that it appears to directly contradict what is possible in objective reality.  So the Banach-Tarski "paradox" would seem to be a counter-example to any possible theory of how mathematical truth can be grounded in objective reality.  And indeed it is a counter-example to the idea that mathematical truths are grounded in actual objective reality, but that is not news -- we already established that with the example of negative numbers and imaginary numbers.

I've already tipped my hand and told you that (my hypothesis is that) mathematics is the study of possible models of objective reality.  To defend this hypothesis I need to explain what a "model" is, and what I mean by "possible" in this context.

A model is any physical system whose behavior correlates in some way with another physical system.  An orrery, for example, is a model of the solar system.  An orrery is a mechanical model, generally made of gears.  It is the actual physical motion of the gears that corresponds in some way to the actual physical motion of the planets.

Mathematics is obviously not made of gears, but remember that mathematics is not the model, it is the study of (possible) models (of objective reality).  So the study of mechanical models like orreries falls under the purview of mathematics.  Mathematics obviously transcends the study of mechanical models in some way, but you may be surprised at how closely math and mechanism are linked historically.  Math began when humans made marks on sticks (or bones) or put pebbles in pots to keep track of how many sheep they had in their flocks or how much grain they had harvested.  (These ancient roots of math live on today in the word "calculate" which derives from the latin word "calculus" which means "pebble".)  And mathematics was closely linked to the design and manufacture of mechanical calculating devices, generally made using gears just like orreries, right up to the middle of the 20th century.

There is another kind of model besides a mechanical one: a symbolic model.  Mathematics has its roots in arithmetic which has its roots in mechanical models of quantities where there was a one-to-one-correspondence between marks-on-a-stick or pebbles-in-a-pot and the things being counted.  But this gets cumbersome very quickly as the numbers get big, and so humans came up with what is quite possibly the single biggest technical innovation of all time: the symbol.  A symbol is a physical thing -- usually a mark on a piece of paper or a clay tablet, but also possibly a sound, or nowadays a pattern of electrical impulses in a silicon chip -- that is taken to stand for something to which that mark bears no physical resemblance at all.  The familiar numerals 0 1 2 3 ... 9 are all symbols.  There is nothing about the shape of the numeral "9" that has anything to do with the number it denotes.  It's just an arbitrary convention that 9 means this many things:

@ @ @ @ @ @ @ @ @


and 3 means this many things:

@ @ @


and so on.

Not all symbols have straightforward mappings onto meanings.  Letters, for example, are symbols but in general they don't mean anything "out of the box".  You have to assemble letters into words before they take on any meaning at all, and then arrange those words into sentences (at the very least) in order to communicate coherent ideas.  This, too, is just a convention.  It is not necessary to use letters, and not all languages do.  Chinese, for example, uses logograms, which are symbols that convey meaning on their own without being composed with other symbols.  And symbols don't have to be abstract either.  Pictograms are symbols that communicate meaning by their physical resemblance to the ideas they stand for.

Mathematical symbols work more like logograms than letters.  A mathematical symbol like "3" or "9" or "+" generally conveys some kind of meaning by itself, but you have to compose multiple symbols to get a complete idea like "3+2=5".  Not all compositions of symbols result have coherent meanings, just as not all compositions of letters or words have coherent meanings.  There are rules governing how to compose mathematical symbols just as for natural language.  "3+2=5" is a coherent idea (under the usual set of rules) but "325=+" is not.

There is a further set of rules for how to manipulate mathematical symbols to produce "correct" ideas.  An example of this is the rules of arithmetic you were taught in elementary school.  The result of manipulating numerals according to these rules is a symbolic model of quantities.  There is a correspondence between strings of symbols like "967+381=1348" and the behavior of quantities in objective reality.  Moreover, manipulating symbols according to these rules might seem like a chore, but it is a lot easier to figure out what 967+381 is by applying the rules of arithmetic than by counting out groups of pebbles.

It turns out that manipulating symbols according to the right rules yields almost unfathomable power.  With the right rules you can produce symbolic models of ... well, just about anything, including, but not limited to, every aspect of objective reality that mankind has studied to date (with the possible exception of human brains -- we will get to that later in the series).

Mathematics is the study of these rules, figuring out which sets of rules produce interesting and useful behavior and which do not.  One of the things that makes sets of rules for manipulating symbols interesting and useful is being able to separate string of symbols into categories like "meaningful" and "meaningless" or "true" and "false".  Sometimes, for sets of rules that produce models of objective reality, "true" and "false" map onto things in objective reality, and sometimes they are just arbitrary labels.

The canonical example of this is Euclid's fifth postulate: given a line and a point not on that line, there is exactly one line through the given point parallel to the given line.  For over 2000 years humans believed that to be true and were vexed when they couldn't find a way to prove it.  It turns out that it is neither true nor false but a completely arbitrary choice; you can simply choose whether the number of lines through a point parallel to a given line is one or zero or infinite.  Any of those three choices leads to useful and interesting results.  As a bonus, some of them turn out to be good models of some aspects of objective reality too.

Another way of looking at it is that mathematics looks at what happens when you remove the constraints of physical reality from a set of rules that model that reality.  More often than not it turns out that when you do this, what you get is a system that is useful for modeling some other aspect of reality.  Sometimes that aspect of reality is something that you would not have even suspected to exist had not the math pointed you in that direction.

An example: arithmetic began as a set of rules for counting physical objects.  You cannot have fewer than zero physical objects.  But you can change the rules of arithmetic to behave as if you could have fewer than zero objects by introducing "the number that is one less than zero" a.k.a. negative one.  Even though that concept is patently absurd from the point of view of counting apples or sheep, it turns out to be indispensable when counting electrical charges or keeping track of financial obligations.  So is it "true" that (say) three minus five equals negative two?  It depends on what you're counting.  Is it "true" that there are an infinite number of primes?  It depends on your willingness to suspend disbelief and imagine an infinite number of numbers even though most of those could not possibly designate any meaningful quantity of physical objects in our finite universe.  It the Banach-Tarski paradox "true"?  It depends on whether or not you want to accept the Axiom of Choice.  (And if you think the AoC seems "obviously true" then you should read this.)

There are many examples of alternatives to the usual rules of numbers that turn out to be useful.  The most common example is modular arithmetic, which produces useful models of things like time on a clock, days of the week, and adding angles.  Another example is p-adic numbers, which are like modular arithmetic on steroids.  It is worth noting that in modular arithmetic, some arithmetic truths that are often taken as gospel turn out not to be true.  For example, in base 7, the square root of two is a rational number (not just rational but an integer!).

Philosophers and religious apologists often cite mathematical "truths" as somehow more "pure" than empirical truths and our ability to perceive them to be evidence of the existence of God or some other ethereal realm.  Nothing could be further from the (empirical!) truth!  In fact, all mathematical "truths" are contingent, dependant on a set of (mostly tacit) assumptions.  Even the very concept of truth itself is an assumption!

With that in mind, let us revisit the liar paradox, to which I promised you an answer last time.  I'll use the two-sentence version since that avoids technical issues with self-reference:
1.  Sentence 2 below is false.

2.  Sentence 1 above is true.
The puzzle is how to assign truth values to those two sentences.  The reason it's a puzzle is that there are two tacit assumptions that people bring to bear.  The first is the Law of the Excluded Middle: propositions are either true or false.  They cannot be both, and they cannot be neither.  A simple way to resolve the paradox is simply to discharge this assumption and say that propositions can be half-true, and that being half-true is the same as being half-false.

The second tacit assumption that makes the Liar paradox paradoxical is the assumption that the truth values of propositions must be constant, that they cannot change with circumstances.  This is particularly odd because everyday life is chock-full of counterexamples.  In fact, the vast majority of propositions that show up in everyday life depend on circumstances.  "It is raining."  "I am hungry."  "It is Tuesday."  The truth values of those all change with circumstances.  Obviously, "It is Tuesday" is only true on Tuesdays.  Why cannot the truth values of the Liar paradox do the same thing?  We can re-cast it as:
1.  At the moment you contemplate the meaning of sentence 2 below, it will be false.

2.  At the moment you contemplate the meaning of sentence 1 above, it will be true.
The truth values then flip back and forth between true and false as you shift the focus of your contemplation from one to the other.  Note that both of these solutions can also be applied to the "this sentence is true" version, where all three of "true", "false" and "half-true/half-false" produce consistent results (though of course not at the same time).

Finally, note that we can also attack the Liar paradox experimentally by building a physical model of it.  There are many ways to do this, but any physical mechanism that emulates digital logic will do.  You could build it out transistors or relays or Legos.  All you need to do is build an inverter, a device whose output is the opposite of its input.  Then you connect the output to the input and see what happens.

In the case of a relay, there is enough mechanical delay that the result will be flipping back and forth.  It will happen fast enough that the result will sound like a buzzer, and indeed back in the days before cheap transistors this is often how actual buzzers were made.  If you build this circuit out of transistors then the outcome will depend a lot on the details, and you will end up with either an oscillator or a voltage that is half-way between 1 and 0.

If you put two inverters in series and connect the final output to the initial input you will have built a latch, which will stay at whatever condition it starts out in.  This is how certain kinds of computer memory are made.

The modeling train runs in both directions.  This will become important later when we talk about information.  But that will have to wait until next time.