(Part 9 in a series on the scientific method)
In the last installment of this series I addressed a philosophical question: what does it mean for a mathematical statement to be true? I tackled this in the context of a general definition of scientific truth, namely, that a statement is true if it corresponds on some way to the actual state of affairs in objective reality.
(Note that this definition does not assume that objective reality exists. It's possible that there is no objective reality, and that therefore there are no true statements under this definition. But the existence of objective reality is, to put it mildly, a fairly well-established scientific theory, and so we're justified in proceeding on the assumption. Spoiler alert: when we get to talking about quantum mechanics, we will encounter some data that calls the existence of objective reality into serious question, often leading to much philosophical hand-wringing. I'm sweeping all that under the rug for now, but I wanted to be up-front about the fact that I am sweeping it under the rug. I promise to unsweep it later.)
In this installment I want to talk about a different aspect of math, namely, how to explain its ubiquity in science. In this series I have been trying very hard to avoid math because it tends to scare people away. I think that's mainly due to very poor math pedagogy. Math need not be any scarier than science itself. One of the things I hope to accomplish here is to demystify math a bit and make it a little less intimidating.
So here I am going to approach the phenomenon of mathematics as a scientific Problem (with a capital P), namely: why is math so ubiquitous in science? And why is it so alien and intimidating?
Let's start by asking: what actually is math? It's hard to formulate a coherent definition. Wikipedia says, "Mathematics is a field of study that discovers and organizes methods, theories and theorems that are developed and proved for the needs of empirical sciences and mathematics itself." That might be technically correct, but it's not very illuminating, so instead of trying to define it, I'm going to sweep that under the rug as well and simply observe that math, kind of like pornography, seems to be an identifiable thing that most people recognize when they see it.
One of the characteristic features of math that makes it recognizable is its use of weird symbology, funny-looking squiggles, Greek letters, non-Greek letters written in distinctive fonts (and, of course, numbers). Math also attaches meaning to the relative spatial relationship between symbols in ways that no other form of human communication follows. x² means something very different from x₂ which means something very different from 2x.
But all this weird symbology is just window-dressing. None of it is essential; it is possible to do math without it. In fact, for most of human history math was done using plain old language, augmented with an occasional drawing. The only "weird symbols" were numbers. You can see this by perusing the original manuscript of Newton's Principia, the book that launched the modern scientific revolution. It consists almost entirely of (latin) text and some drawings. There is some math, but none of the notation goes beyond what you would find in an elementary-school math class. The symbology used today only came into being in the last few hundred years. It was invented because doing math with regular language was becoming too unwieldy. The state-of-the-art technology for doing math at the time was pen and paper and chalkboards, and so the notation was optimized to allow a lot of information to be compressed into a minimal amount of space. But what matters is not the symbology, but rather what the symbols mean. "√3" means exactly the same thing as "the square root of 3", which means exactly the same thing as, "The number which, when multiplied by itself, gives you three." The only reason to use "√3" is that it takes up less space and is faster to write by hand.
But there is more to math than just weird symbols. Not all combinations of symbols make sense. "√3" makes sense but "3√" doesn't. It's no different from any other language. "The square root of three" makes sense, but "three root square the of" does not. Math notation is just another language, and it has grammatical rules just like any other language.
So what exactly is it about math that makes it special? If the symbology is just window-dressing, and the grammatical rules are no different from natural language, what exactly is it that makes math such a powerful lever for doing science? The answer is that in addition to grammatical rules, which tell you how to compose mathematical symbols so that they say something that makes sense (but isn't necessarily true), there is another set of rules in math that don't have a counterpart in natural language. These rules tell you how to manipulate mathematical symbols to produce new mathematical truths, arrangements of symbols that correspond in some way to objective reality, from old ones.
The most familiar example of these rules is the algorithms we learn in elementary school for doing arithmetic. Using these rules we can know, for example, that if we take (say) a hundred and ninety-two things and combine them (for some manner of combination) with (say) three hundred and seventeen things that we will end up with five hundred and nine things. We figure this out not by thinking about the actual words that I wrote out, nor directly about the quantities those words denote, but instead in terms of the symbols "192+317=509". By using these symbols and the right set of rules for manipulating them, we can figure out that 192 plus 317 is 509 a lot faster than we could if we tried to actually count these things.
The art of mathematics, the thing that keeps mathematicians employed, is figuring out what the right rules are. That is a topic for another day, with a long and storied history. The main point here is that the thing that makes math useful in science is not its notation, not the weird symbology, but the rules, the fact that it is possible to create sets of rules for manipulating symbols such that the behavior of the symbols corresponds reliably to the behavior of objective reality. In other words, it is possible to create sets of rules for manipulating symbols that turn the symbols into a model of objective reality. This turns out to be much more than just a handy-dandy way of producing models; it turns out to have some truly profound philosophical implications. To see why, we need to take a brief historical detour.
In 1936, Alan Turing published a paper entitled "On Computable Numbers, with an Application to the Entscheidungsproblem" (which is German for "decision problem"). The title obscures what this paper is really about. In it, Alan Turing invented software.
In 1936 the words "computer" and "calculator" did not refer to machines as they do today but rather to a profession. A computer (or calculator) was a human who performed computations (or calculations), which is to say, who manipulated mathematical symbols according to rules. The useful results of this process were generally numbers, which is why the title of Turing's paper refers to "computable numbers", but numbers are a red herring here. What matters is not the numbers per se, but the symbols; that the symbols can be taken to stand for numbers is a mostly irrelevant detail. What really matters is the rules by which the symbols are manipulated.
Turing's paper asks the question: what would happen if we tried to build a machine that does what (human) computers/calculators do? He then goes on to describe a machine that would do just that. The machine consists of a control mechanism which can be in any of a finite number of states. These states encode the rules by which the machine manipulates symbols. The machine has a sheet of paper divided into a grid. Each square of the grid contains exactly one of a finite repertoire of symbols. The symbols can be erased and re-written by the machine. The supply of paper is endless. (The technical term is "unbounded", which is not quite the same as "infinite". It's more like there is a paper factory available that makes more paper as needed, but at any given time the actual amount of paper that the machine has to work with is finite. The distinction between infinite and unbounded won't matter much here but will turn out to be crucial later on.)
The machine's operation is dead-simple. At any given time, its control mechanism is in one of the possible states, and is "looking" at one square of the grid on the paper. The control mechanism is just a big lookup table indexed by state and symbol. Each entry in the table contains just three pieces of information: a symbol to write onto the paper in the current grid square (possibly the same as the symbol that is already there), a direction in which to move on the paper to arrive at a new grid square, and a new state for the control mechanism to go into. That's it.
I've described Turing's machine in terms of a "sheet" of paper, but Turing's original paper and the literature in general refers to a one-dimensional "tape" rather than a two-dimensional "sheet". This turns out not to matter. You can think of the "tape" as being a "sheet" that has been cut up into strips and glued end-to-end. The net result is the same in both cases. But in order to conform to tradition, I'm going to switch my terminology and start using "tape" instead of "sheet".
Note that the control mechanism is part of the machine's hardware, and once built, it cannot be changed. If you want to make a Turing machine that performs a different task, you need to build a new machine. Or so it would seem at a casual glance.
The remarkable thing about Turing machines is that it is possible to create a control mechanism that is universal, that is, which allows one machine to perform the same computation as any Turing machine you can imagine without changing the hardware. How does it do this? By describing the control mechanism of another Turing machine as symbols on the tape. The universal machine reads the description on the tape, and then behaves as if its control mechanism were the one described on the tape rather than the (universal) control mechanism that it actually has.
At first glance this seems like an interesting intellectual curiosity, but not one with earth-shattering consequences. From our current vantage point in a world with ubiquitous computers it might even seem a little banal. But the computer revolution we are living in is only possible because of the existence of universal Turing machines. Computing and calculating machines existed before 1936, but they were all custom-made for a particular task. But if you build a (universal) Turing machine, that is the last machine you will ever have to build. After that, anything you can imagine that can be done by any Turing machine can be done just by writing symbols to a tape, that is, by writing software. There are only three reasons to every build new hardware: to make the machine run faster, to make it more energy efficient, or if you want to do something that cannot be done by any Turing machine.
Let's think about that third possibility for a moment. A universal Turing machine can do anything that any Turing machine can do, but that doesn't mean that it can do anything. In fact, Turing himself showed that there are certain things that no Turing machine can do, like solve the Entscheidungsproblem or the closely related halting problem. Might we be able to design a completely different machine that can solve these problems?
There is a word for such a machine: it is called an oracle for the decision problem (or the halting problem, or whatever). But think about what it would take merely to design or even describe such a machine, let alone build one. Such a design would have to be rendered into a phyical medium somehow; it would have to be written down, or carved into stone, or spoken aloud as an oral tradition, or something. And if the design could be rendered into a physical medium, then it could be expressed in terms of symbols. Maybe part of the design would consist of drawings, but a drawing can also be expressed in terms of symbols, specifically, in the form of a computer program (i.e. a Turing machine) that produces that drawing on a suitable display. Nowadays just about every image you see is actually produced in this way.
Maybe there is some way of rendering a design that cannot be reduced to symbols? There is no way to prove that this is impossible, but if you can come up with one, it would be one of the greatest breakthroughs in the history of human intellectual endeavor. It is hard to imagine even what a description of such a breakthrough would look like. Think about it: all of the information your brain has access to comes through your senses, primarily seeing and hearing. But we already know that these can be reduced to symbols because we have computers that can produce any image or sound that we can imagine. Not only that, but we also have digital cameras and digital microphones, so computers can reproduce not only any image or sound that we can imagine, but also any image or sound that can possibly be produced in this universe. If it exists in this universe, then we can record it or take a picture of it, and thus reproduce it on a computer.
We can stretch our imaginations even further. Maybe we cannot design or describe a machine that does something a Turing machine cannot do, but maybe we can produce such a machine by some other means. Maybe there exists a natural phenomenon in our universe whose behavior cannot be described by symbols. Again, it is hard to imagine what such a phenomenon would even look like. It would not be possible to describe such a phenomenon in terms of measurements because all measurements can be rendered as symbols. It would even be impossible to describe such a phenomenon using words, because words are symbols. So such a phenomenon would have to be radically different from anything in our day-to-day experience. It is, of course, impossible to prove that there does not exist something beyond the bounds of our imaginations. But that is what it would take to produce a machine that transcends the Turing machine. (I'm going to start referring to such a machine as a Transcendant Turing Mahchine or TTM.)
Now, there is actually a candidate for a natural phenomenon that, at least as if this writing, transcends our current understanding, and that might therefore be a TTM, or at least a signpost on the road to a TTM: the human brain. In particular, human brains produce the phenomenon of consciousness (at least mine does). Consciousness is a really weird thing, devilishly difficult to render into symbols or otherwise to get any kind of a handle on. It's hard even to demonstrate that it actually exists despite the fact that everyone professes to experience it. Consciousness at first glance has many of the characteristics we would expect from a phenomenon that indicates something happening beyond what can be described by Turing machines. (There is another phenomenon that some people claim transcends TMs, and that is the phenomenon of life itself. This is just demonstrably wrong, though demonstrating it is not easy, so that will have to wait for a future installment.)
For now I want to make one more observation: the important thing about Turing machines is not just that it is possible to build them, or even that it is possible to build a universal Turing machine. The important thing about (universal) Turing machines is that building one is not particularly difficult. Once you start building systems of any complexity at all, it is actually hard to avoid building a universal Turing machine. You could make one out of Legos or wood. The ones we build for actual use are made of sand.
The fact that UTMs are so simple and yet so powerful is a deep and profound observation not just about physical reality, but about what is logically possible. No one ever builds Turing machines according to Turing's original description except for pedagogy or as a parlor trick. The nature of Turing machines is as much in their description in terms of symbols as in the reifications of that description we humans have built in physical reality. The fact that it is hard for us to even imagine what a Transcendent Turing Machine might look like is a deep insight. It is the reason that math is so ubiquitous and effective in science.
The claim that everything that can be described can be described by a Turing machine is called the Church-Turing thesis. (The "Church" part refers to Alonzo Church.) The Thesis is not usually stated in this way; the more common formulation is that any computable function can be computed by a Turing machine. But if you think about it, this amounts to the same thing. If there were a function that could be computed (whatever that might mean) but not by a Turing machine, it would mean that there existed in this universe some physical phenomenon that could "compute" (whatever that might mean) some function that a Turing machine could not. That is to say, this phenomenon would have to exhibit some behavior that a Turing machine could not emulate.
The Church-Turing thesis cannot be proven, but a persuasive counterexample has never been demonstrated. If it's true (and if it isn't that would be Big News) then it places some hard constraints on what is possible in our universe, and that in turn has significant philosophical consequences. Either the human brain is a TTM or it is not. If it is, what exactly is happening in there that enables this transcendence? Is the same thing happening elsewhere in the universe, like maybe in the brains of dolphins or elephants? Intelligent aliens? LLMs? And if the behavior of our brains can be described by a Turing machine, what does that imply for our subjective experience of consciousness and free will? Turing machines are, after all, deterministic. If we're nothing more than machines following a program, how can we have moral agency?
Spoiler alert: these questions have answers. We humans are not TTMs but we have moral agency nonetheless. I promise I will get to that. Stay tuned.