Tuesday, May 02, 2023

How to explain cardinals vs ordinals to a six-year-old

This discussion on Hacker News on whether infinity is odd or even got me to thinking about the right way to teach kids about infinity, and the difference between cardinals and ordinals.  Here's what I came up with.

It is important to realize that numbers can stand for two different kinds of ideas.  Numbers can talk about "how many" but they can also talk about "what position".  For example, we can talk about how many apples are in a bag of apples, and this lets us compare two bags of apples to decide whether one bag has more apples than the other, or whether the two bags have the same number of apples.  We can also talk about what happens when we add an apple to a bag, or take away an apple from a bag.  And this lets us define what we mean by zero: it is the number of apples in a bag from which it is not possible to remove an apple.

Now consider two bags of apples.  How can we tell if the bags have the same number of apples?   The obvious way is to count them, but suppose we don't know how to count.  Is there another way?  Yes, there is.  (See if you can figure it out.)

The way to do it is to start taking apples out of the bags two at a time, one from each bag, and stop when one of the bags is empty.  If the other bag is also empty, the two bags had the same number of apples to begin with.  That is what it means to have "the same number": for every apple in one bag, there was a corresponding apple in the other bag.

Now, what happens if we start adding apples to a bag and never stop?  It is tempting to say that we would eventually end up with a bag of infinity apples, but this is not true because if we never stop then we never end up with any particular number of apples.  We just have a bag that keeps getting fuller and fuller forever.  Is there a way to define infinity that doesn't require us to wait forever?

Yes, there is.  Remember, we have a way to tell if two bags of apples have the same number of apples (and we can do this without knowing how to count!)  So imagine if we took all possible bags of apples and grouped them together according to how many apples they had.  We take all of the one-apple bags and put them together (maybe we put them in a box instead of a bag) and all of the two-apple bags and put them together (in a second box) and so on and we do this for all possible bags of apples all at the same time.  The number of boxes we would end up with is (one kind of) infinity.

(Aside: it might seem like doing this for "all possible bags of apples at the same time" is cheating.  Why is that any better than talking about where the process of adding apples forever ends up?  It's because "forever" and "ending up" are contradictory.  Doing something to all possible bags at the same time might be physically impossible, but it is not logically contradictory.  The problem with trying to construct infinity by adding apples is that adding apples is inherently sequential.  We can't add the nth apple until after we have added the n-1'th apple.  By postulating "all possible bags of apples" we have taken the infinite bit and "parallelized" it so that the process of constructing the infinite set doesn't have an infinite chain of sequential dependencies, and so we can do it in a finite amount of "time".)

Now, instead of putting apples into bags, let's think instead about putting apples in a row.  This might seem at first like a distinction without a difference, but it's not.  When apples are in a bag, they are all jumbled together and you can't really tell one apple from another (assuming they are all the same kind of apple and the same size).  But if you put them in a row they now have an order associated with them.  So we can talk about the first apple, and the apple after the first apple (which we call the second apple) and the apple after that (third apple) and so on.

We can also go the other way and talk about the apple before (say) the third apple, which is the second apple, and the apple before the second apple, which is the first apple.  This is analogous to how we could talk about one apple more or one apple less.  But there is a huge difference between before and after versus more and less.  When we take apples out of bags, when we get to an empty bag, we have to stop.  There are no more apples to take away.  But with apples-in-rows, if we want the apple before the first apple we don't have to stop.  We can simply add an apple to that end of the row.

There is one little detail that we have to mention, and that is that to make this work we have to somehow mark the first apple so we don't lose track of it.  We could use a sharpie to write a big "1" on it, or use a granny smith as the first apple and make all the others be red-delicious or something like that.  But as long as we have a row with two ends, we can add apples to either end, and so we can go on before-and-aftering for as long as we like.  When we're adding-and-removing we are limited to removing only as many apples as we've added, after which we have to stop.

We have names for after-the-first apples: second, third, and so on.  Can we invent names for before-the-first apples?  Of course we can.  Unfortunately, the names that have been given to before-the-first apples break the pattern.  These should have been called before numbers, but in fact they are called negative numbers, or, less commonly, minus numbers.  This is really misleading because there is no such thing as negative-one apples, but there is such a thing as the-apple-that-is-two-before-the-first.  (Sometimes it seems that mathematicians are conspire to make things as confusing as they possibly can in order to maintain their job security ;-)

Note that what is important here is not so much the actual physical arrangement of apples, but rather that apples-in-a-row have a natural ordering to them which apples-in-bag don't have.  That ordering allows us to assign numbers not just to the total quantity of apples, but to each individual apple to identify where it is in that ordering.  And that very naturally leads us to a whole different kind of number (negative numbers) when we start to think in terms of before-and-after rather than less-and-more.

Note also that we can have an infinite number of after-apples, and that does not stop us from adding before-apples to the row.  In other words, when numbers are taken to stand for the order of things rather than the quantity of things, we get entirely new kinds of numbers as a result, and (and this is the really important bit) we get those additional numbers despite the fact that we started out with an infinite number of numbers!  There are an infinite number of positive numbers, but then there are an infinite number of negative numbers on top of that!

Are there even more kinds of numbers?  Yes!  Imagine an infinite row of apples that goes on forever in both directions.  We can add a new apple to that row by calling it, "The apple after all the after-apples that have (regular) numbers on them."  That's a bit wordy so it's usually abbreviated ω, which is the lower-case Greek letter omega.  (Exercise: what would you call the apple-before-all-the-before-apples-that-have-regular-numbers-on-them?)  Then we can add the apple-after-the-ω'th apple (abbreviated ω+1), the apple after that (ω+2) and so on.  Eventually you get to ω+ω=2ω, then 2ω+1, 2ω+2... 3ω, 3ω+1 and so on in a mind-boggling sequence that eventually gets you to ε0 and then ω1and then the Feferman–Schütte ordinal and the small and large Veblen ordinals.

But that's probably enough for one lesson.  Tomorrow we'll go back to bags of apples and talk about diagonalization.

Sunday, April 23, 2023

All together now: the second amendment must be repealed

It has been two years since I first called for the repeal of the second amendment.  (Someone has to be the first.)  It seems like a complete no-brainer to me that we need to at least say the obvious truth that the second amendment is a relic of the past and has no place in a modern technological society, if for no other reason than to start moving the Overton window for future generations.  It has worked spectacularly well for abortion prohibition, so why would it not work equally well for guns?

At long last someone else has stepped up to the plate.  Kirk Swearingen over at Salon has published a piece aptly entitled "The Second Amendment is a ludicrous historical antique: Time for it to go."  So kudos to Kirk.

Unfortunately, despite the assertive title, he gets a little bit mamby-pamby about it.

We're not supposed to even whisper such things because the NRA and right-wing extremists have sensible Americans — including many gun owners — so bullied and cowed that we feel we are only allowed to hope for sensible gun-safety legislation around the edges of their highly profitable assault on American lives.

 It's true.  But this has to end now.  The second amendment is quite literally an existential threat to American lives.  More Americans are killed by domestic firearms every month than died on 9/11.

Say it with me.  Say it loud.  Say it often.  Repeal the second amendment.  Repeal the second amendment.  Repeal the second amendment.  Repeal repeal repeal.  Repeat repeat repeat.  The lives of our children literally depend on it.

Friday, April 14, 2023

Bitcoin's value proposition: screwup postmortem

A Blogger user going by the handle Satoshi [1] pointed out that I made a major mistake in my analysis of rental attacks on Bitcoin.  The numbers I was using for the hash rate were off by six orders of magnitude.  But that turns out not to matter because, by sheer luck, I made a second mistake that almost exactly offset the effect of my first mistake.  I've since re-done the math, had it reviewed again by a community of bitcoin enthusiasts, and the upshot is that rental attacks are even less expensive than I originally concluded.

So how did I manage to do such a spectacular double-screwup?  Well, I got my hash rate numbers from a chart that I found on blockchain.com.  It looked like this:

Notice that the scale on the left is labelled "TH/s".  But then also notice that the numbers all have an "M" after them.  I missed those M's.

Happily for me, my analysis also relied on a number that I got from a mining rig rental site that turned out to be wrong in much the same way, but that number appeared in the denominator of the math and so the two errors more or less cancelled each other out.

For the record, here is the corrected math.

I chose whatsminer.com as a source of data for the performance numbers on current mining rigs not for any particular reason (I have never mined bitcoin so I don't really know much about the state of the art), but the site looked professional, so I assumed that its products are probably legit and competitive.  The key number from that site is that the base efficiency of their hardware is 30J/TH.

The price of electricity varies from about $0.10/kwh in China and $0.18 in the US.

The current difficulty is 48T.

The formula for converting difficulty to hashes/block is:

D * 2**256 / (0xffff * 2**208)

Setting D to 48T yields:

Python 3.8.9 (default, Mar 30 2022, 13:51:16)
[Clang 13.1.6 (clang-1316.] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> D=48
>>> D * 2**256 / (0xffff * 2**208)
>>> _/600
i.e. a hash rate of roughly 340 million TH/s, which is as expected.

The energy cost of maintaining this hash rate is:

340 M-TH/s * 30 J/TH = 10,200 MJ/s = 10 GW

Or, converting to dollars assuming low-cost electricity:

10 GW * $0.10/kWh = $1M/hr

The current block reward is 6.25 BTC/block with a market value of $180k/block ~= $1M/hr.

So this calculation passes basic sanity checks.

For completeness, let's calculate the capital costs.  Browsing mining hardware on Amazon yields a range of about $2k$5k/100TH/s.  It is a little surprising to see such a big spread (2.5x) for what should be a commodity.  But be that as it may, the bottom line is that hardware acquisition costs are about $20-50/TH/s.  So 340MTH/s would cost 7-17 billion dollars.  This seems like a plausible number because the block reward generates 364*24*$1M/hour = $9B/year, which yields a reasonable return on a O($10B) investment if you can get the operating costs low enough.

But the crucial number is the $1M/hr current run/reward rate.  For a hashing rig owner, the capital expenditure is a sunk cost, and so if they can make more money by renting than it costs to run the rig and more than they can expect to make by mining themselves, the rational choice (in the sense of economic rational actor theory) is to rent.

Note that this number is an order of magnitude less than my initial calculation, making the attack all that much more feasible.  I suspect that this is due to the fact that one of the inputs to my original calculation was a questionable data point from MiningRigRentals, and that if you crunched the numbers on their rental rates they would turn out to be (once you got your units straight) 10x what rational choice theory says it should be.  In fact, their help page includes this disclaimer:

"Bitcoin mining is ... not profitable for everyone. Therefore we strongly encourage anyone interested in mining to do his/her own research and make the calculations before investing any money to the operation.  Here at MiningRigRentals most people are speculating on the price of their mined coins..."

It seems to me that there is something very hinky about all this.  If mining is not profitable, that means you can buy coins for less than it costs to mine them, so why not just do that if you want to speculate on future price?  And that applies not just to renting, but to regular mining as well.  The operating costs of mining appear to be just about break-even even with cheap electricity and ignoring capital costs.  So why would any rational actor choose to mine?  Mining is either immediately profitable or it is not.  If it is not, then a rational actor would either rent their hardware to a greater fool, or, if market rates didn't cover the operating costs, pull the plug and use the savings on their utility bill to buy coins instead.  Any long-term deviation from this equilibrium cannot be the result of rational actors, so either rental attacks are plausible, or bitcoin's long-term security depends on systemic deviation from selfish rationality.


[1] When I looked up Satoshi yesterday his (her?) profile indicated that they had been on Blogger since 2012 but their profile had only four views.  (Today they are up to 23.)  That is an extraordinarily long run of stealth.  It is extremely unlikely, but not entirely implausible, that this person might actually be Satoshi Nakamoto.

Wednesday, April 12, 2023

A systematic critique of Bitcoin's value proposition

1. Introduction

This essay was originally entitled "Bitcoin's design contains the seeds of its own destruction".  The thesis was going to be that Bitcoin's security depends entirely on consuming vast quantities of energy, and so any value it might offer is outweighed by its inherent costs.  But when I did the math, that turns out not to be true.  Bitcoin does use a lot of energy, but not nearly as much as I initially thought.  Unfortunately, this is not necessarily good news.  Bitcoin's security is directly proportional to the cost of mining, so the less energy it uses, the less secure it is.  It turns out that there is a plausible attack against bitcoin that could be carried out for just a few million dollars, a sum which is easily within reach not just for state actors and corporations, but also many high-net-worth individuals.

This essay is divided into four sections.  In the first I'm going to review what Bitcoin's value proposition was intended to be.  In the second, I review how bitcoin works.  If you are already familiar with Bitcoin you will find nothing new here.  In the third section I analyze its security model, specifically the cost of mounting a 51% attack on the assumption that hash power is available for rent and doesn't need to be purchased by the attacker.  In the fourth section I discuss the plausibility of carrying out such an attack in the real world, and various counter-arguments that have been presented to me in private discussions.  The bottom line is that when push comes to shove, bitcoin's security ultimately rests on the same foundation as fiat currencies: social cooperation.  The idea that Bitcoin is something fundamentally new, i.e. a currency whose integrity rests on mathematical algorithms and the laws of physics and economics, is thus called into question.

2. Bitcoin's ostensible value proposition

Bitcoin was the first so-called "cryptocurrency", a particular kind of digital currency that relies on cryptographic algorithms rather than a trusted third party to maintain its integrity.  The original Bitcoin paper by Satoshi Nakamoto (a pseudonym whose real identity remains a closely guarded secret) set forth the following rationale for its creation:

"Commerce on the Internet has come to rely almost exclusively on financial institutions serving as trusted third parties to process electronic payments. While the system works well enough for most transactions, it still suffers from the inherent weaknesses of the trust based model. Completely non-reversible transactions are not really possible, since financial institutions cannot avoid mediating disputes. The cost of mediation increases transaction costs, limiting the minimum practical transaction size and cutting off the possibility for small casual transactions, and there is a broader cost in the loss of ability to make non-reversible payments for non- reversible services. With the possibility of reversal, the need for trust spreads. Merchants must be wary of their customers, hassling them for more information than they would otherwise need. A certain percentage of fraud is accepted as unavoidable. These costs and payment uncertainties can be avoided in person by using physical currency, but no mechanism exists to make payments over a communications channel without a trusted party."

In other words, the usual methods of mediating electronic commerce using a trusted third party (TTP) are deficient because 1) transactions can be reversed, 2) the cost of the TTP is too high, 3) TTP's cannot eliminate fraud, and, as a result, 4) small transactions are not economical.

There is an additional feature of Bitcoin which is described in section 6 of Satoshi's paper.  That section is only three paragraphs long, but its importance vastly outstrips its length.  I quote it here in its entirety:

"By convention, the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. This adds an incentive for nodes to support the network, and provides a way to initially distribute coins into circulation, since there is no central authority to issue them. The steady addition of a constant of amount of new coins is analogous to gold miners expending resources to add gold to circulation. In our case, it is CPU time and electricity that is expended.

"The incentive can also be funded with transaction fees. If the output value of a transaction is less than its input value, the difference is a transaction fee that is added to the incentive value of the block containing the transaction. Once a predetermined number of coins have entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free.

"The incentive may help encourage nodes to stay honest. If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, such rules that favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth."

(Side note: the British spelling of "favour" might be a clue to Satoshi's identity :-)

So Bitcoin ostensibly offers the following value proposition: 1) a non-inflatable currency with 2) irreversible transactions, leading to 3) reduced fraud and 4) lower transaction costs (because you no longer need to pay a TTP) and, as a corollary 5) making practical small transactions which are too costly under the TTP model.

I believe that all of these claims can be called into question, but I'm going to save most of my critique to the end and focus first on Bitcoin's security because that dominates all other considerations.  If Bitcoin is not secure, if it is vulnerable to an attack that undermines the integrity of the block chain, that dominates all other considerations.  Even if all of the other claims are true, it doesn't much matter if the whole system can be blown to smithereens at any time.

I'm going to start by briefly reviewing Bitcoin's security model for the benefit of my less-technical readers.  If you are already familiar with how Bitcoin works under the hood feel free to skip the following section.  You will find nothing new there.

3. The Security Model

Without a TTP, how do you insure the integrity of the system?  Specifically, how do you guarantee that everyone agrees how many bitcoins each participant in the system owns, and how do you enforce the limit on creating new coins?

Bitcoin's answer to this consists of three main components: digital signatures, a block chain (also known as a Merkle tree), and mining.

A digital signature is a little snippet of data that is associated with a document and another little snippet of data called a secret key.  Digital signatures have two key (no pun intended) properties: first, they are easy to generate, but only if you know the secret key, otherwise it is essentially impossible.  And second, it is easy for anyone to verify that a signature was in fact generated by someone who knows the secret key.  Furthermore (and this is the real magic) they can do this verification without knowing the secret key.  This technology dates back to the 1970s, though the particular version used by Bitcoin is more recent.

Bitcoin transactions are authorized by digital signatures.  You can think of a secret key as corresponding to a checking account – in bitcoin-speak these are called "wallets".  A bitcoin transaction is a digital document that says "Move X coins from wallet X to wallet Y" and is signed using the secret key corresponding to wallet X.  The important upshot of this is that control of the coins in a wallet is determined entirely by knowing the secret key.  If someone steals the secret key, they can (and almost certainly will) steal the coins in that wallet.  Likewise, if a secret key is lost, any coins in the corresponding wallet are irretrievably lost.

Digital signatures by themselves are not enough to insure the integrity of the system because nothing prevents someone from signing transactions on a wallet that total more money than it contains.  This is the so-called "double-spend" problem, though this is a bit of a misnomer.  A more accurate name would have been the "overdraft" problem, but double-spend is firmly established terminology, so I will use it here.

To prevent double-spending, bitcoin transactions are assembled into a ledger that sorts the transactions into a (partial) order.  This ledger is the so-called block-chain, and it is called that because transactions are first collected into batches called "blocks" and then the blocks are strung together in a chain.  If someone wants to verify that a transaction is valid, i.e. that the wallet that the transaction sources its funds from actually contains those funds, they can consult the block chain to see that wallet's current balance.  Again, there are cryptographic protocols in place to insure that no one can meddle with the block chain once it is established.  Like digital signatures, this is not new.  The technical term for a block chain is a Merkle tree, after Ralph Merkle who first published the idea in 1979.

The main innovation in Bitcoin's design is mining, which was derived from an earlier scheme called HashCash.  The details don't matter much.  The name derives from the fact that it involves a particular kind of computation called "hashing", which allows you to construct computational problems that are very hard to solve, in fact, so hard that the most effective way of solving them is to simply try solutions more or less at random until you happen to stumble on one that works.  Once you have a solution in hand, it is easy for anyone to verify that it is in fact a solution.  The puzzles can be constructed in a way that is specific to a particular document, so if you have a solution to one of these puzzles for a document, it proves that you (or someone) spent a lot of computing power constructing it.

The original idea behind HashCash was to use it as an anti-spam measure: email senders would include a solution to a difficult-to-solve puzzle bound to the contents of the email they were sending as proof that they had expended a lot of computational effort to send that email, and so it was less likely to come from a spammer.  Bitcoin's innovation is to take this idea and turn it into a digital lottery: whoever is the first to solve one of these difficult puzzles wins the lottery, and gets to decide which block of transactions become the next official block in the block chain.  They also get to include a transaction that creates some bitcoins out of thin air and deposits them in a wallet of their choice (presumably one whose secret key they control).  Anyone can participate in this lottery.  The more computing power they throw at it the more likely they are to win.  Conversely, the more computing power everyone else throws at it, the less likely they are to win.  In this way, the decisions about which transactions to include are (one hopes) made by different entities at different times, and no one party ever has the power to pull shenanigans, at least not for very long.

It should be noted that although distributing the block chain in this way is bitcoin's central innovation, most of bitcoin's claimed benefits accrue not because the block chain is distributed but rather because it is public.  A TTP could maintain a public block chain, and this would have almost all of the benefits of a distributed block chain.  Transactions would still be irreversible, the currency could be made non-inflatable, etc.  The only power that a TTP maintaining a public block chain would have is the ability to censor transactions, i.e. to refuse to record them.  But even this could be addressed by having a side-channel for publishing transactions which, if they lingered too long without being recorded, would damage the TTP's reputation.  Bitcoin actually has a similar feature built in called the "mempool", a collection of all transactions that have been submitted but not yet mined.

The only remaining problem with a TTP is how to compensate them for their services.  A TTP managing a block chain is necessarily a monopoly and deciding who gets to control that monopoly is a thorny political problem.  But censorship and compensation are the only two problems that mining actually solves.

4. Fifty-one-percent attacks

It is possible (though extremely unlikely) for two people to win the bitcoin lottery at more or less the same time.  In a situation like that the conflict is resolved in the next round of the lottery.  Every time you buy a bitcoin lottery ticket you have to decide ahead of time which of several possible competing blocks in the ledger you want to extend.  Conflicts are eventually resolved by a simple rule: among sets of competing blocks, the longest chain of blocks is the One True Chain.  So even if by chance two (or more) people should get winning tickets at more or less the same time, the odds of this happening again on the next round are very small, and the odds of it happening over an extended period of time by pure chance asymptotically approach zero.  Sooner or later, an unambiguous winner will emerge.  In actual practice, the system is designed to produce a winner (and hence a new block) about every ten minutes.  If there is any doubt about which of several competing chains is the One True Chain, that will almost certainly resolve itself within an hour or so.  This is the reason you will often see references to how many confirmations a bitcoin transaction has to have before it is considered valid.  The more confirmations, i.e. the deeper a transaction is in the chain, the more likely it is to be part of what ultimately turns out to be the One True Chain.

There is, however, a fly in the ointment.  Someone could attempt to intentionally disrupt the system by deploying enough computing power to extend an alternate chain.  This is called a "51% attack" because the attacker would have to control at least 51% of the computing power being devoted to buying bitcoin lottery tickets around the world, and this would be very expensive.  How expensive?  That turns out to be the crux of the matter.

The bitcoin algorithm is very cleverly designed to keep the cost of lottery tickets and the odds of winning very carefully balanced so that a winning ticket appears about every ten minutes, independent of how much computing power is being thrown at it around the world.  If, on any given round, a winner appears much sooner than the target ten minutes, the odds of winning the next round are adjusted to make it harder to win.  Likewise, if it takes much longer than ten minutes, the odds are dialed back down.

Because bitcoins can be traded for actual goods and services, including traditional fiat currencies, buying bitcoin lottery tickets can be a profitable enterprise.  As of this writing (April 2023), one bitcoin is worth about $27,000 and a winning lottery ticket gives you 6.25 of them, or about $170,000.  That amount is awarded every ten minutes on average, so there is some pretty serious money at stake.  If someone can mount a 51% attack for less than $170,000/ten minutes or $1M/hour, it becomes a profitable enterprise.  That is not a huge sum by the standard of governments, large corporations, and many high-net-worth individuals.

However, to mount an attack you not only need to pay the on-going operating cost of the computing hardware (which is mostly the cost of electricity), but you need to *acquire* that hardware.  The capital expenditure of buying enough hardware to mount a 51% attach is around 25 billion USD at current rates, and that is an amount that cannot be casually spent even by affluent governments.  However, it should give one a certain amount of pause because if, say, the Chinese or US government decided to squash Bitcoin, they absolutely could.

(Aside: one of the main uses of bitcoin is to move large sums out of countries that restrict the outflow of capital, e.g. China.  So it is not at all out of the question that the Chinese government might some day decide to take some decisive action to stop this.  Indeed, China has already taken steps in this direction, though to date they have been mostly ineffective.)

5. Rental attacks

The cost of a 51% attack drops dramatically if you can rent the necessary hardware rather than buy it.  Bitcoin mining hardware is available for rent.  Would carrying out a 51% attack on rented hardware be possible?  Would it be practical?  A back-of-the-envelope calculation indicates that the answer to both of these questions is "yes", indeed, that it might be even worse than possible and practical, it might even be profitable.

I'm going to describe that calculation here in broad brushstrokes, but to make it more concise I'm going to revert to technical terminology and talk about "hashes" rather than lottery tickets.  The crucial number that determines the difficulty of a 51% attack is the "hash-rate", the number of lottery tickets being "purchased" by expending computational power.  The numbers are quite staggering by comparison to a normal lottery.  Over the past year the hash rate has ranged between roughly 200 and 350 TH/s (trillion hashes per second).  Multiply that by 600 seconds (ten minutes) and you get between 120 and 210 quadrillion hashes per block.  Let's just round that off and call it 10^14.

The market price of bitcoin over the past year has ranged from about 16 to 45 kUSD, but that is neither here nor there because you can rent bitcoin mining equipment and pay in bitcoin.  Picking a random data point from https://www.miningrigrentals.com/ we can rent a rig with a claimed hash rate of 3.3 GH/s for 5.636683E-4 BTC/hr.  To get 100 TH/s, enough to mount a 51% attack even when the hash rate is at the upper end of last year's range, would cost (10^14 / 3.3 x 10^9)*5.636683E-4 BTC/hr = 17 BTC/hr = 2.8 BTC/block.  The block reward is currently 6.25 BTC/block, so this would not only be profitable, it would be wildly profitable.

Of course, there are obviously some limiting factors we have not taken into account here because if arbitraging bitcoin were this easy someone would have done it already.  The main limiting factor is that to carry out the attack you need to rent 30,000 of the 3.3 GH/s units that we used as our data point, and that number probably doesn't even exist, let alone available for rent.  Nonetheless, this analysis does demonstrate a crucial point: the thing that protects bitcoin from attack is not fundamental economics, because if 51% of the bitcoin network were available for rent at current market rates then a rental attack would be profitable.

Of course, the supply of bitcoin mining hardware is far from perfectly elastic.  Even under idealized assumptions, if someone were to try to rent 30,000 mining rigs, the price would surely rise to meet the dramatically increased demand, and (again under idealized assumptions) it should rise enough to eliminate the profit margin.

However, the block reward is not the only possible way to monetize such an attack.  A successful 51% attack, indeed even a credible threat of such an attack succeeding, would almost certainly sow fear and uncertainty in a wide range of public markets.  An attacker could leverage this because they would have a certain amount of control over when news of the attack broke, so they could (for example) take a short position on a portfolio of financial stocks before launching the attack.  From start to finish, the attack itself would take only a few hours, so the exposure to upside risk would be minimal.  This strategy is not a slam-dunk, but it seems to me like a potentially attractive business proposition with no more than the usual risks and caveats.  Notably, there would be nothing illegal about it (AFAICT, IANAL).

In private discussions I have heard three counter-arguments, none of which I accept (if I did I wouldn't be writing this) but I'll list them here along with my responses just for completeness.

The first is that there is not enough rental capacity to mount a 51% attack, and never will be.  The person who raised this argument didn't provide any data to back it up, but for the sake of argument I will stipulate that the first part is probably true.  However, just because there isn't enough rental capacity today is no guarantee that there won't be enough tomorrow, especially if an attacker starts to buy up the existing capacity and drives the price up to the point where renting is more profitable (on the selling side) than mining.

The second is that, even if this attack succeeds, the worst-case scenario is a chain split.  Bitcoin has had a chain split before (resulting in the creation of bitcoin cash) and survived, so it could survive another one.  The difference here is that the bitcoin-cash split was not caused by an attack, it was caused by a technical disagreement in the bitcoin community.  It was an amicable divorce executed under controlled circumstances.  A split resulting from an attack would have a very different dynamic and likely very different consequences.  In particular, if such an attack turned out to be profitable then that would provide a powerful incentive for it to be repeated.  Even if the first attack failed, someone might try it again using the lessons of the failed first attack to refine their strategy.  At best the result would be a great deal of uncertainty, which would likely result in reduced confidence, which is ultimately the stock in trade of any currency.

The third response is that the mining community would band together to thwart such attacks if one were ever to be mounted.  I am happy to stipulate that this very well might happen, but it is important to note what this implies for bitcoin's security: it means that bitcoin is ultimately not, as is often claimed, protected by mathematics or physics or even economics, but rather by the social cohesion, cooperation, and (dare I say it?) trustworthiness of the mining community.  In other words, at root, bitcoin is not fundamentally different from a TTP, it's just that the TTP is a self-selected group rather than an elected or appointed one.  (And, it is worth noting, you can't just decide to become part of this group, you have to literally buy your way in.  Bitcoin's governance structure is, by design, a plutocracy.)

I want to stress that my argument does not depend on whether a rental attack would succeed.  It suffices that it might succeed.  The strategy I've sketched above is (I claim) prima facie plausible.  There might be something that would prevent someone from actually pulling it off, but it is not immediately evident what that thing would be.  But whatever it is, that is the thing that is currently defending bitcoin against this attack, and that means that the thing that is defending bitcoin against this attack is not currently known.  And that should be deeply worrying to anyone taking a long position on bitcoin's future.

6. Discussion

As long as I'm tearing bitcoin apart I might as well go all the way and critique its other claimed benefits.  To review, those are:

  1. Non-inflatable
  2. Irreversible transactions
  3. Reduced fraud
  4. Lower transaction costs
  5. Practical small transactions

I'll address each of these in turn.

6. 1 Inflation

Bitcoin can be inflated through chain splits and also by policy.  Neither are likely any time soon (notwithstanding that one chain split has already occurred) but both are possible.  There is a strong ideological predisposition against inflation among current bitcoin enthusiasts but it is not clear that this will hold forever.  In particular, as the block reward tends towards a smaller and smaller share of the total market cap, political pressure towards inflation could mount, just as it tends to do with fiat currencies.  Also, if bitcoin ever achieves the goal that some of its adherents aspire to of making it the world's reserve currency, then the outsized holdings of early adopters will become harder to justify and the political pressure towards inflation will increase.  Satoshi Nakamoto, for example, is believed to hold about 1.1 million bitcoins, or just over 5% of the total market cap.  His keys have not been used in many years and are believed lost, but is any sane person really willing to bet the financial well-being of the planet on that?  Are future generations going to be willing to accept that decision made by their distant ancestors, or will they decide, as many before them, that a little inflation might actually be beneficial?

Bitcoin might be inflation-free at the moment, but only for the same reason that some fiat currencies are inflation-free: because the people who control them have decided as a matter of policy that inflation is undesirable.  The only thing that distinguishes bitcoin is that its policy-making is based on one-hash-one-vote.

6. 2 Irreversible transactions

This is probably bitcoin's strongest claim.  Reversing a bitcoin transaction is in fact impossible as a practical matter, and will be under all reasonable future scenarios.

However, irreversibility is very much a double-edged sword.  People make mistakes, or lose their keys, or have them stolen.  Under those circumstances the ability to reverse a transaction can be very desirable.  Of course that does open the Pandora's box of having to adjudicate disputes, which bitcoin mostly eliminates -- by eliminating the possibility of correcting mistakes and restoring stolen coins to their rightful owners by force.  This is not the place to engage in that policy debate.  I think you can probably guess which side I come down on.  I'll just point out that irreversibility is no panacea.  If it were, it would be universally adopted as the de facto standard.  There is a reason that no other irreversible monetary system has ever been widely adopted.  It's not because they are hard to build.

6.3 Reduced fraud

By adopting digital signatures to authenticate transactions bitcoin does eliminate one currently common kind of fraud.  But digital signatures can be adopted to eliminate that fraud without adopting the rest of bitcoin.  Indeed, this has been done throughout most of the world now with the introduction of chip cards to replace magstripes.  (The chips contain secret keys and produce digital signatures using them.)  The only arena where digital signatures are not yet widespread is on-line purchases.  There is no technical impediment to adopting them there, it's just a matter of agreeing on a standard protocol.  (I attempted to do this about ten years ago and failed, but that's another story.)

However, there is a dark side here as well.  Bitcoin eliminates one kind of fraud but replaces it with others.  In particular, if you lose your keys, or entrust them to a third party who decides to defect, then you have no recourse.  Furthermore, the irreversibility of transactions makes coercion more lucrative, leading to the rise of ransomware.  In fact, it is arguable that the rise of bitcoin was the catalyst that birthed ransomware as a global industry.  A thief can now steal your money with impunity from the comfort of their own living room.  It is no wonder so many people are choosing to make a career out of this, especially ones who live in places with lax enforcement.

6.4 Lower transaction costs

This is a theoretical possibility as long as bitcoin's value in terms of its purchasing power continues to rise.  But as soon as this stops, the value of the block reward asymptotically approaches zero, and the only way to fund mining after that (assuming the inflation policy does not change) is fees.  How this will shake out in terms of actual costs is anyone's guess because we are very far from reaching steady-state on that, but there are two things inherent in bitcoins design that will tend to drive fees up.  First, all that electricity that is used to keep the system secure has to be paid for somehow.  And second, the capacity of the network is limited by design.  It is technically possible to change this, but politically it is very difficult.  The last time someone tried the result was the bitcoin-cash chain split.

Even now, when the mempool of pending transactions is large, people sometimes have to pay quite exorbitant fees to get transactions mined in a timely manner (minutes instead of hours or days).  It is unrealistic to expect any commodity whose supply has a hard cap on it to be cheap.

6.5 Practical small transactions

This, I think, is Bitcoin's biggest broken promise, and again, it was foreseeable.  By design, bitcoin transactions take a long time to process, and the smaller the transaction, the less likely it is to be mined in a timely manner.  Furthermore, as noted above, the capacity of the system to process transactions has a hard limit on it which is woefully inadequate for handling the volume of small transactions that occur regularly throughout the world.  Using Bitcoin to buy a coffee at Starbucks was an intriguing novelty at one time, but it was never realistic for large numbers of non-technically-savvy people to use it for day-to-day retail transactions.

7. Conclusion

So does Bitcoin have any actual value?  I'm not sure.  It certainly is not suitable for its original stated purpose of replacing fiat currencies for day-to-day transactions.  This is evident in the fact that the value of Bitcoin is still measured in terms of how many US dollars it takes to buy one.

On the other hand, as I write this, that number stands at just about $30,000, which I find staggering.  Somewhere in my house I have an ancient laptop computer that has somewhere on its hard drive the keys to a wallet containing 0.05 bitcoins that someone gave me for free back in 2009 when bitcoin was first launched.  I noodled around with it for a while, and even tried mining for a few hours, but got tired of hearing the fan on my laptop screaming at me all the time.  So clearly I got something very badly wrong back then and it's entirely possible that I've got something very badly wrong now.  My track record of predicting the future is not great.

I think the main value of Bitcoin in the long run will be as a store of value, comparable to precious metals but easier to move around.  Allowing you to reliably store value without having to physically store and protect an artifact (other than a secret key) has real value, and that might well be enough to sustain bitcoin over the long run.  But if bitcoin offers anything of value as a medium of exchange, I don't see it.


Thanks to Ryan Orr, Joel Dietz, Nemo Semret, and Adam Wildavsky for interesting discussion and feedback on this article.

Saturday, March 04, 2023

Uncomputable things: Chaitin's constant, Busy Beavers, and Kolmogorov complexity

1. Introduction

The other day I was watching this Numberphile video about (among other things) uncomputable numbers when I came across this section around the 6:50 mark where Matt Parker talks about Chaitin's constant.  Strictly speaking, this is a whole family of constants, not a single number, but that doesn't really matter.  What matters is that Chatin's constants are rare examples of numbers which can be unambiguously defined but which are provably uncomputable.  In the video Matt kind of throws up his hands when trying to explain Chatin's constant(s) and why it/they are uncomputable, but it's really not that hard to understand.  Nonetheless, I haven't found a really accessible explanation anywhere so I thought I'd take a whack at it.

Chaitin's numbers (usually denoted omega or Ω) are named after Gregory Chaitin who discovered/invented them some time in the 1980s or 90s (I have not been able to find the original publication, or even a reference to it).  In a nutshell, Ω is defined as the probability that a random computer program will eventually halt or run forever.  Obviously, the value of this number depends on what you mean by "a random computer program", and that in turn depends on the kind of computer you are writing the program for, which is the reason that this is a family of numbers, not just a single number.  But Chaitin's omegas are nonetheless invariably referred to in the singular because the basic idea behind all of them is the same, and the details don't really matter all that much.

2.  The halting problem

I'll start with a short review of the famous Halting Problem, because that is the foundation on which  Ω is built.  The Halting Problem is: given a program P and an input I, does P eventually halt when run with I as its input, or does it run forever?

It turns out that we can prove this problem cannot be solved.  But before we prove it, note that one might be suspicious it can't be solved even without a proof, because if we could solve the halting problem, then we could leverage that result to answer any question about anything that we could render as a computer program.  Want to know whether the Goldbach conjecture is true?  Or the Riemann hypothesis?  Or even the infamous P=NP?  (If you want to get an idea of just how thorny a problem that is, read the introduction of this paper.)  All you would need to do is to write a computer program that systematically searches for counterexamples and then ask if that program halts or runs forever.  If it halts, then there exists a counterexample and the conjecture is false, otherwise it's true.  A solution to the halting problem would put mathematicians and physicists permanently out of business.

Fortunately for both, you can prove the halting problem can't be solved.  This was first done by Alan Turing in 1936. Turing's original paper is challenging to read because there were no computers back then, so he had to pretty much invent the entire notion of "computer program" in a world where programmable computers did not yet exist.  (Back then a "computer" was a human being who did calculations by hand.  The word referred to a profession, not a device.)  But nowadays computer programs are ubiquitous, and we are used to thinking about bits and bytes and whatnot, and that makes the proof a lot easier.

Here is an outline of what we are going to do.  First, we are going to assume that we can solve the halting problem, that is, we're going to assume that we can write a program, which we will call H, which takes as input another program P and returns TRUE if and only if P halts, otherwise it will return FALSE.  Second, we are going to make a second program, which we will call B, which, like H, is going to take a program as input.  But instead of returning TRUE or FALSE, it is going to call H as a subroutine to determine whether the program P that has been given to it halts or not, and then B is going to do the opposite of what H says P will do, i.e. if H says that P halts, then B is going to enter an infinite loop (and hence run forever).  Finally, we are going to run B with a copy of itself as its input, and show that this leads to a contradiction, and hence that our assumption that we could write H must be false.

There are really only two details we need to fill in to turn that outline into a fully fledged proof.  The first is that we need to explain what we mean to run a program with another program as its input.  Turing's original paper spent many pages on this, but today we can simply point to the familiar fact that programs are just strings of bits, and inputs to programs are just strings of bits, and so running a program with another program as its input is no different than running a program with any other kind of input, it just so happens that the string of bits we supply as input happens to correspond to a valid program, which we will just stipulate.  (Or, if you want to be a stickler about it, we can just stipulate that invalid programs halt by producing an error.)

The second detail is a bit thornier.  As we have described it, H is a program that takes one input, a program P, and likewise B is a program that takes one input, which is also a program.  But what about P?  How many inputs does it take?  We have played a little fast-and-loose with this.  Remember, our description of H was that it "takes as input another program P and returns TRUE if and only if P halts" but we said nothing about the input to P.  Does P even take an input?  If so, where does its input come from when H tries to figure out whether or not it is going to halt?

There are several different ways to address this, but the easiest is to change the definition of H so that it takes two inputs, a program P and an input I, and returns TRUE if and only iff P halts when run on input I, and restrict ourselves to only giving H programs that take one input.

I am also going to introduce a bit of notation here: if P is a program and I is an input, then P(I) is the result of running program P on input I.  In the case where a program takes more than one input, like our redefined H, we separate them with commas, e.g. H(P, I).  So H(P, I) is TRUE if and only if P halts when run on input I, that is, if P(I) halts, otherwise it is FALSE.  (Exercise: what is H(H, H)?)

Now we can define our magic program B.  B is going to take one input, a program P, and it is going to call H as a subroutine with P as both the program to be analyzed and the input to that program.  In other words, B(P) is going to start by computing H(P, P).  If the result is TRUE (i.e. if P halts when run on a copy of itself as input) then B is going to enter an infinite loop, otherwise it will halt.

In other words, we will build B so that B(P) will halt if and only if H(P, P) is false, that is, if P(P) runs forever.  Otherwise, if H(P, P) is true, i.e. P(P) halts, B will run forever.

Now, what happens if we run B with a copy of itself as input, i.e. what happens if we try to compute B(B)?  Well, B(B) is going to start by computing H(B, B), which is a special case of H(P, I).  Recall that H(P, I) is true if and only if P(I) halts.  So H(B, B) is true if and only if B(B) halts.  But B(B) halts if and only if H(B, B) is false.  This is a contradiction, so our assumption that it was possible to write H must be false.  QED.

3.  Busy Beavers

So now that we know that the halting problem cannot be solved, we can also know that any information that would allow us to solve the halting problem must be impossible to obtain.  As an example of the kind of information that might allow this, consider the so-called busy-beaver numbers, denoted BB(n), which are defined as the largest number of steps that a computer program of length n could possibly run before halting.  For any n, BB(n) must exist, and it must be a finite integer.  Why?  Because there are only a finite number of programs of length n (in fact, at most 2^n of them), and so there are only a finite number of programs of length n that halt, and so one of them must be the one that runs the longest before halting.

And yet, if we knew the value of BB(n) then we could solve the halting problem for programs of length n.  How?  Simply by running all of the programs of length n in parallel until we ran them all for BB(n) steps!  Any program that runs longer than that must run forever.

So the BB function must be uncomputable.

4.  Chaitin's constant(s)

Another example of information that would allow us to solve the halting problem is the number of programs of length n that halt.  This number doesn't have a common name so I'm going to call them the C numbers, i.e. C(n) is the number of programs of length n that halt.  Again, C(n) is a perfectly well-defined number.  Indeed, C(n) must be an integer between 0 and 2^n, so they are not even a mind-bogglingly big number like the BB numbers are.  And yet, if we could compute C(n) we could solve the halting problem, and so C(n) must not be computable.

Note that C(n) is not a number, it's a (uncomputable) function just like BB is.   Chaitin's constant is a number that is constructed (more or less) by taking all of the C(n)'s, concatenating them together, and interpreting the result as the decimal expansion of a real number.  (And both depend on the details of the computing model, so really they are a family of functions or a family of numbers, but that is not what matters.)

If you look up Chaitin's constant you will find it is defined in terms of probabilities, specifically, the probability that something called a "prefix-free universal Turing machine" will halt on a random program, but all of this is just pedantry.  A "prefix-free Turing machine" is just a way of defining a computational model that allows you to formalize the notion of a "random program of length n", and the probability that such a program will halt is just C(n)/2^n.  Then there's some additional fancy-looking math to pack all of these rational numbers into a single real number in such a way that you can extract the C(n)'s with some more fancy math.

But all of the fancy math obscures the fact that at the end of the day, Chaitin's constant is just a numerical representation of the sequence of C(n)'s concatenated together.  In fact, if you think in binary, it is literally this.  In binary, when you divide an integer by 2^n, all you are doing is shifting that integer to the right by n digits.  Because every C(n) is at most n binary digits long, then shifting each one by n bits twice makes them all line up in a way that they don't overlap.  Then you can just fram them all together, put a decimal point on the left, and bam, you have a number which, if you knew its value, you could reconstruct the sequence of C(n)'s and hence solve the halting problem.

So all of these uncomputable things -- the busy beaver numbers, the C(n) sequence, and Chaitin's constant, are all just ways of "repackaging" the uncomputability of the halting problem.

5.  Kolmogorov complexity

Are there uncomputable numbers that can be defined without reference to the halting problem?  Yes.   Consider a computer program that produces some output, and ask: what is the length of the shortest program that produces the same output?  This question was first asked by Andrey Kolmogorov, and so the shortest program that produces a given output is called the "Kolmogorov complexity" of that output, which I will abbreviate KC.  So KC(n) is a function whose value is the length of the shortest computer program that will produce n as its output.

The proof that KC is uncomputable is also due to Gregory Chaitin, and it looks a lot like the proof that the halting problem is uncomputable.  We're going to assume that KC is computable and show how this leads to a contradiction.

So let us choose a program P that produces an output n, and assume that we can compute KC(n).  Obviously we know how long P is, an so we can tell whether or not its length is equal to KC(n).  If it is, i.e. if P is (one of) the shortest program(s) whose output is n, we will call P an elegant program.  (This is Chaitin's term.  A more neutral term would be "minimal" but I'm going to defer to the master.)

So if we can compute KC, then we can write an elegance tester, i.e. a program E which takes as input a program P and returns TRUE if that program is elegant, i.e. if its length is the same as the KC of its output.  It turns out that E is impossible in the same way that H turns out to be impossible.  To see this, we construct a new program B which works as follows: B is going to take as input some integer I, and start enumerating all programs longer than I and passing those programs to E to see if any of them are elegant.  When it finds an elegant program, it is going to run that program.

Note that B has to produce some output.  Why?  Because there are an infinite number of elegant programs, at least one for each possible output n.  And so sooner or later, B has to find one and produce the same output that it does.

Now let's run B with I set to the length of B plus one.  (Strictly speaking we have to set it a little longer than that, to the sum of the length of B plus the log base 2 of I, but that's a detail you can safely ignore.)  This means that sooner or later, B will find a program P that E says is elegant, and it will run P, and hence produce the same output n as P.  But because B only tests programs longer than B, P must be longer than B, and so P cannot be elegant because B, which is necessarily shorter, produced the same output.  (Again, strictly speaking, it's the length of B plus I that matters, but again, this is a detail.)

So again we have a contradiction, and so KC cannot be computable.

Note that this uncomputability stems from a fundamentally different source than Chatin's constant, which is really just a corollary to the halting problem.  KC has to do with optimization rather than halting, and it has some really profound practical implications.  For example, you can think of mathematical models of physics as computer programs.  The uncomputability of KC implies that we can never know if we have the optimal theory of physics.  Even if some day we manage to unify general relativity and quantum mechanics and create a theory that accounts for all observations, we cannot possibly know if we have found the simplest such theory.  That is fundamentally unknowable.


Saturday, January 28, 2023

Lisping at JPL Revisited

It has been over 20 years since I wrote "Lisping at JPL", half memoir and half geeky screed (and all of it half-baked) about how an obscure programming language came to define my career.  It was a borderline throwaway piece written as much to vent my spleen as to inform.  In the intervening two decades it has gotten a lot more attention than I ever expected, becoming a perennial favorite on Hacker News, launching my (very short) career as a podcast guest, and even inspiring some people to plagiarize it, which make me feel simultaneously flattered and annoyed.

User marai2 over on HN made this suggestion the last time L@J was posted on HN:

Since this essay cycles here on HN perennially and inevitably people are curious about how your views about lisp, or other languages have changed, you might want to have a follow up page that you forever keep on updating as your views change?

So, by request...

My views on Lisp have not changed: Common Lisp is still far and away my favorite programming language, but that is mainly because I don't really program in Common Lisp any more. I program in a little custom language that I have incrementally built on top of Common Lisp over the years. This language consists mainly of a single macro called BINDING-BLOCK which is a stand-alone replacement for almost all of the other binding constructs in Common Lisp and makes my code look very different from most CL code. In particular, my code has a lot fewer parentheses, and it doesn't run off the right side of the screen as much.  (Here is an example.)  A CL programmer looking at my code would probably go, "WAT?" And when I look at regular CL code nowadays I go "blech!" Especially if that code uses LOOP. Then I go double-blech, despite the ironic fact that LOOP and BINDING-BLOCK have a lot in common.

All this is a reflection of the so-called Lisp curse, the fundamental problem with Lisp -- its great strength is simultaneously its great weakness.  It is super-simple to customize Lisp to suit your personal tastes, and so everyone does, and so you end up with a fragmented ecosystem of little sub-languages, not all of which (to put it mildly) are particularly well designed.

But *I* am 10x more productive in CL than in any other language. And not just CL, but one particular CL environment, Clozure Common Lisp, which I have been using since it was Coral Common Lisp running on a Macintosh Plus with one 800k floppy disk and one megabyte of RAM. CCL is probably the most under-appreciated engineering marvel ever produced by any species other than termites. I have been using it for 38 years. (Unfortunately, that era is likely drawing to a close.  Clozure Associates is no longer a going concern, and the volunteer effort to port CCL to the M1 appears to have stalled. This makes me very sad. It is truly the end of an era, and a very long one by technological standards.)

People often ask me what I think about other languages, and Clojure in particular. My answer is that I have never used Clojure, but from what I have read it looks pretty neat. But I think no other language can ever possibly be better than Common Lisp for me because I have customized the hell out of my environment until it is exactly the way I want it. It is quite literally the perfect language for me because I have made it that way -- because I was able to make it that way. Any time I find something about it I don't like, I can change it quickly and easily. Nothing that someone else designs can ever compete with that. Even I can't compete with that. Every now and then I toy with the idea of building my own Lisp from scratch, and I once took a semi-serious whack at it as an exercise to try to learn C++, but I gave up when it became clear that the effort required to make it competitive with CCL plus 38 years of incremental tweaking was vastly more than I was willing to invest.

There is one feature of Common Lisp that I absolutely love which no other language offers AFAIK, and that is a generic-function model of objects.   IMHO, the goal of a programming language ought to be to make writing code as easy and free of cognitive load as possible.  If I want to do an operation, I want to be able to say (op arg1 arg2 ... argn) without having to think about whether OP is a plain function or a method or an operator.  In every other language I have to think about whether to write op(arg1, arg2 ... argn) or arg1.op(arg2, ... argn) or "arg1 op arg2" or, in Objective C, [arg1 op:arg2 someRandomThing:arg3 ... someOtherRandomThing:argn].  In Lisp I never have think about that.  I just write (op arg1 .. argn) and it does the right thing.

So the bottom line is: I still love Common Lisp.  YMMV.

My second favorite language at this point is Python.  It has a lot of nifty features, and a clean design that is mostly free of gotchas.   Ironically, I don't like Python's most iconic feature, the syntactically significant white space.  Python claims to not use braces, but this is half a lie.  It does use an open brace, it just uses the colon character to serve that purpose.  So in my code I use the "pass" statement as a de-facto close-brace, so when I use emacs python-mode, my code always auto-indents correctly.

But the best programming language is like the best wine.  Different people like different things, and that's OK.  Drink whatever is the best fit for your palate, and code in whatever is the best fit for your brain.

Wednesday, January 25, 2023

An intuitive counterexample to the Axiom of Choice: followup

 What was intended to be a short throwaway blog post about someone else's nifty idea regarding the Axiom of Choice (AoC) ended up getting a lot more attention and being much more widely misunderstood than I ever expected, so I decided to follow up with a more detailed explanation.

Note that I deliberately chose the word "detailed" and not "rigorous".  Although I am going to try to be a little more rigorous as well, this is emphatically not a "proof" of anything, and particularly not a proof that the AoC is "wrong".  Such a proof is not possible.  The AoC is, as its name implies, an axiom.  It is a free choice to do mathematics with it or without it, and either choice leads to some weird results.  This is simply an informal argument that the AoC is not self-evidently true.  It is analogous to someone before Gauss saying about Euclid's fifth postulate, "Suppose we tried to do geometry on the surface of a sphere..."

The AoC sounds innocuous enough when you first hear it: given a collection of non-empty sets, it is possible to construct a new set whose members consist of one member chosen from each of the given sets.  How could this not be true?  Note that this question is analogous to: Euclid's fifth postulate says that given a line an a point not on that line, you can construct a unique line parallel to the given line.  How could this not be true?  And the answer is: it can be non-true if you are doing geometry on a curved surface.

The answer in the case of the AoC is something like: it can be not-true if the sets you are dealing with are really weird.  My post was an attempt to give a specific example of that kind of weirdness.  I ended up tripping over playing too fast-and-loose with the notion of "describable".  So here I am going to take another whack at it, trying to be a little less sloppy.  Note that less-sloppy does not mean rigorous.  Like I said, this is not an attempt to prove anything, notwithstanding that parts of what I am about to say might sound a bit like a proof.

So with that in mind, let us start with a few elementary observations.

1.  There are at most a countably infinite number of finite-length unicode strings.

2.  Some unicode strings are unambiguous descriptions of numbers under certain uncontroversial background assumptions.  Examples include "123", "pi", "e", "the successor of zero", and "the golden ratio".  Other unicode strings are unambiguously not unambiguous descriptions of numbers.  Examples include "hamburger", "irvnwoihdphweg" and "😱".  And still other unicode strings may or may not be unambiguous descriptions of numbers.  An example of this is "the smallest counter-example to the Goldbach conjecture."  That might be an unambiguous description of a number, or it might not, depending on whether the Goldbach conjecture is true.

A lot of the discussion generated by my original post turned on this last point, that it is very hard, arguably impossible, to pin down what an "unambiguous description of a number" actually means.  The meaning of "unambiguous description of a number" is ambiguous.  But this will turn out not to matter.

Here are a few more observations that will turn out not to matter.  I include them just to short-circuit some of the discussion.

3.  All computable numbers have unambiguous descriptions, to wit, the algorithms that compute them.

Note that unambiguous descriptions need not be unique.  "One", "1" and "the successor of zero" are all unambiguous descriptions of the same number.  Two different algorithms that compute the same (computable) number are each an unambiguous description of that number.

It is tempting to conclude that the number of unambiguously describable numbers must be strictly less than the number of computable numbers, except that...

4.  Some uncomputable numbers have unambiguous descriptions.  Chaitin's constant is the canonical example.  It is not known (and probably unknowable) how many uncomputable numbers have unambiguous descriptions.

So now we get to the thorniest bit.  Some commenters claimed that:

5.  It is not possible to give an unambiguous description of what an "unambiguous description of a number" means, because any such putative description is subject to Cantor's famous diagonalization argument.  In this case the argument goes like this: suppose we had an unambiguous description of what an unambiguous description means.  Then we could make an (infinite) list of all the numbers with unambiguous descriptions, and then construct a new number with an unambiguous description that is not on that list.

This is true, but I claim it is irrelevant for two reasons.  First and foremost, I don't need an unambiguous description of unambiguous descriptions for my argument to hold.  All I need is to show that there exist numbers that do not have any unambiguous description, and for that all I need to show is that the number of numbers with unambiguous descriptions is (at most) countable, and all I need for that is to stipulate that an unambiguous description must be renderable as a finite-length unicode string.  The number of finite-length unicode strings is countable, and so the number of unambiguous descriptions must be countable, and so the number of numbers with unambiguous descriptions must be countable.  But since the reals are uncountable, there must remain an uncountable number of reals with no unambiguous description, at least not one that is renderable as a finite-length unicode string.

So I don't need an unambiguous description of unambiguous descriptions.  All I need is an oracle for it, and any oracle will do.

That is sufficient reason that diagonalization doesn't matter, but since this is an informal argument I'm going to point out something else: Cantor originally advanced diagonalization to prove that the reals are uncountable.  Far from invalidating my argument, I am actually relying on that result!  The whole point is that it doesn't matter what an unambiguous definition actually is, or that we can't necessarily tell whether a given unicode string is one or not (c.f. "the smallest counter-example to Goldbach's conjecture").  We have some strings that are (by social convention) unambiguously unambiguous descriptions (which I am now going to start abbreviating UDs), and others that are unambiguously not UDs, and still others that might be or might not.  None of this matters.  Take any description of what it means to be a UD and I will happily stipulate that the Nth diagonalization of the list of numbers generated by that specification is also a number with a UD, and so the original specification must have been incomplete or even inconsistent.  You can only diagonalize a countable number of times, which is the whole point of the diagonalization argument.  If you think that diagonalization invalidates my position, then you either don't understand diagonalization, or you don't understand the argument I am advancing.

None of the above matters, because the conclusion I am trying to defend has nothing to do with the details of what it means to be UD.  My claim, which I would have thought would be obvious and wholly uncontroversial, is that there must exist an uncountable number of numbers that unambiguously do NOT have unambiguous descriptions under ANY reasonable definition of an unambiguous description.  And the argument for that is simply that the reals are uncountable, but any reasonable definition of unambiguous description will yield only a countable number of them, and hence only a countable number of numbers they can possibly describe.

I could even go meta and point out that a reasonable condition on reasonable definitions of unambiguous descriptions (or anything else for that matter) is that they be renderable as a finite string of unicode characters, and so there can be at most a countable number of "reasonable" definitions of UD.  No matter how you slice it, you are still left with an uncountable number of numbers that unambiguously lack any unambiguous description.

Henceforth I will adopt the abbreviation NULUD for Number Unambiguously Lacking an Unambiguous Description.  So there are an uncountable number of NULUDs.  The NULUDs are a proper subset of the set of uncomputable numbers.

OK, but so what?  So there are an uncountable number of NULUDs.  How does that cast doubt on the validity of the axiom of choice?

Well, imagine that I gave you two NULUDs and asked you if they were equal.  Think about that for a moment.  For starters, what would it even mean for me to "give" you two such numbers?  I can never actually exhibit a NULUD, or even describe one.  I can't write one down.  I can't given you an algorithm to compute one.  There is no way for me to communicate any NULUD to you by finite means.  For me to even contemplate "giving" you two NULUDs I have to do something like postulate some sort of oracle for them which offers finite clues about their identities, something like spitting out successive digits of their decimal expansions.  But then what?  You could start comparing the outputs of the oracles, and if the numbers are not equal you will eventually find out.  But if they are equal you will never know!  Comparing NULUDs for equality is analogous to trying to solve the halting problem by running the program and waiting to see if it halts.  So not only do you need to postulate oracles for the numbers themselves, you have to postulate an oracle for their equality!  That's an awful lot of postulating.

In fact, an oracle for equality of two NULUDs is a mighty powerful oracle.  Such an oracle is even more powerful than an oracle for the halting problem, which only has to deal with computable numbers (of which there are only countably many).

What does any of this have to do with the AoC?  Well, if you are going to "choose" a number from a set you need to be able to determine whether the number you chose is a member of the set, i.e. whether or not it is equal to a member of the set.  To do that for a set of NULUDs you need an oracle for equality of NULUDs, which means you need an oracle that is more powerful than an oracle for the halting problem.

So you see, asking someone to accept the AoC for all sets is asking an awful lot.  The AoC hides a lot of power.

But let us grant ourselves all of this extraordinary oracular power.  I claim that you will still not be able to describe a procedure by which you can choose a member of an arbitrary set of NULUDs.  Again, think about it.  You can't just treat the set like a bag where you can just reach in and grab some of its contents.  The contents of a set of NULUDs is a bunch of things that can't be described, and so the set itself cannot be described.  The best you can do, given what we have allowed ourselves, is to somehow pick an arbitrary NULUD and invoke the equality oracle on every member of the set to see if you got lucky.  That process is not guaranteed to converge.

Now, a mathematician (as opposed to a computer scientist like me) would read the above talk of oracles and procedures and say that this is all rubbish, that all I need to do to "give" you two NULUDs is to say, "Let x and y be two (un)equal NULUDs."  And what we need for the AoC  is not a procedure but a function, specifically a choice function.  And here, they will say, there is no problem because there are many functions that can be defined perfectly well on NULUDs.  f(x) = x+1 for example.

I don't deny this.  But defining a function on NULUDs is not the same thing as defining a choice function.  A choice function is not a function on NULUDs, it is a function on arbitrary sets of NULUDs whose value is a NULUD that is guaranteed to be a member of the set.  That is a very tall order.  I challenge anyone to come up with a description of such a function, even an informal one, that does not invoke the AoC.  (That would, of course, be cheating.)  And if no one can come up with a description of such a function, a reasonable person could hypothesize that this is because such a function does not exist.  Indeed, the AoC exists because the only way we can produce such a function in some circumstances is by postulating it.  If we could describe a choice function for all circumstances, we would not need the AoC.  Of course, I cannot prove that NULUDs are such a circumstance, but it seems plausible to me.  I can't even begin to imagine how you would do it.  (But maybe some mathematician out there can enlighten me.)

Of course, none of this disproves the AoC.  Like I said at the beginning, the AoC is not false (at least not under standard set theory).  My point here is just to describe a set of circumstances under which it is not self-evidently true.

Monday, January 23, 2023

An intutive counterexample to the axiom of choice

Time for some hard-core geeking-out.

This comment on HN by /u/jiggawatts struck me as a brilliant idea: it's an intuitive counter-example to the axiom of choice, which seem intuitively obvious, but leads to weird results like the Banach-Tarski paradox.

For those of you who are not hard-core geeks, the axiom of choice says (more or less) that if you are given a collection of non-empty sets, you can choose a member from each of those sets.  That seems eminently plausible.  How could it possibly not be true?

Here's how: consider the set of numbers that cannot be described using any finite collection of symbols.  Such numbers must exist because there are only a countably infinite number of numbers that can be described using a finite collection of symbols, but there are an uncountably infinite number of real numbers.  So not only are there numbers that cannot be described using a finite number of symbols, there are vastly more of these than numbers that can be so described.

And yet... how would you describe such a number?  By definition it is not possible!  And so it is not at all clear (at least not to me) what it would even mean to "choose" a number from this set.

This is, of course, not a proof that the axiom of choice is wrong.  It's an axiom.  It can't be wrong.  But it is a good example for casting doubt in its intuitive plausibility, and that feels like progress to me.

Thursday, January 12, 2023

I am offended by Muslims being offended

An adjunct professor at Hamline University  was fired for showing an historically significant painting of the prophet Mohamed, peace be upon him.  (Note: I did not add PBUH to be facetious.  PBUH is an honorific, like Ph.D.  I added it to show respect to Muslims who believe that Mohamed was in fact a prophet, notwithstanding that I believe they are wrong.)

This is the painting in question:


From the NYT article:

The painting shown in Dr. López Prater’s class is in one of the earliest Islamic illustrated histories of the world, “A Compendium of Chronicles,” written during the 14th century by Rashid-al-Din (1247-1318).

Shown regularly in art history classes, the painting shows a winged and crowned Angel Gabriel pointing at the Prophet Muhammad and delivering to him the first Quranic revelation.

Much ink has already been spilled debating the merits of Hamline's actions and I don't have much to add, but there is one thing that I haven't seen anyone point out yet: this is not a painting of Mohamed.  This painting was made centuries after Mohamed died.  The artist had no idea what Mohamed actually looked like.  No human who was not alive in Mohamed's time knows what he looked like because there is no record of what he actually looked like.  The image above is either a likeness of a contemporary (to the artist) model, or a product of the artist's imagination.  This is an image that the artist imagined to be a likeness of Mohamed, but actually isn't.

So Muslims who are offended by exhibiting this painting are not offended by the exhibition of a likeness of Mohamed, because there are no likenesses of Mohamed.  Muslims being offended by this image are being offended not by the image, but by the (false) claim that this image is a likeness of Mohamed.  They are offended not by the image but by the imagination of a long-dead artist, one who, ironically but not insignificantly, was himself a Muslim.

If you think that is a reasonable thing to pander to, well, then I am offended by you.  Again, I am not saying that to be facetious.  I am sincerely offended by your belief that I have an obligation not to offend you.  And now we have a problem: should my offense be sufficient cause to make you lose your job because you want me to lose my job for having offended you?  Because if your answer is yes, then we are headed for 100% unemployment.  No one will be able to work anywhere if a necessary condition for keeping your job is to not say or do anything that offends anyone.

On the other hand, if you think your offense is somehow privileged, that you should get to keep your job despite having offended me while I have to lose mine for having offended you, now we have a different problem: you now have to provide some justification for what makes your offense so much more worthy of accommodation than mine.  And you cannot ground your justification in religion because a belief in freedom of expression is part and parcel of my most deeply held beliefs.  The only way you will be able to justify your privileged position is to argue that your deeply held beliefs should trump mine for some reason.

(Aside: freedom of expression definitely does not mean that a private entity cannot exercise editorial control over what appears in venues it controls.)

There are only three alternatives here: either Muslims are elevated to a privileged position where their offense trumps everyone else's, or they have to be told by society to suck it up, or we enter a death spiral of social paralysis where no one can do or say anything out of fear of offending someone.

Figuring out which alternative I would advocate is left as an exercise for the reader.