(This is part 4 of a series.)
This post will be a geeky digression from the main narrative, because jonny said he wanted to hear more about elliptic curves.
Up-front disclaimer: I am not an expert on elliptic curves, and I wrote this post in some haste. It's possible I made some mistakes in some of the equations. Don't rely on what you read here for anything mission-critical.
The key (no pun intended) to cryptography is finding hard mathematical problems. A "hard problem" is a term of art: it means a problem for which there is no known solution that is significantly less work than brute force. The hard problem that RSA is based on is finding the factorization of the product of two large primes. The hard problem that elliptic curve cryptography is based on is something called the discrete logarithm over point multiplication. The purpose of this post is to explain what that means.
We'll break this down into two parts. First, discrete logarithms. Recall that an ordinary logarithm is the inverse of exponentiation. If X raised to the power of Y is Z, then Y is the logarithm of Z base X. Notice that the form of this definition is the same as the form of more familiar concepts like difference and quotient.
If X plus Y is Z, then Y is the difference of Z and X.
If X times Y is Z, then Y is the quotient of Z and X.
If X raised to the power of Y is Z then Y is the logarithm of Z base X.
A discrete logarithm is just like a regular logarithm, except that all the mathematical operations are done modulo some integer Q, which is typically a very large prime number. Every time you add or multiply, you divide the result by Q and take the remainder. (Another way to think of this is that all intermediate results are stored in a register than rolls over at Q, and overflows are ignored.) A modulo B (often abbreviated A mod B) is the remainder when A is divided by B.
For the sake of illustration, let's pick Q=13. (In practice Q would have hundreds of digits.) And let's pick two numbers out of a hat, say, 7 and 9. The sum of 7 and 9 is 16, so the sum of 7 and 9 mod 13 is 3. The product of 7 and 9 is 63, so the product of 7 and 9 mod 13 is 11 (because the remainder when you divide 63 by 13 is 11). 7 raised to the 9th power is 40353607, which leaves a remainder of 8 when divided by 13, so the discrete logarithm of 8 base 7 (mod 13) is 9.
A discrete logarithm over an elliptic curve is just like an ordinary discrete logarithm, except that regular numerical addition is replaced with a mathematical operation called point addition over an elliptic curve. An elliptic curve is just a regular old curve of the sort you learned to graph in high school algebra class. The equation of an elliptic curve looks like this:
y^2 = x^3 + A*x + B
where A and B are constants. The shape of this graph is not particularly important, but if you're interested you can find examples here.
Elliptic curves have the useful property that if you draw a line between any two points on the curve, that line is guaranteed to intersect the curve at exactly one additional point. (That's not quite true, but a good enough approximation for now. I'll fill in the actual details later.) Using this fact, we can define an operation called point addition: to add two points, draw a straight line between them. The sum is the third place where this line intersects the curve. (Actually, the sum is this point with the Y coordinate multiplied by -1, but you can ignore this for now. It makes an interesting exercise to figure out why the definition of point addition has this peculiar feature.)
Of course, you don't have to do this operation geometrically. You can do it mathematically as well. And if you do it mathematically, you can replace all the mathematical operations on real numbers with discrete operations on integers modulo Q, and all the results remain the same.
For the rest of this discussion, all arithmetic is done with integers modulo Q.
I've glossed over two details that you can safely ignore for now, but I'll describe them here just for completeness, because if you go read the literature on elliptic curves you will encounter these terms.
First, if you draw a line between two points on the curve with the same X coordinate, then this line does NOT intersect the curve. To fix this, we add a conceptual "point at infinity" and define the sum of two points with the same X coordinate to be this point. (To any mathematicians reading this: yes, I know this is not how the point at infinity is actually defined. I'm trying to keep this simple.)
Second, if you try to add a point to itself you don't have two discrete points to define a line. In this case we draw the line tangent to the curve at this one point, and find the second point where this tangent line intersects the curve. The math is a little different in this case, so adding a point to itself is called point doubling instead of point addition.
If you're having a hard time visualizing all this from my description here are some pretty pictures.
So now we have this abstract mathematical "point addition" operation, and it turns out that this operation has the same algebraic properties as regular old addition: it is commutative (P1+P2 = P2+P1) and associative ((P1+P2)+P3 = P1+(P2+P3)). It has an identity element (which turns out to be the "point at infinity"). So we can use this operation as if it were addition and define point multiplication of a point P by a number N in the usual way, as P added to itself N times.
We have to stop there. We can't go on to define "point exponentiation" because we have a "type mismatch" error. Exponentiation is normally defined as a number multiplied by itself N times. But we can't multiply a point by itself, we can only multiply a point by a number. But that turns out to be good enough, because the inverse of point multiplication turns out to be a Very Hard Problem. Mathematicians call it a discrete logarithm even though it would more properly be called a discrete quotient.
There is a clever trick that allows the product of a point P and a (large) number N to be computed very efficiently. It is in fact the exact same trick that allows you to multiply two ordinary numbers efficiently. But there is no known way to solve the inverse problem efficiently if you're using modular arithmetic. This is what makes point multiplication a useful building block for cryptography. (Note: this is true for modular exponentiation of ordinary numbers as well, but not for modular multiplication of ordinary numbers. There are ways to compute modular multiplicative inverses efficiently.)
As an example of how we can use point multiplication, consider the problem of key exchange. You want to use an insecure communications channel (like the Internet) to send secret data to Bob. You can encrypt the data, but then you have to somehow get the encryption key to Bob so he can decrypt the data. You can do this with a technique called Diffie-Hellman key exchange (after the two people who invented it). You and Bob each pick a large random number. Say you pick N1 and Bob picks N2. You also agree on an elliptic curve and a starting point P (called the "base point"). You compute P*N1 and send the result to Bob. Bob computes P*N2 and sends the result to you. Now you compute (P*N2)*N1 and Bob computes (P*N1)*N2. Because point multiplication is commutative and associative, the results will be the same. But an eavesdropper who only knows P*N1 and P*N2 cannot compute P*N1*N2 unless he knows N1 or N2, and to find either of those numbers he has to solve the discrete logarithm problem. In this scenario, N1 and N2 are the secret keys, and P*N1 and P*N2 are the public keys. So P*N1*N2 is a shared secret that you and Bob and use to encrypt data to send to each other.
Using elliptic curves to produce digital signatures is a little more involved, but here's the intuition behind it: suppose you do a Diffie-Hellman key exchange with Bob. You send Bob P*N1. But that doesn't prove that you know N1. You might have obtained P*N1 by eavesdropping on someone else. How can you prove to Bob that you actually know N1? One way is for Bob to encrypt a message using the (supposedly) shared key P*N1*N2 and ask you to decrypt it and report the result. If you can decrypt it, that proves you know N1, because that's the only way you could have computed the decryption key P*N1*N2.
That proves you know N1 to Bob, but what if you want to be able to prove you know N1 to anybody? One possibility is to do Bob's side of the protocol for him: pick a random message M and a random private key N2 and publish the following:
1. M
2. N2
3. P*N1 (your public key)
4. M encrypted with the "shared" key P*N1*N2
Because you have published P*N1 and N2, anyone can now compute P*N1*N2. So anyone can decrypt the encrypted message and verify that it matches M, which proves that you must have known N1.
That's a good start, but this version of the protocol has a fatal (and rather obvious) flaw. See if you can find it before reading on.
The problem is that there is no way to know that the public key you published is in fact the result of *you* computing P*N1. You could have obtained P*N1 by eavesdropping on someone else's Diffie-Hellman key exchange. You could even have just picked a random number. The only reason that the Diffie-Hellman protocol works is that both N1 and N2 are secrets held by two different people. Diffie-Hellman only allows you to prove your knowledge of N1 to someone who knows N2 and who knows that you do not know N2. So if you pick N2 the protocol collapses.
How do we fix this? We have to roll N1 into the computation somehow in a way that the result has the following properties:
1. The result could only have been computed by someone who knows N1
2. The result is verifiable by anyone
3. The result does not reveal N1
We do this by choosing a random "message" (let's called it M) and "encrypting" it in two different ways that can be combined in a way that only works if the encryption was done with knowledge of N1. I put "encrypting" in scare quotes because at this point we're not really encrypting, just slicing-and-dicing in a way that can't be easily undone.
I'm going to change notation a bit to make what comes next a little easier to follow. What I have been calling N1 to now I am now going to start calling SK (for Secret Key). What I have been calling P, I am now going to start calling BP (for Base Point, a constant, publicly known common starting point on whatever elliptic curve we are using).
Recall that our strategy is to slice-and-dice a random number M in two ways so that the results can be combined in a way that works only if they were produced with knowledge of SK. First we compute our public key, PK, just as we did with Diffie-Hellman key exchange, but using our new notation:
PK = BP*SK (formerly known as P*N1)
Note that PK is a point on the curve, not a number.
Next we compute:
R = XC(BP*M)
S = SK*R/M
where XC(P) means the X coordinate of a point P.
The signature is the pair (R,S). (Note that "dividing by M" really means multiplying by the multiplicative inverse of M (mod Q).)
To verify this signature, compute XC(PK*R/S). If the signature is valid (i.e. was computed by someone who knows SK) this will be equal to R. Verifying this is elementary algebra:
PK*R/S = (BP*SK)*R/S = BP*(SK*R)/(SK*R/M) = BP*M
So XC(PK*R/S) = XC(BP*M), which by definition is R. QED.
This protocol still has a problem, but it's much more subtle than the problem with the first version we came up with. For this protocol to work, M has to be kept secret, because if we know M we can trivially recover SK. (Figuring out how is left as an (easy) exercise.) So we can prove we know SK, but we can't yet bind that proof to a message (because we have to keep M secret). To be a useful digital signature scheme we have to be able to associate that proof of knowledge with some known message. To make this happen we have to tweak the protocol yet again.
At this point the story gets a little more complicated because there are two ways to do this final tweak. There's the standard way, and there's the Right Way. It is extremely rare for a deeply flawed standard to get established in cryptography, but in the case of elliptic curve digital signatures it has happened. So I'll start with the standard way, explain what the flaw is, and then explain the Right Way.
To bind a signature to a document in the standard way we first generate a secure hash of the document, which we will call H. As before, we also choose a random number M, which again we must keep secret. We compute R just as before, but change the computation of S like so:
S = (H + SK*R)/M
The signature is the pair (R,S) as before. The verification step also gets tweaked. To verify, we compute:
XC((BP*H + PK*R)/S)
and verify that this is equal to R as before. You can verify for yourself that this works by crunching through the algebra.
The flaw in this scheme is that not only must M be kept secret, but you also have to be very careful never to use the same random M to sign two different messages. If you have signatures for two different messages generated with the same SK and M values, then you can easily recover SK, the secret key. This is how the Sony Playstation was hacked.
The way to fix this problem is not to use a random M, but instead to use a hash of the message being signed plus the secret key as M. This insures that different messages will never be signed with the same M value. This brilliant trick was invented by Daniel J. Bernstein, commonly known by his initials, DJB. [UPDATE: I was mistaken, this trick was not invented by DJB, it was invented by George Barwood and John Wigley in 1997. Thanks to pbsd for pointing that out.]
I really wanted to write about a particular elliptic curve invented by DJB called Curve25519, but I'm running out of steam and I want to post this today, so that will have to wait. Sorry about all the cliffhangers.