I'm working on an application that requires a secure password hash. The state of the art is Colin Percival's scrypt but the available code base is not very developer-friendly. Scrypt is published as a self-contained file-encryption utility, and to extract the key-derivation function is not trivial. It's not a lot of work, but it does require a fairly deep understanding of how scrypt actually works under the hood to make sure that you don't screw it up (and crypto code is notoriously easy to screw up even for someone who knows what they are doing). So I decided instead to try bcrypt, which is not as secure as scrypt but is a lot easier to use because it has python bindings and a password-hashing-friendly API.
So I downloaded and installed py-bcrypt, ran a few tests, and everything seemed to be working properly. But then I noticed something odd. The hash produced by py-bcrypt was 60 bytes long:
>>> import bcrypt
>>> bcrypt.hashpw('x', gensalt())
Let's deconstruct that. The format of the bcrypt hash is:
1. A 7-byte header ("$2a$12$") identifying this is a bcrypt hash, followed by...
2. A 22-byte base-64 encoded salt ("'w6IdiZTAckGirKaH8LU8VO") which decodes to a 128-bit binary salt value, followed by...
3. A 31-byte base-64 encoded hash ("xEvP97cFLEW5ePVJzhZilSa5c.V/uMK") which is supposed to decode to a 192-bit hash.
Except that it doesn't. 31 base64 encoded bytes only yield 184 binary bits. One byte of our hash has gone missing. [NOTE: this is corrected from an earlier version where I had two bytes missing. Those damn off-by-one errors :-) ]
OK, so maybe someone accidentally introduced an off-by-one error into the python wrapper. Except that the problem is not in the python wrapper. You can find bcrypt test vectors on the web, and they are all 60-byte strings.
It gets weirder.
The official bcrypt paper says (and other accounts corroborate) that bcrypt is limited to hashing 55-byte-long passwords. But empirically, py-bcrypt uses up to 72 bytes:
>>> hashpw('x'*71, s)
>>> hashpw('x'*72, s)
>>> hashpw('x'*73, s)
That is a very big discrepancy between the actual behavior of the code and the description given in the literature. It's vastly too big a discrepancy to be explainable by a simple inadvertent bug.
Now, some people might say I'm being excessively paranoid, but I don't think so. The higher the stakes in the internet security game get, the more incentive there is for attackers to try all kinds of sneaky and nefarious tricks to introduce weaknesses into people's defenses, and one of the easiest ways to do that is to publish some plausible-looking open-source security code that actually has a hidden weakness built in to it and hope that nobody notices. So IMHO it is prudent to raise at least a yellow flag any time the actual behavior of security code deviates from its peer-reviewed specification. When it comes to security, a certain level of paranoia can be prudent.
I sent an email to the author of py-bcrypt asking about this but didn't get a response. If anyone who knows their way around crypto code can shed some light on this I would be very grateful.
[UPDATE: My general level of paranoia has been at least partially vindcated]
[UPDATE2: The discrepancies have apparently been cleared up]