Monday, March 09, 2009

Can OS X really suck this badly?

I was playing around with Git because that seems to be what the cool kids are doing nowadays, and stumbled onto what seems to be a rather nasty bug in OS X. Apparently, creating a lot of files with names in a random lexical order is reeeeeaaallllyyyy slllooooowwwww. Here's a demo (in Python):

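import hashlib, datetime, os

# l1: 3000 hex digests, i.e. names in effectively random lexical order
l1 = [hashlib.md5(str(i).encode()).hexdigest() for i in range(3000)]
# l2: the same names in sorted order (assumed; the sort step is missing from this listing)
l2 = sorted(l1)

def test(l):
    # Assumes a 'test' directory already exists in the current directory
    t0 = datetime.datetime.now()
    for x in l:
        open('test/%s' % x, 'w').write('test')
    os.system('rm test/*')
    print(datetime.datetime.now() - t0)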


On Linux, both tests run in about the same amount of time (about a second on my machine). But on OS X, test(l1) is seven times slower than test(l2). This is enough to cause real pain when trying to deal with a large repository because Git uses the filesystem as sort of a poor man's database.
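For anyone who wants to reproduce the comparison without leaving files lying around, here is a rough, self-contained variant of the demo. The helper names (time_creates, compare) are made up for this sketch, and unlike the demo above it times only file creation, not the rm, so the numbers will not match mine exactly.

```python
import hashlib
import os
import shutil
import tempfile
import time


def time_creates(names):
    """Create one small file per name in a fresh temp directory; return elapsed seconds."""
    d = tempfile.mkdtemp()
    try:
        t0 = time.perf_counter()
        for name in names:
            with open(os.path.join(d, name), 'w') as f:
                f.write('test')
        return time.perf_counter() - t0
    finally:
        # Clean up outside the timed region
        shutil.rmtree(d)


def compare(n=3000):
    """Return (seconds in hash order, seconds in sorted order) for n file names."""
    hashed = [hashlib.md5(str(i).encode()).hexdigest() for i in range(n)]
    return time_creates(hashed), time_creates(sorted(hashed))


if __name__ == '__main__':
    hash_order_s, sorted_s = compare()
    print('hash order: %.3fs  sorted: %.3fs' % (hash_order_s, sorted_s))
```

The temp directory lands wherever tempfile puts it, so to test a particular volume you would need to point it there (mkdtemp accepts a dir argument).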

If anyone happens to know a fix for this, or how to get Apple's attention, I would be most grateful. I've reported this to Apple Feedback and also their discussion forums but I'm not holding my breath.


Jared said...

[Tongue in cheek] Did you try emailing Steve Jobs? [end Tongue in cheek]

Does this only occur on HFS+ drives? or is it independent of file system (say, Win32)?

Have you tried emailing the Open Darwin mailing lists? They might help you verify that it's an HFS+/Darwin problem.

Jared said...

(I meant FAT32.)

Ron said...

> Does this only occur on HFS+ drives? or is it independent of file system (say, Win32)?

I don't know. I don't have any non-HFS+ partitions. I suppose I could dig out an old drive and try it. I'll try to find some time to do that later today.

Ron said...

It seems to be an HFS thing. Both journaled and non-journaled HFS exhibit this behavior. FAT does not.

Journaling also turns out to be very expensive: with it enabled, creating and deleting files is more than twice as slow.

Of course, the chances that this will be fixed are zero. The only option is to wait for ZFS. :-(

Jared said...

All we have to do is get the OS X kernel team to migrate to git and when they can't get work done for all the HFS slowness, they will have no choice but to fix it.

A more likely fix would be to hack the back end of git to use a real B*-tree database like Berkeley DB instead of relying on the filesystem. (Sounds like a lot of work, but hey, in the FOSS world we're all somewhat guilty as developers, knowing we could technically improve any given open source project since the code is available. Don't you hate that feeling?)
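To make the idea concrete, here is a minimal sketch of a content-addressed object store backed by a database instead of one file per object. It is not Git's code or object format: the ObjectStore class is hypothetical, keys are a plain SHA-1 of the data, and it uses Python's built-in sqlite3 (which stores its tables in B-trees) as a stand-in for Berkeley DB.

```python
import hashlib
import sqlite3


class ObjectStore:
    """Hypothetical store: one table row per object instead of one file per object."""

    def __init__(self, path=':memory:'):
        self.db = sqlite3.connect(path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS objects (key TEXT PRIMARY KEY, data BLOB)')

    def put(self, data):
        """Store bytes under the hex SHA-1 of their content and return that key."""
        key = hashlib.sha1(data).hexdigest()
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                'INSERT OR IGNORE INTO objects (key, data) VALUES (?, ?)',
                (key, data))
        return key

    def get(self, key):
        """Return the stored bytes, or None if the key is unknown."""
        row = self.db.execute(
            'SELECT data FROM objects WHERE key = ?', (key,)).fetchone()
        return None if row is None else bytes(row[0])
```

Usage would look like key = store.put(b'some content') followed by store.get(key). Whether this would actually beat the filesystem on HFS+ is untested here.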

Mitch074 said...

@Jared: Git's point was specifically to not rely upon a database. Now, one may want to try and host the Git repository on a database-based filesystem; you could also create, say, an NTFS-3G loopback-mounted image in a matter of minutes.

If you don't feel like using that one, try any other file system you want; I cited NTFS-3G because it's a user-space file system (if it goes boom, it doesn't crash your OS) that is well supported under Mac OS X.