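import hashlib, datetime, os

# 3000 file names (MD5 hex digests), in effectively random order
l1 = [hashlib.md5(str(i).encode()).hexdigest() for i in range(3000)]
# The same names, sorted
l2 = sorted(l1)
os.mkdir('test')

def test(l):
    t0 = datetime.datetime.now()
    # Create one small file per name, in the order given
    for x in l:
        with open('test/%s' % x, 'w') as f:
            f.write('test')
    os.system('sync')
    os.system('rm test/*')
    os.system('sync')
    return datetime.datetime.now() - t0

print(test(l1))  # unsorted names
print(test(l2))  # sorted names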
On Linux, both tests run in about the same amount of time (about a second on my machine). But on OS X, test(l1) is seven times slower than test(l2). This is enough to cause real pain when trying to deal with a large repository because Git uses the filesystem as sort of a poor man's database.
If anyone happens to know a fix for this, or how to get Apple's attention, I would be most grateful. I've reported this to Apple Feedback and also their discussion forums but I'm not holding my breath.
[Tongue in cheek] Did you try emailing Steve Jobs? [end Tongue in cheek]
Does this only occur on HFS+ drives? or is it independent of file system (say, Win32)?
Have you tried emailing the Open Darwin mailing lists? They might help you verify that it's an HFS+/Darwin problem.
(I meant Fat32.)
> Does this only occur on HFS+ drives? or is it independent of file system (say, Win32)?
I don't know. I don't have any non-HFS+ partitions. I suppose I could dig out an old drive and try it. I'll try to find some time to do that later today.
It seems to be an HFS thing. Both journaled and non-journaled HFS exhibit this behavior. FAT does not.
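For anyone who wants to repeat the comparison on other volumes, here is a rough sketch of the benchmark from the post with the target directory as a parameter. It is not the exact script used for the numbers above: it assumes Python 3, the names time_create_delete and compare_orders are made up for illustration, and it deletes files one by one in creation order rather than with rm test/*. Point it at a directory on whichever volume (journaled HFS+, non-journaled HFS+, FAT) you want to measure.

```python
import hashlib
import os
import tempfile
import time


def time_create_delete(directory, names):
    """Create one small file per name in `directory`, delete them, return elapsed seconds."""
    start = time.perf_counter()
    for name in names:
        with open(os.path.join(directory, name), 'w') as f:
            f.write('test')
    if hasattr(os, 'sync'):  # os.sync() is only available on Unix-like systems
        os.sync()
    for name in names:
        os.remove(os.path.join(directory, name))
    if hasattr(os, 'sync'):
        os.sync()
    return time.perf_counter() - start


def compare_orders(directory, count=3000):
    """Return (unsorted_seconds, sorted_seconds) for `count` hash-named files."""
    names = [hashlib.md5(str(i).encode()).hexdigest() for i in range(count)]
    unsorted_seconds = time_create_delete(directory, names)
    sorted_seconds = time_create_delete(directory, sorted(names))
    return unsorted_seconds, sorted_seconds


if __name__ == '__main__':
    # Hypothetical usage: swap the temporary directory for a path on the volume under test.
    with tempfile.TemporaryDirectory() as d:
        unsorted_s, sorted_s = compare_orders(d)
        print('unsorted: %.3fs  sorted: %.3fs' % (unsorted_s, sorted_s))
```

A single run is noisy, so it is probably worth repeating each measurement a few times before drawing conclusions about a given filesystem.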
Journaling also turns out to be very expensive. Creating and deleting files is more than twice as slow.
Of course, the chances that this will be fixed are zero. The only option is to wait for ZFS. :-(
All we have to do is get the OS X kernel team to migrate to git and when they can't get work done for all the HFS slowness, they will have no choice but to fix it.
A more likely fix would be to hack the back end of git to use a real B*-tree database like Berkeley DB instead of relying on the filesystem. (Sounds like a lot of work, but hey, in the FOSS world we're all somewhat guilty as developers, knowing we could technically improve any given open source project since the code is available. Don't you hate that feeling?)
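To make the idea concrete, here is a minimal sketch of keeping hash-named objects in one database file instead of one file per object. It is only an illustration, not anything Git does: it assumes Python 3 and swaps in SQLite (which ships with Python and also stores its data in B-trees) for Berkeley DB, and the table layout and the function names store_objects and load_object are invented for the example.

```python
import hashlib
import sqlite3


def store_objects(db_path, names, payload=b'test'):
    """Insert one row per name into a single database file; return the total row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            'CREATE TABLE IF NOT EXISTS objects (name TEXT PRIMARY KEY, data BLOB)')
        with conn:  # commit all inserts as one transaction
            conn.executemany(
                'INSERT OR REPLACE INTO objects VALUES (?, ?)',
                [(name, payload) for name in names])
        return conn.execute('SELECT COUNT(*) FROM objects').fetchone()[0]
    finally:
        conn.close()


def load_object(db_path, name):
    """Return the stored bytes for `name`, or None if there is no such object."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            'SELECT data FROM objects WHERE name = ?', (name,)).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


if __name__ == '__main__':
    # Same 3000 hash names as the benchmark in the post, stored in one file.
    names = [hashlib.md5(str(i).encode()).hexdigest() for i in range(3000)]
    print(store_objects('objects.db', names))
```

With this layout the order of the keys is the database's problem rather than the filesystem's, which is the point of the suggestion; whether it would actually be faster on HFS+ is untested here.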
@Jared: Git's point was specifically to not rely upon a database. Now, one may want to try and host the Git repository on a database-based filesystem; you could also create, say, an NTFS-3G loopback-mounted image in a matter of minutes.
If you don't feel like using that one, try any other file system you want; I cited NTFS-3G because it's a user-space file system (if it goes boom, it doesn't crash your OS) that is well supported under Mac OS X.