Perhaps this isn't standard usage, but it's what I need. I've got a test that crawls my project, and the last run ended up crawling over 13,000 links. top reported resident memory usage of 637M. Previously I had a test that queued up ~50,000 links, but it never finished: around link 40,000 or so my box ran out of memory (it has 2 GB).
I looked into it, and most of the space seems to be taken up by response bodies. Tarantula keeps references to them because it waits until the end of the run to write everything to disk. I'm working on a patch that writes each detail page as it is crawled and keeps only the details needed for the index page until the end, but I wanted to write this down before I forgot. I'll post the patch here when I get it cleaned up.
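Here's a rough sketch of the idea in the meantime. This is not the patch and not tarantula's real reporter API: the class name StreamingReporter, its methods, and the file naming are all made up for illustration. The point is only that each response body goes to disk as soon as it is reported, so the only thing held in memory for the whole run is a small summary record per link.

```ruby
require "fileutils"

# Hypothetical reporter, NOT tarantula's actual API. It writes each detail
# page immediately and keeps only a small summary per link for the index.
class StreamingReporter
  Summary = Struct.new(:url, :status, :detail_file)

  attr_reader :summaries

  def initialize(dir)
    @dir = dir
    @summaries = []
    FileUtils.mkdir_p(@dir)
  end

  # Called once per crawled link. The body is written out right away, so no
  # reference to it is retained after this method returns.
  def report(url, status, body)
    detail_file = format("%06d.html", @summaries.size + 1)
    File.write(File.join(@dir, detail_file), body)
    @summaries << Summary.new(url, status, detail_file)
  end

  # Called once at the end of the crawl. Only the summaries are needed here.
  def finish
    lines = @summaries.map { |s| "#{s.status} #{s.url} #{s.detail_file}" }
    File.write(File.join(@dir, "index.txt"), lines.join("\n") + "\n")
  end
end
```

With something like this, memory should grow with the number of links (one URL, status, and filename each) rather than with the total size of every response body.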