Searching for the Moon

Shannon Clark's rambles and conversations on food, geeks, San Francisco and occasionally economics

Heritrix – Home Page

Posted by shannonclark on January 7, 2004

Open source web crawler used by the Internet Archive. It looks likely to be very useful, and I'm going to investigate further, but I can certainly see many uses for a good, well-written (and well-behaved) web page crawler/archiver. It would be especially handy as a tool for my other AI research: I didn't really want to write a crawler myself, but I do need a large archive of websites/pages for much of what my AI research leads to.