Searching for the Moon

Shannon Clark's rambles and conversations on food, geeks, San Francisco and occasionally economics

Heritrix – Home Page

Posted by shannonclark on January 7, 2004

An open source crawler used by the Internet Archive. It looks likely to be very useful, and I'm going to investigate further, but I can certainly see many uses for a good, well-written (and well-behaved) web page crawler/archiver. It would be especially useful as a tool for my other AI research: I didn't really want to write a crawler myself, but I do need a large archive of websites and pages for much of what that research leads to.

