Searching for the Moon

Shannon Clark's rambles and conversations on food, geeks, San Francisco and occasionally economics

Heritrix – Home Page

Posted by shannonclark on January 7, 2004

Heritrix is the open source crawler used by the Internet Archive. It looks likely to be very useful, and I am going to investigate further; I can certainly see many uses for a good, well-written (and well-behaved) web page crawler/archiver. It should be especially helpful as a tool for my other AI research: I didn’t really want to write a crawler myself, but I do need a large archive of websites and pages for much of what that research leads to.
