Web Archiving Service Evaluation

On February 1, 2011, in Software Reviews, by Megan Toups

This is the fourth installment in a series of evaluations of website harvesting software on the Practical E-records blog.  The first three installments were reviews of open source software that you can download and install locally: HTTrack, the GNU Wget free utility, and Heritrix.  This fourth installment is a review of the Web Archiving Service (WAS) developed by the California Digital Library, a fee-based service for capturing and storing websites.


Heritrix Evaluation/Review

On November 17, 2010, in Software Reviews, by Megan Toups

This is the third installment in a series of evaluations of website harvesting software on the Practical E-records blog.  The first two installments were reviews of the HTTrack open source software and the GNU Wget free utility.  This third installment is a review of Heritrix, the Internet Archive’s open source web archiving software.


GNU Wget Evaluation

On September 12, 2010, in Software Reviews, by Megan Toups

This is the second installment in a series of evaluations of website harvesting software on the Practical E-records blog.  The first installment was an evaluation of the HTTrack open source software, and this installment reviews the GNU Wget free utility.

GNU Wget is designed to handle a variety of retrievals over HTTP, HTTPS, and FTP, but today I am evaluating it only as software for website capture.  According to the GNU Wget website, the software runs on “UNIX-like operating systems as well as Microsoft Windows.”  Once Wget has been downloaded and installed on your computer, you work with it from the command line.  Regardless of whether you have previous command-line experience, it is important to read the documentation carefully, because Wget offers a wide variety of options for setting the parameters of a capture.  The manual is heavy on jargon, so plan to spend some time with it to understand the parameters you might use during your web capture.
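Before experimenting with those options, it can help to confirm that the install worked and to skim the built-in option summary.  Assuming Wget is on your PATH, these two standard commands do that:

wget --version
wget --help | more

The second command pages through the full list of command-line options, which is a useful companion to the manual.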

I first downloaded the software to my computer.  After reading the manual, I opened a command-line interface and navigated to the directory that held Wget.  Once there, I used the following command to run the program, which I will describe in more detail below:

wget -rpxkE -t 20 --limit-rate=100k --wait=2 --directory-prefix=WebsiteCapture2 --level=20 http://www.xyz.org/ -o log1 &
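As a quick reference, here is what each part of that command does; the target URL and the directory name are placeholders carried over from the example above:

-r                                   recurse through links found on the pages
-p                                   also download page requisites (images, stylesheets, and so on)
-x                                   force creation of a directory hierarchy mirroring the site
-k                                   convert links in the downloaded pages so they work locally
-E                                   add an .html extension to pages served as HTML
-t 20                                retry each file up to 20 times if a download fails
--limit-rate=100k                    cap the download speed at about 100 KB per second
--wait=2                             pause 2 seconds between requests
--directory-prefix=WebsiteCapture2   save everything under the WebsiteCapture2 directory
--level=20                           limit recursion to 20 levels of links
http://www.xyz.org/                  the site to capture (a placeholder URL)
-o log1                              write Wget's log output to a file named log1
&                                    run the job in the background (on UNIX-like shells)

The --wait and --limit-rate options slow the capture down deliberately so that the harvest does not put too much load on the target server.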


HTTrack Evaluation

On July 15, 2010, in Research, Software Reviews, by Megan Toups

HTTrack is a free, open source website copier that can be downloaded to your desktop and used to harvest websites. Due to the changing nature of the web, archivists are interested in having a way to take snapshots of websites so that we have a record of what these sites looked like and what information they contained. Finding straightforward and cost-effective ways of doing this is likely to be an essential part of archival work in the future.
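As a rough sketch of what this looks like in practice, HTTrack can be run from the command line with something like the command below; the URL and the output directory are placeholder values, and the program also offers a graphical interface if you prefer not to work at the command line.

httrack "http://www.xyz.org/" -O "./WebsiteCapture"

This mirrors the site into the WebsiteCapture directory and rewrites links so the copy can be browsed locally.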
