Installing OAIS Software: Archivematica

On January 15, 2010, in Research, Software Reviews, by Chris Prom

Over the past week, I’ve been in Cardiff, Wales, at a Forum for Fulbright fellows and scholars. If I get some time this weekend, I may post a few thoughts about it and/or some reflections on Scotland and my Fulbright experience to date. In the meantime, I’d like to update you on my adventures installing software for undertaking preservation actions within an OAIS environment.

For those of you who missed past postings, the software packages I am evaluating wrap together a variety of open source tools to help archives with many aspects of the ingest, storage, and access process. So far, I’ve reviewed RODA and also DAITSS. Both of them, at least in their current forms, are difficult for anyone without server admin experience to install. Any archivist would need significant support to get them running. I may also try my hand at ISLANDORA (which looks like it would take even more work), but that may not prove necessary since I have been having so much good luck with Archivematica.

Unlike the other software, which is server based, Archivematica is a virtual appliance and runs inside VirtualBox or another virtualization engine that supports the Open Virtualization Format (such as VMware). It uses file-based storage, so it can be implemented within any existing file storage systems that are available or can be made available on the host computer. Until now, this project has been deliberately flying under the radar, although I’ve known about it since last fall, when the project manager, Peter van Garderen, contacted me.

Based on my initial experiences, Archivematica offers a credible, thoughtful, and I believe supportable model for facilitating archival work with electronic records. In fact, I like this project so much that I’ve chosen to become directly involved in its development and will begin contributing code to it over the next few weeks.

Here’s why I like the project so much:

  • The software is currently in pre-alpha form, but even so it is very easy to install, use, and customize (see below).
  • Technical simplicity/elegance:
    1. The development framework is extremely straightforward: the team is developing the system as an Ubuntu-based virtual appliance (I really wish I had thought of this idea), tying together code using Python scripts. For that reason, the project can leverage code from the most successful open source projects to date, such as Linux/Ubuntu/Gnome, which offer a vast pool of existing, stable code.
    2. Since it is Linux based, a wide range of programmers will be able to assist with the project and/or with local customization, where needs dictate.
    3. It can be used in addition to or alongside preservation action tools that exist on the host operating system (such as Windows XP or Mac OS X Snow Leopard). While many preservation actions can be completed using open source tools (like the New Zealand Metadata Extractor, ImageMagick, JHOVE, Jacksum, etc.), many tools cannot do the heavy lifting that needs to be done, so having access to multiple operating systems is a big plus.
    4. Since it can be installed locally, it will be easier to extend and/or modify than a Java server-based application such as RODA or DAITSS. In addition, when any of the software components that make up the virtual machine is updated by its own code contributors, the new version will be installed automatically through the Ubuntu software updater, much like a Windows update.
    5. Potential for flexible local implementation. For large implementations, it may ultimately make sense to have it installed and scaled as a separate server, or even spread among several servers. At the same time, archivists who do not have access to extensive systems support can get started with it and use it alongside whatever infrastructure they currently have, simply by running it as a guest operating system (inside VirtualBox or VMware) on a host machine running, for example, Windows. (In my experience, it is much easier to talk IT folks into giving you admin rights to a machine or, failing that, to have them install one program on a desktop, than it is to get them to set up a new server.)
    6. It can be used with existing access systems by writing a relatively straightforward Python script to push metadata and objects from the archival store into a description/access system such as Archon, Archivists’ Toolkit, or whatever you use locally. It also comes pre-bundled with a Qubit-based access system for those who lack an existing access/description metadata system.
  • Excellent potential for sustainability:
    1. Although certain OAIS functions are not supported in the pre-release, the project team is following an extremely open development model.  They have fully sketched out their development roadmap using a series of UML diagrams and wiki entries.  They are open to feedback and code contributions and are constantly revising their development plans, matching up software products to the OAIS reference model requirements and other prior work, such as the Tufts/Yale ingest guidelines.
    2. The project team previously managed another successful open-source archival development project. They are not wet behind the ears in terms of development skills or project management.
    3. They are firmly committed to the open source development model (anyone who examines the project wiki can verify this; the fact they are distributing software at such an early stage is a very positive signal.)
    4. The project has support from several sponsors, including UNESCO’s Memory of the World Programme, the City of Vancouver Archives, and the International Monetary Fund Archives. It is not tied to a particular institution or to a single source of funding.
    5. The project sponsors are clearly thinking about a business model that can provide additional services around the software without, so far as I can tell, designing the software to be so complex and under-documented that they would be the only ones capable of supporting it unless you reverse engineered the entire system. I think this method of software development is a lot more supportable than open source projects which begin as grant projects in universities, then kick about for a commercial partner.
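
As a rough illustration of the kind of Python script mentioned under technical simplicity, the sketch below walks a folder standing in for the archival store and collects a path, size, and checksum for each file. Everything here is an assumption made for illustration: `build_manifest` and the folder layout are hypothetical and are not part of Archivematica, and the step that would actually load these records into Archon or another access system is left out because it depends on that system's own import mechanism.

```python
import hashlib
from pathlib import Path


def build_manifest(store_dir):
    """Describe every file under store_dir (a hypothetical archival store).

    Returns a list of dicts, one per file, sorted by path.
    """
    root = Path(store_dir)
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        records.append({
            # Path relative to the store, with forward slashes on every OS.
            "path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            # Reads the whole file into memory; fine for a sketch,
            # but large files would call for chunked hashing.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records
```

Calling `build_manifest` on a shared folder would give a plain list that a local script could then transform into whatever import format the access system expects.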

Here are my detailed installation ‘scores’ and notes:

  • Cohesion and completeness of documentation: 4. The installation manual is very complete and easy to use. I ran into only one question while using the documentation to install it, and I resolved the issue within 5 minutes.
  • Required software dependencies: 4. As long as you have the ability to get VirtualBox installed (requiring admin rights to your computer), you can use this tool.
  • Level of technical knowledge and system admin authority required: 3. VirtualBox and other virtualization engines are a bit of a power tool. You need to read certain parts of the VBox manual carefully in order to build and install an operating system from scratch. However, since Archivematica is distributed as a pre-built appliance, you are saved even the minor trouble of doing that. It does pay to understand some of this information, since the tool works best if certain configuration options, such as the memory available to the system, are adjusted. Once it is installed, you do need to set up a few shared folders so you can access files from both the host and guest operating systems, which requires opening up the terminal. However, the method is very clearly described in the documentation and anyone who can read should be able to follow it.
  • Time required to install: 4. I spent about 1 hour downloading and installing VirtualBox then setting up the virtual machine (most of this time was spent reading the manual for VBox, which was not strictly necessary). In addition, I spent another hour configuring the machine (i.e. setting up shared folders according to the installation instructions).
  • Flexibility of installation options: 3. The development model allows for quite a bit of flexibility. Since it is a virtual machine, there is wide scope for extension either in the machine itself or on the host computer. For example, after I had Archivematica installed and running, I was able to install a bulk file renaming tool (Thunar) in under 2 minutes. With a bit of extra effort, I could capture metadata concerning those changes back to the AIP that Archivematica generates, then contribute that code back to the project. Similarly, the tool could potentially have access to metadata regarding actions taken on the host, assuming that the metadata generated by those applications is made available via a shared folder. However, I am not sure whether, in the future, it will be implemented as a Service Oriented Architecture, so that certain components of the system could be handled by different servers. If so, that might make it operate more quickly when it needs to perform preservation actions on a large number of digital objects.
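
Capturing metadata about renames could start with something as simple as logging each change next to the files. The sketch below is an illustration only: `rename_with_log` and the `rename-log.json` file name are hypothetical, not part of Archivematica or Thunar, and the code does not write anything into an AIP. It just shows the kind of record a later script could fold into one.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def rename_with_log(folder, old_name, new_name, log_name="rename-log.json"):
    """Rename one file inside folder and append the change to a JSON log.

    Returns the full list of log entries after the rename.
    """
    folder = Path(folder)
    source = folder / old_name
    target = folder / new_name
    if target.exists():
        # Refuse to overwrite an existing file.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)

    log_path = folder / log_name
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append({
        "original_name": old_name,
        "new_name": new_name,
        "renamed_at": datetime.now(timezone.utc).isoformat(),
    })
    log_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

Because the log is an ordinary file, placing it in a shared folder would make the same record visible from both the host and the virtual machine.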

Bottom line installation and configuration score: 18/20

* Please note, I updated the scores on Feb 1, 2010 in order to reflect the fact that only 20 points of the final score apply toward the final evaluation, given my revised evaluation criteria.

  • matt.veatch

    Your assessments of the various open source digital repository options are exceptionally useful. I really appreciate the work you are doing.