This post will also be available in the next MAC newsletter.

Over the past year, the University of Illinois Archives has implemented the “Practical E-Records Method,” the result of a sabbatical project undertaken during academic year 2009-2010 with support from the US-UK Fulbright Commission. [1] The project provides recommendations to help small and medium-sized archives make digital curation and digital preservation systematic institutional functions. To implement these recommendations at the University of Illinois Archives, we tested Archivematica, an open-source, OAIS Reference Model-compliant digital preservation system that can be installed on a desktop computer, as a tool to preserve digital objects. Because Archivematica is in its alpha stages, working with it let us explore what the system offered in relation to the needs of the University Archives and provide input to developers as they continue to refine the software for production release.

The University Archives uses some dated desktop computer systems and does not have ready access to a virtual server, so we ran Archivematica using VirtualBox on a local machine. VirtualBox lets a guest operating system run within a virtual environment on a host; in our case, Archivematica ran as the guest on a desktop computer running Windows XP. After two technologically challenging and ultimately failed attempts at downloading, installing, and running Archivematica 0.6 and 0.7 under VirtualBox, the University Archives dedicated a newer computer exclusively to Archivematica, which eliminated the earlier space and compatibility issues.

The VirtualBox installation went well, but downloading and installing the Archivematica virtual appliance was a bit more challenging. Essentially, Archivematica is an Ubuntu (Linux) distribution with extensions to support digital preservation actions using a web-based preservation dashboard. The user manual, unavailable during the Archivematica 0.6 installation, was extremely helpful when navigating the demo. Thanks to a step-by-step illustrated guide for installing Archivematica, created by Michael Bennett of the University of Connecticut and freely available online, we encountered no problems until we began importing files into the virtual appliance.[2]

Upon successful installation and setup, we created a template to evaluate Archivematica using a variety of electronic record formats. Because the University Archives aims to preserve digital content that is understandable and usable in the long term, our evaluation centered on how easily Archivematica fit into the day-to-day workflow associated with processing mixed digital media. Our evaluation criteria fell into three areas: Performance/Reliability, System Design, and Output. We evaluated Archivematica against nine record series using the evaluation matrix in Appendix 1. To test the extent of Archivematica’s capabilities iteratively, we started with elementary electronic records such as Microsoft Office documents and PDFs, then moved to larger, more complicated file types, such as audio-visual objects.

Results indicated that when Archivematica worked, it addressed all of our concerns about correct file extension identification. It performed batch migration to the approved best-practices preservation, access, and normalization formats, and it preserved the original file. However, the failure of a single file caused the ingest of its entire folder to fail: once an error stopped the system, none of the folder was processed. The failure rate of files remained high throughout, and the system failed for inconsequential or difficult-to-identify reasons, such as “no hash found on line 6.” This severely hampered our ability to ingest information in a timely manner. To work around the problem, each file would have had to be ingested separately, an unworkable solution given the large volume of electronic records that needed processing.

The system identified errors to minimize corruption, but in the process it also inhibited the ingest of clean files.  It continued to ingest other folders while isolating the error, but the failed component could not be brought back into the system.  In addition, Archivematica stalled more than once on a component of the ingest process.  During this time, it gave no explanation for the delay, which lasted from several hours to a day.  From our observation, it was neither file size nor type that caused the time lag.  In most cases, we eventually stopped the ingest process for lack of progress and information about the delay.

Given the immediate needs of the University Archives, the developing state of Archivematica, and other digital preservation development work taking place within the University Library, we chose not to incorporate the current version into our electronic records workflow. However, during evaluation we noted several elements that would greatly improve usability in future versions. The open development process that Archivematica developers use ensures they receive direct feedback about institutional needs and how Archivematica does or does not address these needs.

My most consequential recommendation for future versions is that archivists be given the ability to access and control files at every step of the ingest process and to have fine-grained control over individual preservation actions. This would allow archivists to recover failed files and ensure that other files continue through the ingest process. Currently, when Archivematica fails, it fails badly. Improving the way Archivematica handles errors would make it easier to use within the day-to-day workflow of an archives.
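As a rough illustration of the behavior recommended here, the Python sketch below processes each file in a folder independently, records any failure along with its error message, and lets the remaining files continue. It is not Archivematica code and does not reflect how Archivematica is built: `IngestReport`, `ingest_folder`, `retry_failed`, and the `ingest_file` callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class IngestReport:
    """Outcome of one folder ingest (hypothetical structure)."""
    succeeded: List[Path] = field(default_factory=list)
    failed: Dict[Path, str] = field(default_factory=dict)  # path -> error message


def ingest_folder(folder: Path, ingest_file: Callable[[Path], None]) -> IngestReport:
    """Ingest every file under `folder`, isolating failures per file."""
    report = IngestReport()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        try:
            ingest_file(path)  # stand-in for the real preservation actions
        except Exception as exc:
            # Record why this file failed, then carry on with the next file.
            report.failed[path] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(path)
    return report


def retry_failed(report: IngestReport, ingest_file: Callable[[Path], None]) -> IngestReport:
    """Re-run only the failed files, for example after an archivist repairs them."""
    still_failed: Dict[Path, str] = {}
    for path in list(report.failed):
        try:
            ingest_file(path)
        except Exception as exc:
            still_failed[path] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(path)
    report.failed = still_failed
    return report
```

With this kind of design, one problem file would leave the rest of its folder ingested, and the report would tell the archivist exactly which files need attention and why.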

Once some of the issues are worked out in upcoming versions, Archivematica will be useful for smaller institutions that have less IT support than a large research library like that of the University of Illinois. However, three concerns remain. First, many smaller institutions may lack the hardware or the technological capability to support the system. Archivematica needs newer computers with large amounts of RAM to function properly, and smaller archives may not possess the updated hardware or the resources to invest in updated equipment. Second, the installation process is not user friendly. Unlike production software installations, which require few actions from the user, Archivematica requires familiarity with fairly complex operations, such as working from the Linux shell/command line. Though Michael Bennett’s instructions helped immensely, the installation process would be best completed by technology-savvy archivists. Finally, the software is best run from a dedicated virtual server, to which many institutions may not have access. In any case, running Archivematica on a dedicated virtual machine requires significant help from IT professionals.

The motivation and intent behind Archivematica are admirable. However, the technological ability needed to successfully install and run this system is currently beyond the reach of the people who might benefit most. What is needed is a collaborative effort between archivists and others to bridge the divide between this preservation system and its potential users. A sustainable model that supports preservation services using Archivematica and other open-source software, such as Archon or Archivists’ Toolkit, would facilitate the provision of these tools to archives that might not be able to sustain or use these services through their own effort. For example, consortia and developers may wish to consider developing a hosted processing and storage environment that institutions could contract on an annual basis. Such a service would relieve institutions of technology maintenance, but still allow them to contribute content to a shared repository and discovery system, while participating in the shared development process.

 

APPENDIX 1: EVALUATION MATRIX

Columns: QUESTION | RANK | YES/NO | COMMENTS
RANK: (M)ANDATORY, (D)ESIRABLE, (O)PTIONAL

PERFORMANCE/RELIABILITY
Does the system ingest all file types?
If not, does the system identify reasons for ingest failure?
Does the system ingest all file sizes?
Does the program correctly identify file types?
Do non-open file formats migrate?
Do migrated files adequately represent significant characteristics of the original?
Is the program reliable in the ingest process of this series?

SYSTEM DESIGN
Does the system identify errors/failures before any data can be corrupted?
Does the system isolate errors/failures so it can continue to operate in the presence of the error or failure?
Can the failed component be repaired while the system is running users' applications?
Can the ingest be brought back into the system configuration, thus restoring full functionality with minimum or no interruption?

OUTPUT
Is the XML valid and well-formed, and does it contain relevant preservation metadata?
Does the system successfully ingest the series?
How long does the program take to process all submitted files in the series?
Is the time required to process this series practical for integration into the archives' workflow?
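
For archives that want to record the matrix electronically, each question answered for a record series could be captured in a small structure like the Python sketch below. This is only an illustration: the names `Criterion`, `unmet`, and `passes` are hypothetical, and the rule that a series passes only when no mandatory question is answered "No" is an assumption, not a rule stated in our evaluation. Questions that are not strictly yes/no, such as processing time, would be noted in the comments field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """One row of the matrix, as answered for a single record series."""
    area: str            # "Performance/Reliability", "System Design", or "Output"
    question: str
    rank: str            # "M" (mandatory), "D" (desirable), or "O" (optional)
    met: bool            # the Yes/No column
    comments: str = ""


def unmet(criteria: List[Criterion], rank: str) -> List[Criterion]:
    """Return the rows with the given rank that were answered 'No'."""
    return [c for c in criteria if c.rank == rank and not c.met]


def passes(criteria: List[Criterion]) -> bool:
    """Assumed decision rule: pass only if no mandatory row is answered 'No'."""
    return not unmet(criteria, "M")
```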

 



[1] Christopher J. Prom, Practical E-Records Blog, http://e-records.chrisprom.com.  Accessed August 8, 2011.

[2] Available at: http://digitalcommons.uconn.edu/cgi/viewcontent.cgi?article=1038&context=libr_pubs&sei-redir=1#search=%22michael+bennett+installing+archivematica%22

 
