Installing OAIS Software: Archivematica

On January 15, 2010, in Research, Software Reviews, by Chris Prom

Over the past week, I’ve been in Cardiff, Wales, at a Forum for Fulbright fellows and scholars.  If I get some time this weekend, I may post a few thoughts about it and/or some reflections on Scotland and my Fulbright experience to date. In the meantime, I’d like to update you on my adventures installing software  for undertaking preservation actions within an OAIS environment.

For those of you who missed past postings, the tools I am evaluating  wrap together a variety of open source tools to help archives with many aspects of the ingest, storage, and access process.  So far, I’ve reviewed RODA and also DAITSS.   Both of them, at least in their current forms, are difficult for anyone without server admin experience  to install.  Any archivist would need significant support to get them running.  I may also try my hand at ISLANDORA (which looks like it would take even more work), but that may not prove necessary since I have been having so much good luck with Archivematica.

Unlike the other software, which is server based, Archivematica is a virtual appliance and runs inside VirtualBox or another virtualization engine that supports the Open Virtualization Format (such as VMware).  It uses file-based storage, so it can be implemented within any existing file storage systems that are available or can be made available on the host computer.  Until now, this project has been deliberately flying under the radar, although I’ve known about it since last fall, when the project manager, Peter van Garderen, contacted me.

Based on my initial experiences, Archivematica offers a credible, thoughtful, and I believe supportable model for facilitating archival work with electronic records.  In fact, I like this project so much that I’ve chosen to become directly involved in development and will begin contributing code to it over the next few weeks.

Here’s why I like the  project so much:

Continue reading »


Installing OAIS Software: DAITSS

On January 7, 2010, in Research, by Chris Prom

As I noted a few days ago, I have been trying to install and configure various pieces of software of use in an OAIS system, in an effort to discern whether it would be possible for an archivist to begin using them, even with very limited tech support. I spent a bit of additional time with RODA, and made some progress, which I’ll write about later. In the meantime, I thought I’d record my thoughts regarding the DAITSS software (Dark Archive in the Sunshine State). If you are not familiar with it, DAITSS is intended to run (as its name implies) as a non-public storage system, compliant with the general principles of the OAIS reference model. Like RODA and Archivematica, it includes (or at least allows for the integration of) various preservation actions, such as conversion of files from one format to another, using some of the same tools that are integrated into the other applications, such as ImageMagick.

Based on my testing, implementing DAITSS would be beyond the capability of most archivists. I found installation more difficult than with RODA, and gave up quite a bit earlier in the process, mainly because the installation documentation assumed a level of knowledge regarding server configuration that is far beyond that of any ‘power user’ or typical archivist.

My detailed ranking and testing notes are after the break.

Continue reading »


Over the past week, I’ve been grappling with a question that on its surface seems relatively pedestrian, but that I think offers a key structure that can guide my research:  how can I effectively track and evaluate the numerous pieces of software and services that a repository might use as part of a trustworthy system to accession, preserve, manage and provide access to electronic records?  As I was doing this, I spent a considerable amount of time mulling over the Tufts reports that I commented on last week, in particular Eliot and Kevin’s point that they wished they had used a database to track the various preservation requirements that they developed for a university context (expressed in OAIS sections and subsections).

As I was thinking things through (and getting a little more confused each time I did so), I finally got it into my head that each of the many preservation ‘requirements’ that a repository might need to implement was really an obligation that demanded specific actions by a particular agent (such as an institution, a human, a computer application, or an element in the preservation infrastructure).  A rationale for each obligation might be found in one or more standards, best practices documents, or guidelines (i.e. ‘citations’).  In order to show that each obligation has been satisfied, a particular agent undertakes or fulfills (with a certain degree of obligation) one or more activities or events (i.e. ‘actions’).  Certain of these actions may be facilitated by the use of particular pieces of hardware, software, services or infrastructure (i.e. ‘tools’), and each action may generate, modify or make use of various other resources (such as reports, logs, etc.).

It seemed to me that the relationships between obligations, agents, activities, tools and resources would be best tracked in a database.  Once in a database format, the various requirements, activities and software entries could be added, deleted, updated, ordered, reordered, and searched in a public format.  As additional entities are identified, they could also be defined and added to the conceptual model.  For example, one might define a workflows entity to define and order the specific actions needed to complete a larger task.
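
To make the idea a little more concrete, the relationships between obligations, agents, citations, actions and tools could be sketched roughly as follows, using Python's standard-library sqlite3 module. This is only an illustration under stated assumptions: every table name, column name and sample row is hypothetical and is not taken from the actual diagram, and the resources entity is left out for brevity.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE agent (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL              -- e.g. institution, person, application
);
CREATE TABLE obligation (
    id INTEGER PRIMARY KEY,
    statement TEXT NOT NULL,
    agent_id INTEGER NOT NULL REFERENCES agent(id)
);
CREATE TABLE citation (             -- standard or guideline justifying an obligation
    id INTEGER PRIMARY KEY,
    obligation_id INTEGER NOT NULL REFERENCES obligation(id),
    source TEXT NOT NULL
);
CREATE TABLE action (               -- activity or event that satisfies an obligation
    id INTEGER PRIMARY KEY,
    obligation_id INTEGER NOT NULL REFERENCES obligation(id),
    description TEXT NOT NULL
);
CREATE TABLE tool (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE action_tool (          -- many-to-many link between actions and tools
    action_id INTEGER NOT NULL REFERENCES action(id),
    tool_id INTEGER NOT NULL REFERENCES tool(id),
    PRIMARY KEY (action_id, tool_id)
);
"""


def build_demo():
    """Create an in-memory database holding one made-up sample row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO agent VALUES (1, 'Preservation Application', 'application')")
    conn.execute("INSERT INTO obligation VALUES (1, 'Convert files to a preservation format', 1)")
    conn.execute("INSERT INTO action VALUES (1, 1, 'Normalize image files')")
    conn.execute("INSERT INTO tool VALUES (1, 'ImageMagick')")
    conn.execute("INSERT INTO action_tool VALUES (1, 1)")
    return conn


def tools_for_obligation(conn, obligation_id):
    """Return the names of tools linked, through actions, to one obligation."""
    rows = conn.execute(
        """SELECT tool.name FROM tool
           JOIN action_tool ON action_tool.tool_id = tool.id
           JOIN action ON action.id = action_tool.action_id
           WHERE action.obligation_id = ?
           ORDER BY tool.name""",
        (obligation_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling tools_for_obligation(build_demo(), 1) walks from an obligation through its actions to the tools that support them, which is the kind of lookup such a database is meant to make easy. The link table is used because one action may rely on several tools and one tool may serve several actions.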

So, as a first step toward implementing this in a database format, I spent some time learning about entity relationship modeling and resharpening my understanding of database design tools.  In the process, I downloaded and used the excellent MySQL Workbench and used it to design a proposed entity relationship model/database diagram.

Regardless of whether or not I eventually program a web-accessible database to track information in the way I have designed this model, the modeling exercise helped me to think through the various relationships and issues involved in dealing with the whole range of e-records work.  It makes me glad that I finally learned something about entity relationship modeling; I can see why many software developers use it before beginning a software project. It really forces a person to think about how a system might best work, or where potential pitfalls might arise in the subsequent development process.

Over time, I hope to develop this model into a database.  Such a database would provide a useful way to share information about practical tools that archivists might use.  It would offer not only information about requirements in an abstract sense, but would link them to specific actions and to tools that might help accomplish such actions.  Over time, such a database could grow through community involvement, the addition of evaluative information, and other features.  I’ll be sure to post additional information as I work on the database over the upcoming weeks; in the meantime, I’d be interested in any feedback as to whether such a database might be useful and whether my proposed model makes sense.

Download the entity relationship model/database diagram (pdf)

Tufts/Yale Project Reports

On October 13, 2009, in Best Practices, Research, Software Reviews, by Chris Prom

Over the past week and a half, I spent a good amount of time  reading and grappling with the Tufts/Yale “Fedora and the Preservation of University Records Project” Reports.    I am very glad I did, and I think the reports deserve a lot more attention than, to my knowledge, they have previously received.  There are several reasons why I like them so much.

First, they challenged some of my own assumptions, and I hope that after having read them I will avoid making some very big mistakes over the next several months. Electronic records are a new area for me, and like many archivists, I would very much like to find an easy approach to e-records in general and my research project in particular.  Of course I know how complex the issues are, but I deliberately conceptualized my research proposal in a very simple (possibly even naive) fashion:  I was hoping that by doing software evaluation and working with real live records, I could somehow assemble a sort of jackknife that I could pull out of my pocket to at least bring myself and others in the profession several steps closer to solving e-records problems.  Similarly, Eliot Wilczek and Kevin Glick (co-PIs on the Tufts/Yale project) thought they could accession some records into Fedora to evaluate whether it could be used as a good preservation environment for e-records.  But during their project, they discovered not only that Fedora 2.1 had many shortcomings, but also that they were asking the wrong question.  As a result, they ended up refocusing their project to develop a true set of evaluation criteria for determining whether an institution (comprising its people, infrastructure and applications) can ingest, maintain, and provide access to electronic records.

That is a very, very big task and one I had (belatedly) come to think I was going to need to complete on my own, at least until I read their report.  So, I am indebted to them for saving me a lot of work!

That brings me to the second reason I like their project so much: their reports provide a very useful framework that can be used not only to develop institution-specific workflows and project plans, but (more critically) to evaluate applications as a part of those workflows/plans.  In other words, I feel just a little more able to grapple with the whole range of activities, software and hardware that are involved in e-records appraisal, preservation, management and access than I did before I read about their project. Specifically, the reports provide at least three very valuable things:

  • Requirements for Trustworthy Recordkeeping Systems and the Preservation of Electronic Records in a University Setting. Report 5.1. The Records Preservation Section provides an amalgamation and crosswalk of requirements from numerous other standards*, conceived and executed within the context of 34 subsections of the OAIS reference model.  For each subsection (e.g. “Receive Submission”) they list the features, behaviors and qualities that an Institution, the institution’s infrastructure, a Juridical Person, a Natural Person or the Preservation Application must, should or may manifest in order to ensure the trustworthy preservation of authentic electronic records.  While the list is daunting, it is less so than others I have seen, and doubly useful in that each requirement is mapped to corresponding requirements in the other standards footnoted below.
  • Ingest Guide. Report 2.1. The ingest guide describes a detailed set of (non-linear) activities that a repository would need to undertake in order to ensure a trustworthy ingest process, one that would allow users to reach a reasonable conclusion that materials maintained in a repository are authentic, i.e. that they provide trustworthy evidence of the activities they document.  The guide is split into two discrete activities: “Negotiate Submission Agreement” and “Transfer and Validate Records”.  For each subsection (e.g. “establish relationships” or “Package Records in a SIP”), a decision tree is provided, along with a brief description of the step and a list of any datapoints or reports that the step either requires as an input or produces as an output.  (For example, the step A 1.3 “Identify Producer” uses accession logs, activity logs and (possibly) producer records as input and modifies the producer entry as output.)  Helpfully, an appendix
  • Maintain Guide. Report 3.1.  The “Maintain Guide” focuses on one part of the overall system of policies, infrastructure, people and tools necessary to deliver records to users in a way that preserves their continuity and ‘recordness’:  the technologies and procedures needed to implement a preservation repository.  It lists a series of regular and irregular high-level events that the preservation environment needs to be able to complete (such as verifying record components or accessioning digital objects), and that the application and procedures need to support, in order to preserve both the record components and the archival information package (AIP) with its associated preservation description information.

Even though the reports are complex, they make it a possible (if still arduous) task to define a set of applications and services that can manage the entirety of the e-records appraisal, submission, ingest, preservation, management, and dissemination functions.

In this respect, my project is at a critical juncture, because while I have been reading the Tufts/Yale reports, I’ve also been slowly building a list of applications to examine, test and evaluate.  But it only makes sense to test and evaluate them once they have been cross-referenced against a generic workflow and set of requirements (such as the ingest activities or maintenance events in the Tufts/Yale reports).  That is the task I’ll be working on over the next week or so, hopefully in database form, since certain applications may cover multiple activities, and certain activities may require more than one application.
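
The many-to-many cross-referencing described here can be illustrated with a small Python sketch. It is a rough example only: the application names ("App A", "App B") and the coverage pairs are hypothetical placeholders rather than evaluation results, and the activity names are simply borrowed from the Ingest and Maintain guides.

```python
from collections import defaultdict

# Hypothetical (application, activity) pairs; not real evaluation data.
COVERAGE = [
    ("App A", "Negotiate Submission Agreement"),
    ("App A", "Transfer and Validate Records"),
    ("App B", "Transfer and Validate Records"),
]

# Generic workflow to check the applications against.
ACTIVITIES = [
    "Negotiate Submission Agreement",
    "Transfer and Validate Records",
    "Verify Record Components",
]


def cross_reference(coverage):
    """Index the pairs both ways: activity -> applications, application -> activities."""
    by_activity = defaultdict(set)
    by_app = defaultdict(set)
    for app, activity in coverage:
        by_activity[activity].add(app)
        by_app[app].add(activity)
    return by_activity, by_app


def uncovered(activities, by_activity):
    """List workflow activities that no application in the index supports."""
    return [name for name in activities if not by_activity.get(name)]
```

Indexing in both directions shows at a glance which activities several applications compete for and which have no candidate at all, so the gaps in a proposed toolset are visible before any hands-on testing begins.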

*  Among these I’d include documents such as the Indiana University Functional Requirements for RecordKeeping Systems, the InterPARES I Project’s “Requirements for Assessing and Maintaining the Authenticity of Electronic Records”, ISO 15489-1, MoReq2 and MoReq1, the Public Records Office’s Functional Requirements for Electronic Records Management (which was superseded by MoReq), the UCSD’s Preserving the Electronic Records Stored in a Records Management Application (PERM project results), the U.S. Department of Defense 5015.2 standard and the Pittsburgh Functional Requirements for Evidence in Recordkeeping.
