Installing OAIS Software: DAITSS

On January 7, 2010, in Research, by Chris Prom

As I noted a few days ago, I have been trying to install and configure various pieces of software of use in an OAIS system, in an effort to discern whether it would be possible for an archivist to begin using them, even with very limited tech support. I spent a bit of additional time with RODA, and made some progress, which I'll write about later. In the meantime, I thought I'd record my thoughts regarding the DAITSS software (Dark Archive in the Sunshine State). If you are not familiar with it, DAITSS is intended to run (as its name implies) as a non-public storage system, compliant with the general principles of the OAIS reference model. Like RODA and Archivematica, it includes (or at least allows for the integration of) various preservation actions, such as conversion of files from one format to another, using some of the same tools that are integrated into the other applications, such as ImageMagick.

Based on my testing, implementing DAITSS would be beyond the capability of most archivists. I found installation more difficult than with RODA, and gave up quite a bit earlier in the process, mainly because the installation documentation assumed a level of knowledge regarding server configuration that is far beyond that of any 'power user' or typical archivist.

My detailed ranking and testing notes are after the break.



Tufts/Yale Project Reports

On October 13, 2009, in Best Practices, Research, Software Reviews, by Chris Prom

Over the past week and a half, I spent a good amount of time reading and grappling with the Tufts/Yale "Fedora and the Preservation of University Records Project" reports. I am very glad I did, and I think the reports deserve a lot more attention than, to my knowledge, they have previously received. There are several reasons why I like them so much.

First, they challenged some of my own assumptions, and I hope that after having read them I will avoid making some very big mistakes over the next several months. Electronic records are a new area for me, and like many archivists, I would very much like to find an easy approach to e-records in general and my research project in particular. Of course I know how complex the issues are, but I deliberately conceptualized my research proposal in a very simple (possibly even naive) fashion: I was hoping that by doing software evaluation and working with real live records, I could somehow assemble a sort of jackknife that I could pull out of my pocket to at least bring myself and others in the profession several steps closer to solving e-records problems. Similarly, Eliot Wilczek and Kevin Glick (co-PIs on the Tufts/Yale project) thought they could accession some records into Fedora to evaluate whether it could be used as a good preservation environment for e-records. But during their project, they discovered not only that Fedora 2.1 had many shortcomings, but that they were asking the wrong question. As a result, they ended up refocusing their project to develop a true set of evaluation criteria for determining whether an institution (comprising its people, infrastructure and applications) can ingest, maintain, and provide access to electronic records.

That is a very, very big task, and one I had (belatedly) come to think I was going to need to complete on my own--at least until I read their report. So, I am indebted to them for saving me a lot of work!

That brings me to the second reason I like their project so much: their reports provide a very useful framework that can be used not only to develop institution-specific workflows and project plans, but (more critically) to evaluate applications as a part of those workflows/plans. In other words, I feel just a little more able to grapple with the whole range of activities, software and hardware involved in e-records appraisal, preservation, management and access than I did before I read about their project. Specifically, the reports provide at least three very valuable things:

  • Requirements for Trustworthy Recordkeeping Systems and the Preservation of Electronic Records in a University Setting. Report 5.1. The Records Preservation Section provides an amalgamation and crosswalk of requirements from numerous other standards*, conceived and executed within the context of 34 subsections of the OAIS reference model. For each subsection (e.g. "Receive Submission"), they list the features, behaviors and qualities that an Institution, the institution's infrastructure, a Juridical Person, a Natural Person or the Preservation Application must, should or may manifest in order to ensure the trustworthy preservation of authentic electronic records. While the list is daunting, it is less so than others I have seen, and doubly useful in that each requirement is mapped to corresponding requirements in the other standards footnoted below.
  • Ingest Guide. Report 2.1. The ingest guide describes a detailed set of (non-linear) activities that a repository would need to undertake in order to ensure a trustworthy ingest process, one that would allow users to reach a reasonable conclusion that materials maintained in a repository are authentic, i.e. that they provide trustworthy evidence of the activities they document. The guide is split into two discrete activities, "Negotiate Submission Agreement" and "Transfer and Validate Records." For each subsection (e.g. "Establish Relationships" or "Package Records in a SIP"), a decision tree is provided, along with a brief description of the step and a list of any datapoints or reports that the step either requires as an input or produces as an output. (For example, the step A 1.3 "Identify Producer" uses accession logs, activity logs and (possibly) producer records as input and modifies the producer entry as output.) Helpfully, an appendix
  • Maintain Guide. Report 3.1. The "Maintain Guide" focuses on one part of the overall system of policies, infrastructure, people and tools necessary to deliver records to users in a way that preserves their continuity and 'recordness': the technologies and procedures needed to implement a preservation repository. It lists a series of regular and irregular high-level events that the preservation environment needs to be able to complete (such as verifying record components or accessioning digital objects), and that the application and procedures need to support in order to preserve both the record components and the archival information package (AIP), with its associated preservation description information.
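
The input/output pattern the Ingest Guide attaches to each step could be modeled, very roughly, as a small data structure. This is purely my own illustration; the class and field names below are invented, not drawn from the Tufts/Yale reports, though the example data comes from the "Identify Producer" step described above.

```python
from dataclasses import dataclass, field

@dataclass
class IngestStep:
    """A hypothetical model of one Ingest Guide step (names are my own)."""
    step_id: str                                      # e.g. "A 1.3"
    name: str                                         # e.g. "Identify Producer"
    inputs: list[str] = field(default_factory=list)   # datapoints/reports the step reads
    outputs: list[str] = field(default_factory=list)  # datapoints/reports the step modifies

# The worked example from the guide: step A 1.3, "Identify Producer".
identify_producer = IngestStep(
    step_id="A 1.3",
    name="Identify Producer",
    inputs=["accession logs", "activity logs", "producer records"],
    outputs=["producer entry"],
)

print(identify_producer.name, "->", identify_producer.outputs)
```

A flat structure like this is enough to cross-reference steps against candidate applications later: any tool claiming to support a step should be able to consume its inputs and produce its outputs.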

Even though the reports are complex, they make it a possible (if still arduous) task to define a set of applications and services that can manage the entirety of an institution's e-records appraisal, submission, ingest, preservation, management, and dissemination functions.

In this respect, my project is at a critical juncture, because while I have been reading the Tufts/Yale reports, I've also been slowly building a list of applications to examine, test and evaluate. But it only makes sense to test and evaluate them once they have been cross-referenced against a generic workflow and set of requirements (such as the ingest activities or maintenance events in the Tufts/Yale reports). That is the task I'll be working on over the next week or so, hopefully in database form, since certain applications may cover multiple activities, and certain activities may require more than one application.

* Among these I'd include documents such as the Indiana University Functional Requirements for Recordkeeping Systems; the InterPARES I Project's "Requirements for Assessing and Maintaining the Authenticity of Electronic Records"; ISO 15489-1; MoReq2 and MoReq1; the Public Records Office's Functional Requirements for Electronic Records Management (which was superseded by MoReq); UCSD's Preserving the Electronic Records Stored in a Records Management Application (PERM project results); the U.S. Department of Defense 5015.2 standard; and the Pittsburgh Functional Requirements for Evidence in Recordkeeping.


Barebones research methodology

On October 1, 2009, in Methods, Research, by Chris Prom

Last year, when I was serving on my Library's Executive Committee, we briefly considered a request from a junior faculty member that we run workshops on the question "How do I pick a research topic?"

It was a bit disarming to hear that someone working toward tenure needed direction on such a basic point, but it is still a good question. In the past, when I've wanted to do some research, I've always tried to think of a practical problem that I or my colleagues are having, then turn that problem into a formal question. Of course, some questions are too big and some are too small. But some problems are just right.

In the past, I’ve thought a “just right”  problem is on that I can express as  a question I can investigate over six months, working about 15 hours a week.  Now that I’m on sabbatical, I have a bit more time to do reserach.  Neverthless, I need some pretty clear limits since I’m dealing with a very complex area (electronic records), in which a huge number of people that are smarter than me are doing excellent work.

So here is my plan:

  1. Formulate research question: "What current tools, methods and software are most effective in helping archivists at under-resourced institutions identify, arrange, preserve and provide access to born-digital records that have been donated to a repository at the end of their period of active use?" (done, Sept 2009).
  2. Conduct literature review and software search, attend training events regarding digital preservation, and develop lists of articles, software, tools and methods in the resources section of this blog (ongoing through November 2009).
  3. Assemble 4 sets of e-records typical of those that might need to be accessioned, arranged, preserved and provided for access at a university archives or other under-resourced repository  (3/4 Done).
    • a) the backlog of existing 'one-off' e-records accessions held by the University of Illinois Archives and Dundee ARMMS;
    • b) the email of Paul Lauterbur, Nobel prize-winning chemist;
    • c) the office files of the American Library Association's Office of Intellectual Freedom; and
    • d) a set still to be identified, likely from a non-profit organization or a University of Dundee faculty member using participatory software (e.g. wikis, blogs, annotation/commenting systems, community image galleries, etc.)
  4. Develop simplified e-records processing workflow (based on Tufts/Yale project’s Requirements for Trustworthy Recordkeeping and Preservation, Ingest Guide, and Maintain Guide, as well as other resources). (October 2009)
  5. Match specific pieces of software to draft e-records processing workflow;  identify software gaps. (October-November 2009)
  6. Develop software/method evaluation criteria, which will use a two-phase process (Oct.-Nov. 2009):
    • Brief comparison of program attributes to processing workflow/needs assessment.
    • In-depth analysis of 'top candidates'.
  7. Use evaluation criteria to narrow complete list of software to a subset that will be evaluated in a formal test of software using live e-records. (early December 2009)
  8. Process the e-records listed in step 3 using the processing workflow, recording a numeric evaluation and evaluative comments for each software application or method in the subset, assessing its usefulness in working with defined record types (images, documents, email, websites, etc.). (December-January)
  9. Write formal evaluation paper summarizing methodology and results of my evaluation. (February 2010).
  10. Develop recommended list of tools; contribute to software development projects to assemble toolkit to facilitate e-records work at ‘under-resourced’ institutions.  (March-May 2010.)
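
The numeric evaluation mentioned in steps 6-8 could take the shape of a simple weighted scoring matrix. The sketch below is only an illustration of that idea; the criteria, weights, and ratings are invented placeholders, not part of my actual (still-to-be-developed) evaluation criteria.

```python
# Hypothetical criteria with weights reflecting relative importance (my own invention).
criteria = {"ingest support": 3, "metadata capture": 2, "ease of installation": 1}

def weighted_score(ratings: dict[str, int]) -> float:
    """Combine per-criterion ratings (0-5 scale) into a single weighted average."""
    total_weight = sum(criteria.values())
    return sum(criteria[c] * ratings.get(c, 0) for c in criteria) / total_weight

# Example ratings for one hypothetical application:
ratings = {"ingest support": 4, "metadata capture": 3, "ease of installation": 2}
print(round(weighted_score(ratings), 2))  # (3*4 + 2*3 + 1*2) / 6 = 3.33
```

Keeping weights separate from ratings makes it easy to re-rank the candidate list later if the workflow analysis shows some activities matter more than others.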

Digital Dilemmas

On September 18, 2009, in Research, Software Reviews, Uncategorized, by Chris Prom

Yesterday, I participated in a session at the CAIS study session regarding Digital Dilemmas.  The basic purpose was to introduce students to issues they will be studying regarding electronic records and digital information.  Working together, Alan Bell, Susan Thomas, Philip Lord and I led the students through a series of discussions based around this scenario:

The organisation you work for has been involved in recent controversial plans that a rich American developer has to redevelop an area of coastline as a golf course/resort.

There have been allegations in the press that undue influence has been brought to bear on local politicians and business people to support the development.

You have been approached by the patron of a group opposed to the development – a group including university faculty, religious leaders, local councillors, and members of the public – who informs you that he has been keeping electronic news clippings, email, digital photographs, scientific data, and ephemera concerning his group's efforts to stop the development. His collection includes correspondence with senior government ministers. He kindly offers to burn the files onto several CDs if you promise that you can take good care of the records. He also asks if he should 'do anything special' to the records before giving them to you.

The students did really well with the question; they identified relevant issues covering the entire range of activities that would need to be pursued in dealing with these records, including donor relations, arrangement, access, preserving authenticity, metadata creation or capture, etc. They keyed in quite nicely on how these issues need to be addressed differently for digital as opposed to analog materials.

What struck me most about the discussion was the fact that the students spent relatively little time talking about digital preservation.  The discussion really drove home (to me at least) the point that we need a range of tools to be effective in working with records.

The discussion was also an interesting counterpoint to the literature I've been reading on large digital preservation projects. Obviously, a lot of money, time and effort are being put into projects such as PLANETS, NDIIPP, and DSpace/Fedora (now DuraSpace), and it is clear that some interesting and useful software to facilitate digital preservation will emerge from these projects. They are certainly generating a lot of buzz, as well as a lot of reports, guidelines and (currently under-documented) tools, and frankly, I am glad that people smarter than me are working on the problem.

At the same time, there are problems with pouring so many resources into digital preservation alone. In practice, it seems to segregate preservation from issues of appraisal, arrangement, description, access, and use, or at least to oversimplify the effect that factors in these other realms have on preservation (and vice versa). It leaves other important questions, if not unasked, at least not asked often enough, and not asked in a way that will drive forward better practices and tool development for the entire range of activities we need to pursue.

To take just a few simple examples: is anyone designing a set of tools that would allow us to complete a concise, comprehensive assessment of the files on an office's 1 TB networked storage device? How about to assess a set of 100 unlabeled CDs, zip disks, thumb drives, laptops and a portable hard drive left behind by a deceased environmental activist? (Please don't say you can just use Windows Explorer.) How should we discern the original order of the files, and what would you use to preserve the context and relationships of files and folders to each other, before, during and after they are loaded into a repository? What should we use to quickly prepare series-level, folder-level, and item-level metadata for digital files? In what format should such information be structured so that it can be input easily into a number of different repository systems? All of these questions, which concern activities that need to take place before items are ingested into an OAIS-based system, need serious investigation and could themselves be the subject of multi-strand projects. Obviously, archival theories and practices developed for paper-based records and in research projects concerning e-records should inform such work, but we really just need an effective toolset to use in applying these theories and practices.
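
To give a concrete sense of what even the crudest version of such an assessment tool might look like, here is a minimal sketch: walk a directory tree and record the path, size, modification date, and checksum of every file into a CSV report. The field choices and report layout are my own assumptions, not any established standard, and a real tool would need far more (format identification, streaming checksums for large files, original-order capture).

```python
import csv
import hashlib
import os
import time

def survey(root: str, report_path: str) -> int:
    """Walk `root` and write one CSV row of basic facts per file; return the file count."""
    count = 0
    with open(report_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "modified", "md5"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                stat = os.stat(full)
                # Reading the whole file is fine for a sketch; stream in chunks for large files.
                with open(full, "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                writer.writerow([
                    full,
                    stat.st_size,
                    time.strftime("%Y-%m-%d", time.localtime(stat.st_mtime)),
                    digest,
                ])
                count += 1
    return count
```

Even this much would answer the first-pass questions (how many files, how big, how old, any duplicates by checksum) that Windows Explorer cannot answer at the scale of a 1 TB share.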

Anyway, what I’ve said above is probably not an original point, but for a project like mine, it means I’ll need to concentrate more on assessment of gaps than on developing any kind of silver bullet for dealing with e-records (which is a chimera anyway).  In other words, in the process of working with my sample records, I hope to find  out what software works, what doesn’t, and what is missing–based around a very narrow scenario “What would you do if a donor approaches you wanting to donate 100 gigabytes of electronic files concerning (insert today’s hot topic here.)”
