Over the past week, I’ve been grappling with a question that on its surface seems relatively pedestrian, but that I think offers a key structure that can guide my research: how can I effectively track and evaluate the numerous pieces of software and services that a repository might use as part of a trustworthy system to accession, preserve, manage and provide access to electronic records? As I was doing this, I spent a considerable amount of time mulling over the Tufts reports that I commented on last week, in particular Eliot and Kevin’s point that they wished they had used a database to track the various preservation requirements that they developed for a university context (expressed in OAIS sections and subsections).
As I was thinking things through (and getting a little more confused each time I did so), I finally got it into my head that each of the many preservation ‘requirements’ that a repository might need to implement was really an obligation that demanded specific actions by a particular agent (such as an institution, a human, a computer application, or an element in the preservation infrastructure). A rationale for each obligation might be found in one or more standards, best practices documents, or guidelines (i.e. citations). In order to show that each obligation has been satisfied, a particular agent undertakes or fulfills (with a certain degree of obligation) one or more activities or events (i.e. ‘actions’). Certain of these actions may be facilitated by the use of particular pieces of hardware, software, services or infrastructure (i.e. tools), and each action may generate, modify or make use of various other resources (such as reports, logs, etc.).
It seemed to me that the relationships between obligations, agents, activities, tools and resources would be best tracked in a database. Once in a database format, the various requirements, activities and software entries could be added, deleted, updated, ordered, reordered, and searched in a public format. As additional entities are identified, they could be defined and added to the conceptual model. For example, one might define a workflows entity to define and order the specific actions needed to complete a larger task.
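To make the relationships concrete, here is a minimal sketch of how the entities described above might be laid out as tables. It is only an illustration under stated assumptions, not the model I actually drew: every table name, column name and sample row is hypothetical, and it uses Python's built-in SQLite module simply so that it runs without a database server.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE agent (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL              -- institution, person, application, infrastructure
);
CREATE TABLE obligation (
    id INTEGER PRIMARY KEY,
    agent_id INTEGER NOT NULL REFERENCES agent(id),
    oais_section TEXT,              -- e.g. 'Receive Submission'
    level TEXT CHECK (level IN ('must', 'should', 'may')),
    statement TEXT NOT NULL
);
CREATE TABLE citation (
    id INTEGER PRIMARY KEY,
    obligation_id INTEGER NOT NULL REFERENCES obligation(id),
    source TEXT NOT NULL            -- standard, best practice document or guideline
);
CREATE TABLE action (
    id INTEGER PRIMARY KEY,
    obligation_id INTEGER NOT NULL REFERENCES obligation(id),
    description TEXT NOT NULL
);
CREATE TABLE tool (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE action_tool (          -- many-to-many link between actions and tools
    action_id INTEGER NOT NULL REFERENCES action(id),
    tool_id INTEGER NOT NULL REFERENCES tool(id),
    PRIMARY KEY (action_id, tool_id)
);
CREATE TABLE resource (
    id INTEGER PRIMARY KEY,
    action_id INTEGER NOT NULL REFERENCES action(id),
    name TEXT NOT NULL,
    role TEXT                       -- 'input' or 'output'
);
"""


def build_demo_db():
    """Create an in-memory database holding a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO agent VALUES (1, 'Preservation Application', 'application')")
    conn.execute(
        "INSERT INTO obligation VALUES (1, 1, 'Receive Submission', 'must', "
        "'Placeholder requirement text')"
    )
    conn.execute("INSERT INTO citation VALUES (1, 1, 'Hypothetical Standard, section 1.2')")
    conn.execute("INSERT INTO action VALUES (1, 1, 'Placeholder action: validate transferred files')")
    conn.executemany("INSERT INTO tool VALUES (?, ?)", [(1, "Tool A"), (2, "Tool B")])
    conn.executemany("INSERT INTO action_tool VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.execute("INSERT INTO resource VALUES (1, 1, 'validation log', 'output')")
    conn.commit()
    return conn


def tools_for_obligation(conn, obligation_id):
    """Return the names of tools linked, through actions, to one obligation."""
    rows = conn.execute(
        """
        SELECT DISTINCT tool.name
        FROM tool
        JOIN action_tool ON action_tool.tool_id = tool.id
        JOIN action ON action.id = action_tool.action_id
        WHERE action.obligation_id = ?
        ORDER BY tool.name
        """,
        (obligation_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `tools_for_obligation(build_demo_db(), 1)` walks from an obligation through its actions to the tools that might support them, which is the kind of question the database is meant to answer. A workflows entity could be added later as another table that orders actions.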
So, as a first step toward implementing this in a database format, I spent some time learning about entity relationship modeling and resharpening my understanding of database design tools. In the process, I downloaded the excellent MySQL Workbench and used it to design a proposed entity relationship model/database diagram.
Regardless of whether or not I eventually program a web-accessible database to track information in the way I have designed this model, the modeling exercise helped me to think through the various relationships and issues involved in dealing with the whole range of e-records work. It makes me glad that I finally learned something about entity relationship modeling; I can see why many software developers use it before beginning a software project. It really forces a person to think about how a system might best work, or where potential pitfalls might arise in the subsequent development process.
Over time, I hope to develop this model into a database. Such a database would be a useful way to provide information about practical tools that archivists might use. It would offer not only information about requirements in an abstract sense, but would link them to specific actions and to tools that might help accomplish such actions. Over time, such a database could grow through community involvement, the addition of evaluative information, and other features. I’ll be sure to post additional information as I work on the database over the upcoming weeks. In the meantime, I’d be interested in any feedback as to whether such a database might be useful and whether my proposed model makes sense.
Over the past week and a half, I spent a good amount of time reading and grappling with the Tufts/Yale “Fedora and the Preservation of University Records Project” Reports. I am very glad I did, and I think the reports deserve a lot more attention than, to my knowledge, they have previously received. There are several reasons why I like them so much.
First, they challenged some of my own assumptions, and I hope that after having read them I will avoid making some very big mistakes over the next several months. Electronic records are a new area for me, and like many archivists, I would very much like to find an easy approach to e-records in general and my research project in particular. Of course I know how complex the issues are, but I deliberately conceptualized my research proposal in a very simple (possibly even naive) fashion: I was hoping that by doing software evaluation and working with real live records, I could somehow assemble a sort of jackknife that I could pull out of my pocket to at least bring myself and others in the profession several steps closer to solving e-records problems. Similarly, Eliot Wilczek and Kevin Glick (co-PIs on the Tufts/Yale project) thought they could accession some records into Fedora to evaluate whether it could be used as a good preservation environment for e-records. But during their project, they discovered not only that Fedora 2.1 had many shortcomings, but that they were asking the wrong question. As a result, they ended up refocusing their project to develop a true set of evaluation criteria for determining whether an institution (comprised of its people, infrastructure and applications) can ingest, maintain, and provide access to electronic records.
That is a very, very big task and one I had (belatedly) come to think I was going to need to complete on my own, at least until I read their report. So, I am indebted to them for saving me a lot of work!
That brings me to the second reason I like their project so much: their reports provide a very useful framework that can be used not only to develop institution-specific workflows and project plans, but (more critically) to evaluate applications as a part of those workflows/plans. In other words, I feel just a little more able to grapple with the whole range of activities, software and hardware that are involved in e-records appraisal, preservation, management and access than I did before I read about their project. Specifically, the reports provide at least three very valuable things:
- Requirements for Trustworthy Recordkeeping Systems and the Preservation of Electronic Records in a University Setting. Report 5.1. The Records Preservation Section provides an amalgamation and crosswalk of requirements from numerous other standards*, conceived and executed within the context of 34 subsections of the OAIS reference model. For each subsection (e.g. “Receive Submission”) they list the features, behaviors and qualities that an Institution, the institution’s infrastructure, a Juridical Person, a Natural Person or the Preservation Application must, should or may manifest in order to ensure the trustworthy preservation of authentic electronic records. While the list is daunting, it is less so than others I have seen, and doubly useful in that each requirement is mapped to corresponding requirements in the other standards footnoted below.
- Ingest Guide. Report 2.1. The ingest guide describes a detailed set of (non-linear) activities that a repository would need to undertake in order to ensure a trustworthy ingest process that would allow users to reach a reasonable conclusion that materials maintained in a repository are authentic, i.e. that they provide trustworthy evidence of the activities they document. The guide is split into two discrete activities: “Negotiate Submission Agreement” and “Transfer and Validate Records”. For each subsection (e.g. “Establish Relationships” or “Package Records in a SIP”), a decision tree is provided, along with a brief description of the step and a list of any datapoints or reports that the step either requires as an input or produces as an output. (For example, the step A 1.3 “Identify Producer” uses accession logs, activity logs and (possibly) producer records as input and modifies the producer entry as output.) Helpfully, an appendix
- Maintain Guide. Report 3.1. The “Maintain Guide” focuses on one part of the overall system of policies, infrastructure, people and tools necessary to deliver records to users in a way that preserves their continuity and ‘recordness’: the technologies and procedures needed to implement a preservation repository. It lists a series of regular and irregular high-level events (such as verifying record components or accessioning digital objects) that the application and procedures need to support in order to preserve both the record components and the archival information package (AIP) with its associated preservation description information.
Even though the reports are complex, they make it a possible (if still arduous) task to define a set of applications and services that can manage the entirety of the e-records appraisal, submission, ingest, preservation, management, and dissemination functions.
In this respect, my project is at a critical juncture, because while I have been reading the Tufts/Yale reports, I’ve also been slowly building a list of applications to examine, test and evaluate. But it only makes sense to test and evaluate them once they have been cross-referenced against a generic workflow and set of requirements (such as the ingest activities or maintenance events in the Tufts/Yale reports). That is the task I’ll be working on over the next week or so, hopefully in database form, since certain applications may cover multiple activities, and certain activities may require more than one application.
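The cross-referencing itself is simple enough to sketch before any database exists. The snippet below is a rough illustration only: the function name is hypothetical, "App A" and "App B" are placeholders rather than real products, and the activity names are borrowed from the Tufts/Yale guides purely as sample data. It inverts a list of what each application covers into a list of which applications cover each workflow activity, and reports the activities that nothing covers.

```python
from collections import defaultdict


def cross_reference(activities, coverage):
    """Map each workflow activity to the applications that cover it.

    activities: ordered list of activity names in the workflow.
    coverage:   dict of application name -> set of activity names it supports.
    Returns (matrix, gaps): matrix maps activity -> sorted list of applications,
    and gaps lists the activities that no application covers.
    """
    by_activity = defaultdict(set)
    for app, covered in coverage.items():
        for activity in covered:
            by_activity[activity].add(app)

    matrix = {activity: sorted(by_activity[activity]) for activity in activities}
    gaps = [activity for activity in activities if not matrix[activity]]
    return matrix, gaps


# Placeholder data: the application names are hypothetical.
workflow = [
    "Negotiate Submission Agreement",
    "Transfer and Validate Records",
    "Verify record components",
]
candidate_apps = {
    "App A": {"Transfer and Validate Records", "Verify record components"},
    "App B": {"Transfer and Validate Records"},
}

matrix, gaps = cross_reference(workflow, candidate_apps)
for activity, apps in matrix.items():
    print(f"{activity}: {', '.join(apps) if apps else 'no tool identified'}")
```

With this sample data, one activity is covered by two applications, one application covers two activities, and one activity has no tool at all, which is exactly the many-to-many situation that argues for keeping the list in a database.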
* Among these I’d include documents such as the Indiana University Functional Requirements for Recordkeeping Systems, the InterPARES I Project’s “Requirements for Assessing and Maintaining the Authenticity of Electronic Records”, ISO 15489-1, MoReq2 and MoReq1, the Public Record Office’s Functional Requirements for Electronic Records Management (which was superseded by MoReq), UCSD’s Preserving the Electronic Records Stored in a Records Management Application (PERM project results), the U.S. Department of Defense 5015.2 standard and the Pittsburgh Functional Requirements for Evidence in Recordkeeping.
Last year, when I was serving on my Library’s Executive Committee, we briefly considered a request from a junior faculty member that we run workshops on the question “How do you pick a research topic?”
It was a bit disarming to hear that someone working toward tenure needed direction on such a basic point, but it is still a good question. In the past, when I’ve wanted to do some research, I’ve always tried to think of a practical problem that I or my colleagues are having, then turn that problem into a formal question. Of course, some questions are too big and some questions are too little. But some problems are just right.
In the past, I’ve thought a “just right” problem is one that I can express as a question I can investigate over six months, working about 15 hours a week. Now that I’m on sabbatical, I have a bit more time to do research. Nevertheless, I need some pretty clear limits since I’m dealing with a very complex area (electronic records), in which a huge number of people who are smarter than me are doing excellent work.
So here is my plan:
- Formulate research question: “What current tools, methods and software are most effective in helping archivists at under-resourced institutions identify, arrange, preserve and provide access to born-digital records that have been donated to a repository at the end of their period of active use?” (done, Sept 2009).
- Conduct literature review and software search, attend training events regarding digital preservation, and develop lists of articles, software, tools and methods in the resources section of this blog (ongoing through November 2009).
- Assemble 4 sets of e-records typical of those that might need to be accessioned, arranged, preserved and provided for access at a university archives or other under-resourced repository (3/4 Done).
  - a) backlog of existing ‘one-off’ e-records accessions held by the University of Illinois Archives and Dundee ARMMS.
  - b) Email of Paul Lauterbur, Nobel Prize-winning chemist.
  - c) Office files of the American Library Association’s Office for Intellectual Freedom and
  - d) set still to be identified; likely a nonprofit organization or a faculty member at University of Dundee that is using participatory software (e.g. wikis, blogs, annotation/commenting systems, community image galleries, etc.)
- Develop simplified e-records processing workflow (based on Tufts/Yale project’s Requirements for Trustworthy Recordkeeping and Preservation, Ingest Guide, and Maintain Guide, as well as other resources). (October 2009)
- Match specific pieces of software to draft e-records processing workflow; identify software gaps. (October-November 2009)
- Develop software/method evaluation criteria, which will use a two-phase process (Oct-Nov. 2009):
  - Brief comparison of program attributes to processing workflow/needs assessment.
  - In-depth analysis of ‘top candidates’.
- Use evaluation criteria to narrow complete list of software to a subset that will be evaluated in a formal test of software using live e-records. (early December 2009)
- Process e-records listed in step 3 using processing workflow, recording numeric evaluation and evaluative comments for each software application or method in subset, for its usefulness in working with defined record types (images, documents, email, websites, etc). (December-January)
- Write formal evaluation paper summarizing methodology and results of my evaluation. (February 2010).
- Develop recommended list of tools; contribute to software development projects to assemble toolkit to facilitate e-records work at ‘under-resourced’ institutions. (March-May 2010.)