Over the past week, I’ve been grappling with a question that seems relatively pedestrian on its surface, but that I think offers a key structure to guide my research: how can I effectively track and evaluate the numerous pieces of software and services that a repository might use as part of a trustworthy system to accession, preserve, manage and provide access to electronic records? As I was doing this, I spent a considerable amount of time mulling over the Tufts reports that I commented on last week, in particular Eliot and Kevin’s point that they wished they had used a database to track the various preservation requirements that they developed for a university context (expressed in OAIS sections and subsections).

As I was thinking things through (and getting a little more confused each time I did so), I finally got it into my head that each of the many preservation ‘requirements’ that a repository might need to implement was really an obligation that demanded specific actions by a particular agent (such as an institution, a human, a computer application, or an element in the preservation infrastructure). A rationale for each obligation might be found in one or more standards, best practices documents, or guidelines (i.e., citations). In order to show that each obligation has been satisfied, a particular agent undertakes or fulfills (with a certain degree of obligation) one or more activities or events (i.e., ‘actions’). Certain of these actions may be facilitated by the use of particular pieces of hardware, software, services or infrastructure (i.e., tools), and each action may generate, modify or make use of various other resources (such as reports, logs, etc.).
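To make the vocabulary above a little more concrete, here is a rough sketch of the entities as Python dataclasses. It is only an illustration of the idea, not the model in my diagram: every class name, field name, and sample value below is a hypothetical placeholder that I made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    source: str  # a standard, best practices document, or guideline


@dataclass
class Agent:
    name: str
    kind: str  # e.g. "institution", "human", "application", "infrastructure"


@dataclass
class Tool:
    name: str  # hardware, software, service, or infrastructure


@dataclass
class Resource:
    name: str  # a report, log, or similar by-product of an action


@dataclass
class Action:
    description: str
    agent: Agent                                   # who carries the action out
    tools: List[Tool] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Obligation:
    statement: str
    citations: List[Citation] = field(default_factory=list)  # the rationale
    actions: List[Action] = field(default_factory=list)      # how it is met

    def is_addressed(self) -> bool:
        """True when at least one action has been recorded for the obligation."""
        return bool(self.actions)


# Illustrative (made-up) sample data.
archivist = Agent(name="University archivist", kind="human")
fixity_check = Action(
    description="Verify checksums of accessioned files",
    agent=archivist,
    tools=[Tool("checksum utility")],
    resources=[Resource("fixity log")],
)
integrity = Obligation(
    statement="Ensure the integrity of preserved records",
    citations=[Citation("OAIS reference model")],
    actions=[fixity_check],
)
```

Even in this toy form, the structure shows where the many-to-many relationships live: one obligation can cite several sources and be met by several actions, and one action can draw on several tools.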

It seemed to me that the relationships between obligations, agents, activities, tools and resources would be best tracked in a database. Once in a database format, the various requirements, activities and software entries could be added, deleted, updated, ordered, reordered, and searched in a public format. As additional entities are identified, they could also be defined and added to the conceptual model. For example, one might add a workflow entity to define and order the specific actions needed to complete a larger task.
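As a rough sketch of how these relationships, including the workflow idea, might look in relational form, the snippet below builds a tiny in-memory database. I have used Python’s built-in sqlite3 module here purely so the example is self-contained; it is not MySQL, and the table names, column names, and sample rows are all assumptions made up for illustration rather than the schema in my diagram.

```python
import sqlite3

# Hypothetical schema: join tables carry the many-to-many relationships,
# and workflow_step.step_order records the sequence of actions.
SCHEMA = """
CREATE TABLE obligation (id INTEGER PRIMARY KEY, statement TEXT NOT NULL);
CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL);
CREATE TABLE action (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    obligation_id INTEGER NOT NULL REFERENCES obligation(id),
    agent_id INTEGER NOT NULL REFERENCES agent(id)
);
CREATE TABLE tool (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE action_tool (
    action_id INTEGER NOT NULL REFERENCES action(id),
    tool_id INTEGER NOT NULL REFERENCES tool(id),
    PRIMARY KEY (action_id, tool_id)
);
CREATE TABLE workflow (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE workflow_step (
    workflow_id INTEGER NOT NULL REFERENCES workflow(id),
    step_order INTEGER NOT NULL,
    action_id INTEGER NOT NULL REFERENCES action(id),
    PRIMARY KEY (workflow_id, step_order)
);
"""


def build_demo():
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO obligation VALUES (1, 'Ensure integrity of preserved records')")
    conn.execute("INSERT INTO agent VALUES (1, 'Archivist', 'human')")
    conn.executemany(
        "INSERT INTO action VALUES (?, ?, 1, 1)",
        [(1, "Record checksums at accession"), (2, "Re-verify checksums on a schedule")],
    )
    conn.execute("INSERT INTO tool VALUES (1, 'checksum utility')")
    conn.executemany("INSERT INTO action_tool VALUES (?, 1)", [(1,), (2,)])
    conn.execute("INSERT INTO workflow VALUES (1, 'Fixity checking')")
    conn.executemany("INSERT INTO workflow_step VALUES (1, ?, ?)", [(1, 1), (2, 2)])
    conn.commit()
    return conn


def workflow_actions(conn, workflow_id):
    """Return the action descriptions of a workflow in step order."""
    rows = conn.execute(
        "SELECT a.description FROM workflow_step AS s "
        "JOIN action AS a ON a.id = s.action_id "
        "WHERE s.workflow_id = ? ORDER BY s.step_order",
        (workflow_id,),
    )
    return [row[0] for row in rows]


def tools_for_obligation(conn, obligation_id):
    """Return the distinct tools used by any action that serves an obligation."""
    rows = conn.execute(
        "SELECT DISTINCT t.name FROM action AS a "
        "JOIN action_tool AS at ON at.action_id = a.id "
        "JOIN tool AS t ON t.id = at.tool_id "
        "WHERE a.obligation_id = ? ORDER BY t.name",
        (obligation_id,),
    )
    return [row[0] for row in rows]
```

Calling `workflow_actions(build_demo(), 1)` lists the steps of the sample workflow in order, and `tools_for_obligation` answers the kind of question I care most about: which tools help satisfy a given requirement. Reordering a workflow then only means updating `step_order` values rather than restructuring any data.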

So, as a first step toward implementing this in a database format, I spent some time learning about entity relationship modeling and resharpening my understanding of database design tools. In the process, I downloaded the excellent MySQL Workbench and used it to design a proposed entity relationship model/database diagram.

Regardless of whether or not I eventually program a web-accessible database to track information in the way I have designed this model, the modeling exercise helped me to think through the various relationships and issues involved in dealing with the whole range of e-records work. It makes me glad that I finally learned something about entity relationship modeling; I can see why many software developers use it before beginning a software project. It really forces a person to think about how a system might best work, or where potential pitfalls might arise in the subsequent development process.

Over time, I hope to develop this model into a database. Such a database would offer a useful way to share information about practical tools that archivists might use. It would provide not only information about requirements in an abstract sense, but would also link them to specific actions and to tools that might help accomplish such actions. Eventually, such a database could grow through community involvement, the addition of evaluative information, and other features. I’ll be sure to post additional information as I work on the database over the upcoming weeks; in the meantime, I’d be interested in any feedback as to whether such a database might be useful and whether my proposed model makes sense.

Download the entity relationship model/database diagram (pdf)