One of the most neglected aspects of work regarding electronic records and born-digital materials is the appraisal process, and specifically the acquisition of material from private parties. Several projects at high-profile institutions are examining or have examined issues related to personal digital archives: most prominently the British Library’s ongoing Digital Lives Project, Stanford’s SALT toolkit, Oxford’s futureArch project, and the AIMS Project. In addition, we should all check out the work by Cathy Marshall and Jeff Ubois, who are working outside the archival community.
As interesting and valuable as this research is, let’s face facts: many donors are reluctant to turn digital stuff over to an archives.
As I noted in a prior post, I really feel as if no one is making a case to the public as to why email should be preserved for its permanent research value. Until we do that, the archival profession is unlikely to make much progress in preserving it for future research value. What might such a case look like?
I’ve been making my way through a growing bibliography of articles, blog postings, technical standards, and product literature concerning email preservation, most of which is convincing me that there are several viable technical solutions on offer, but that we (i.e. the archival profession) have a lot of work to do to integrate them into a coherent strategy. More to the point, we archivists have done a horrible job of making the case as to why email should be preserved (if only we could establish a meme as well as some of our so-called political ‘leaders’).
As I’ve been working to pare my sets of test records down to a manageable set that might comprise the records to be included in a submission information packet, I’ve more or less come to conclude that what archivists really need is a purpose-built tool for conducting records appraisal. Sure, it is possible to cobble together an approach to records appraisal using a variety of open source or paid tools, but none of the tools really allows you to go through and identify records quickly and easily.
What would an ideal e-records appraisal tool look like?
Before we get to that, allow me a short digression:
After my experience with Mac Finder and Pathfinder, I spent some time today testing Windows file utilities to appraise records. In general, I did not find them to be quite as useful as Pathfinder (although I have not used the version of Windows Explorer that is included in Windows 7). Nevertheless, you may find some of the following tools helpful when attempting to weed or reorganize complex sets of electronic records. Any of these applications is useful to have around, since they eliminate most of the major problems with Windows Explorer (such as the infamous failure to complete a copy operation if one file fails due to a too-long path name).
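One way to sidestep the too-long path problem is to look for offending files before starting a copy. The following is a minimal Python sketch, not a feature of any of the utilities mentioned here. The function name find_long_paths is hypothetical, and the default limit assumes the classic Windows MAX_PATH value of 260 characters.

```python
import os

# Classic Windows limit on a full path, counting the terminating null character.
# Treating this as the threshold is an assumption; some systems allow longer paths.
MAX_PATH = 260


def find_long_paths(root, limit=MAX_PATH):
    """Return the absolute paths under root whose length reaches the limit."""
    too_long = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.abspath(os.path.join(folder, name))
            # A path of `limit` characters leaves no room for the terminator.
            if len(full_path) >= limit:
                too_long.append(full_path)
    return sorted(too_long)
```

Keep in mind that a copy fails on the length of the destination path, so it may be worth lowering the limit by the difference in length between the source and target folders.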
Over the past day, I have been testing tools for appraisal, using records from the American Library Association Office of Intellectual Freedom (OIF), the Freedom to Read Foundation (FTRF), and the LeRoy C. Merritt Humanitarian Fund. The files are particularly appropriate for this purpose since they represent the complete functioning of related groups within a larger organization, since no prior appraisal has been conducted on the files, and since the files are likely to have continuing value to the organization, as well as future research value for students, scholars, and members of the public.
Under a research/nondisclosure agreement, I was supplied a snapshot of an office’s working files on July 28, 2009. Although the files were given to me for research purposes only, it is possible that the Office of Intellectual Freedom will decide to include some of the files in the American Library Association Archives at the end of the research project.
The files comprise a complete electronic record of the office since the time that office began storing files on a shared server. The folders use a deep file structure and include a wide range of file formats. In addition, some of the materials are sensitive and will need to either be removed from the archives or placed under a restriction policy. (This is particularly the case for Merritt Fund materials, which include case files.) For this reason, it is important that potentially private materials be identified and then segregated and removed from materials to be deposited, or placed under appropriate restriction policies, in agreement with the creating office.
Obviously, one needs a semi-automated way to identify potential files for inclusion. Such work could be completed either by an archivist or a records creator, but tools are needed to sort through these materials. As a result, I tested several approaches.
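To give a sense of what "semi-automated" might mean in practice, here is a minimal Python sketch and not one of the approaches I tested. It walks a folder tree, counts files by extension, and flags any path containing a term that suggests restricted material. The function name survey and the SENSITIVE_TERMS list are hypothetical; a real list would have to be worked out with the creating office.

```python
import os
from collections import Counter

# Hypothetical search terms; these are illustrative assumptions only.
SENSITIVE_TERMS = ("merritt", "case", "confidential")


def survey(root, terms=SENSITIVE_TERMS):
    """Count files by extension and flag paths that contain a sensitive term."""
    by_extension = Counter()
    flagged = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(folder, name)
            extension = os.path.splitext(name)[1].lower() or "(none)"
            by_extension[extension] += 1
            # Match against the path relative to root, so the location of the
            # snapshot itself cannot trigger a false hit.
            relative = os.path.relpath(full_path, root).lower()
            if any(term in relative for term in terms):
                flagged.append(full_path)
    return by_extension, sorted(flagged)
```

A name-based pass like this only produces candidates for review. It says nothing about file contents, so an archivist or records creator would still need to inspect whatever it flags, and whatever it misses.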
For the past week, I was in Copenhagen, hosted by Finn Aaserud at the Niels Bohr Archive, located at the Niels Bohr Institute. I gave Finn and his staff a hand with his installation of Archon and also spoke at the Bohr Institute’s history of science seminar.
In conjunction with my visit, Finn arranged for me to meet some of his archival colleagues from the Danish State Archives, the Aalborg Archives, and the Byhistorisk Samling og Arkiv Blaakildegaard, a historical museum and archives in the town of Taastrup, west of Copenhagen. In addition, several archivists from the Danish State Library attended my talk (“Preserving the ‘Papers’ of 21st Century Science”).
I learned quite a bit about digital preservation and archives work in Denmark, particularly at the state archives. They have developed a particularly interesting approach to digital preservation.
DROID, developed by the UK National Archives, is a tool that can also assist archivists in identifying file formats. It is sometimes used as part of processes to preserve electronic records. The FITS tools, for example, make use of it to extract information concerning the identity of the file type, and the proof of concept version of Archivematica stores some of the information that DROID extracts in the archival information packet that it generates.
However, I think it may be equally valuable as part of an appraisal process, when an archivist is trying to understand the components of a particular series of records.
DROID reads internal header information from one or more files, then uses a sophisticated algorithm to compare that information to signature files stored in the PRONOM database. Based on the comparison, DROID declares whether a match is ‘positive,’ ‘tentative,’ or ‘unidentified’. For each positive or tentative match, DROID provides the PRONOM Unique ID (PUID), MIME type, format, and version. The exact process that the software uses is described in the technical manuals for the system, but obviously the success of the process depends largely on the completeness of the database/signature file to which DROID refers.
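To make the general idea of signature matching concrete, here is a minimal Python sketch. It is not DROID's algorithm and it does not read the PRONOM signature file. The SIGNATURES and EXTENSIONS tables and the identify function are illustrative assumptions covering only a few well-known formats, and treating an extension-only match as ‘tentative’ is a simplification of the three outcomes described above.

```python
from pathlib import Path

# Hypothetical, hand-picked table of leading byte sequences ("magic numbers").
# A real signature file is far larger and supports more complex patterns.
SIGNATURES = [
    (b"%PDF-", "Portable Document Format"),
    (b"\x89PNG\r\n\x1a\n", "Portable Network Graphics"),
    (b"GIF87a", "Graphics Interchange Format"),
    (b"GIF89a", "Graphics Interchange Format"),
    (b"PK\x03\x04", "ZIP container"),
]

# Fallback used only when no byte signature matches (an assumption of this sketch).
EXTENSIONS = {
    ".pdf": "Portable Document Format",
    ".png": "Portable Network Graphics",
    ".gif": "Graphics Interchange Format",
    ".zip": "ZIP container",
}


def identify(path):
    """Return (status, format_name) for one file, based on its leading bytes."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return ("positive", name)
    guess = EXTENSIONS.get(path.suffix.lower())
    if guess is not None:
        return ("tentative", guess)
    return ("unidentified", None)
```

Even this toy version shows why the completeness of the signature table matters: any format missing from both tables falls straight through to ‘unidentified’.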
The tool is very helpful, but I don’t think many people outside of large-scale digital preservation projects are actually using it, since it is somewhat of a power tool and since its main purpose is to support preservation of digital objects in a repository. You can download versions of it for all major platforms from SourceForge; the stats provided seem to indicate that it has been downloaded around 8,000 times (version 4.0 1,600 times).
Aside from its use for digital preservation, it can also be used when assessing files for potential accession. In the future, DROID (or an application like it) could be even more useful. When the UDFR proposal and resources such as the PLANETS Core Registry (PCR) come to fruition, particular file formats could be linked to lists of software that can render and/or undertake preservation actions for particular file types. The PLANETS tools, such as PLATO and the Testbed, when they are released in May, may include some of this expanded functionality.
In any case, my full ‘evaluation’ of DROID, which I used to ID my test records, is after the break.
After I spoke at the Society of Archivists’ Data Standards Group, a member of the audience asked if I have been working to evaluate software suitable for appraising records, i.e. helping archivists or producers select records for deposit into a trusted digital repository. At the time I responded (somewhat off the cuff) that I had found particular file managers, renamers, and bulk deletion programs to be useful, but that I hadn’t really considered the question all that much.
But as I reflected on it later, the question seemed to grow more complex. Most, if not all, of the development work concerning digital repositories focuses on meeting the requirements of the OAIS reference model. However, the reference model itself has nothing to say about how records should be selected for deposit. On one hand, this makes sense, since each archives has a different focus. But appraisal (i.e. the selection of records for inclusion in an archives) has always been one of the most debated (or at least most heavily written about) archival topics.