The past few weeks have been dedicated to an analysis of the ALA website, creating a spreadsheet to survey their electronic records. In order to gain a better understanding of staff, technology, IT support, and records organization, Liz and I created two documents specifically tailored to surveying the ALA: a brief web survey for various ALA offices and an in-depth on-site interview form. Although we originally contemplated open-source survey software, such as LimeSurvey, we eventually settled on a commonly known application.
Over the past day, I spent a good deal of time investigating the SABA-Copying program, which was developed by the Danish State Archives and which I briefly mentioned here.
In short, I find the program really, really useful, even with its Danish interface. Since this program is not at all known outside of Denmark, I’d like to use this post to:
- describe what it does and how to use it
- explain why I like it
- list some issues that would need to be addressed if it were to be used outside of Denmark.
When I was working with file managers to conduct some appraisal on the records in my test set–see my postings last week–there were several instances in which it seemed likely to me that more specialized tools would help me identify groups of files for deletion, migration, or other processing actions. I needed to study the structure of a complex folder, identify duplicate files, move files based on filter criteria, and rename files, for instance. Even the most complex generic file management program cannot be used to complete those operations.1
Many of these tricks can be accomplished using command-line functions in Linux or another operating system. But, realistically, most archivists will have access to a Windows computer, so I started looking around for free or low-cost programs that run on Windows. After the break, you can read about three tools I found particularly useful: Tree Size, Duplicate Cleaner, and Renamer. Even though their names make them sound prosaic at best, they really are worth knowing about.
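For readers who are comfortable with a little scripting, duplicate detection in particular can be approximated in a few lines. The following is only a minimal Python sketch and not how Duplicate Cleaner works internally; the function names file_digest and find_duplicates are my own, and it assumes that two files count as duplicates when their SHA-256 checksums match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def find_duplicates(root):
    """Group files under root by content; return only groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_duplicates on the top folder of a test set returns a list of groups, each holding the paths of files with identical content. Nothing is deleted; deciding which copy to keep remains an appraisal decision.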
[updated April 22, 2010: added detailed ratings]
After my experience with Mac Finder and Pathfinder, I spent some time today testing Windows file utilities to appraise records. In general, I did not find them to be quite as useful as Pathfinder (although I have not used the version of Windows Explorer that is included in Windows 7). Nevertheless, you may find some of the following tools helpful when attempting to weed or reorganize complex sets of electronic records. Any of these applications is useful to have around, since they eliminate most of the major problems with Windows Explorer (such as the infamous failure to complete a copy operation if one file fails due to a too-long path name).
Over the past day, I continued to work with my files from the OIF, FTRF, and Merritt Fund. DROID gave me a better understanding of the issues I was likely to confront in identifying and migrating important content, so I turned my attention to examining files quickly and efficiently so that I could make appraisal decisions about them. As I took actions, I recorded them in a file (appraisal actions.txt) in the root of the files, for potential inclusion in the AIP or a descriptive system at a later time.
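I kept that log by hand, but the same habit is easy to script. Here is a minimal Python sketch, assuming a tab-separated line of timestamp, action, and target is enough; the function name record_action and the line layout are my own inventions, and only the file name comes from my actual practice.

```python
from datetime import datetime
from pathlib import Path


def record_action(root, action, target, log_name="appraisal actions.txt"):
    """Append one timestamped line to the log kept in the root of the files."""
    log_path = Path(root) / log_name
    stamp = datetime.now().isoformat(timespec="seconds")
    line = f"{stamp}\t{action}\t{target}\n"
    # Mode "a" creates the log on first use and appends afterwards.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
    return line
```

Each call adds one line, so the log builds up in the order the actions were taken and can later be copied into an AIP or a descriptive system.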
Every operating system, of course, includes a built-in file manager, such as Windows Explorer, the Mac Finder, or GNOME Nautilus. In addition, each operating system may have one or more paid or free file managers which can be used to replace or supplement the default application. For example, Pathfinder is a well-reviewed and powerful file browser for the Mac.
My impressions/evaluation of working with both the Mac Finder and Pathfinder are after the break. Tomorrow, I’ll review some Windows applications.
Over the past day, I have been testing tools for appraisal, using records from the American Library Association Office of Intellectual Freedom (OIF), the Freedom to Read Foundation (FTRF), and the Leroy J. Merritt Humanitarian Fund. The files are particularly appropriate for this purpose since they represent the complete functioning of related groups within a larger organization, since no prior appraisal has been conducted on the files, and since the files are likely to have continuing value to the organization, as well as future research value for students, scholars, and members of the public.
Under a research/nondisclosure agreement, I was supplied a snapshot of an office’s working files on July 28, 2009. Although the files were given to me for research purposes only, it is possible that the Office of Intellectual Freedom will decide to include some of the files in the American Library Association Archives at the end of the research project.
The files comprise a complete electronic record of the office since the time that office began storing files on a shared server. The folders use a deep file structure and include a wide range of file formats. In addition, some of the materials are sensitive and will need to either be removed from the archives or placed under a restriction policy. (This is particularly the case for Merritt Fund materials, which include case files.) For this reason, it is important that potentially private materials be identified and then segregated and removed from materials to be deposited, or placed under appropriate restriction policies, in agreement with the creating office.
Obviously, one needs a semi-automated way to identify potential files for inclusion. Such work could be completed either by an archivist or a records creator, but tools are needed to sort through these materials. As a result, I tested several approaches.
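To give a sense of what "semi-automated" might mean in practice, here is a minimal Python sketch that flags files for human review based on words in their folder or file names. It is not one of the approaches I tested; the function name flag_for_review and the keywords are hypothetical, and crude substring matching like this will produce false hits (for example, "case" also matches "showcase").

```python
from pathlib import Path


def flag_for_review(root, keywords=("merritt", "case")):
    """Return files whose path relative to root contains any keyword."""
    root = Path(root)
    flagged = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Compare against the relative path only, in lower case.
            relative = path.relative_to(root).as_posix().lower()
            if any(word in relative for word in keywords):
                flagged.append(path)
    return flagged
```

The output is only a list of candidates. An archivist or records creator would still need to look at each flagged file before removing or restricting it.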
DROID, developed by the UK National Archives, is a tool that can assist archivists in identifying file formats. It is sometimes used as part of processes to preserve electronic records. The FITS tools, for example, make use of it to extract information concerning the identity of the file type, and the proof-of-concept version of Archivematica stores some of the information that DROID extracts in the archival information package that it generates.
However, I think it may be equally valuable as part of an appraisal process, when an archivist is trying to understand the components of a particular series of records.
DROID reads internal header information from one or more files, then uses a sophisticated algorithm to compare that information to signature files stored in the PRONOM database. Based on the comparison, DROID declares whether a match is ‘positive,’ ‘tentative,’ or ‘unidentified’. For each positive or tentative match, DROID provides the PRONOM Persistent Unique Identifier (PUID), MIME type, format, and version. The exact process that the software uses is described in the technical manuals for the system, but obviously the success of the process depends largely on the completeness of the database/signature file to which DROID refers.
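The basic idea of signature matching can be illustrated with a toy example. The Python sketch below is not DROID's algorithm and does not use the PRONOM signature file; the four-entry SIGNATURES table and the identify function are hand-made for illustration, PUIDs and versions are deliberately left out, and I am assuming that a ‘tentative’ result means that only the file extension matched.

```python
from pathlib import Path

# Hypothetical, hand-made table: (leading bytes, extension, format, MIME type).
SIGNATURES = [
    (b"%PDF-", ".pdf", "Portable Document Format", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png", "Portable Network Graphics", "image/png"),
    (b"GIF89a", ".gif", "Graphics Interchange Format", "image/gif"),
    (b"PK\x03\x04", ".zip", "ZIP archive", "application/zip"),
]


def identify(path):
    """Return (status, format name, MIME type) for one file."""
    path = Path(path)
    with open(path, "rb") as handle:
        header = handle.read(16)
    # A match on the internal header bytes counts as positive.
    for magic, _ext, name, mime in SIGNATURES:
        if header.startswith(magic):
            return ("positive", name, mime)
    # Assumption: a match on the extension alone is only tentative.
    for _magic, ext, name, mime in SIGNATURES:
        if path.suffix.lower() == ext:
            return ("tentative", name, mime)
    return ("unidentified", None, None)
```

Even this toy version shows why the completeness of the signature data matters: any format missing from the table comes back as unidentified, no matter how common it is.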
The tool is very helpful, but I don’t think many people outside of large-scale digital preservation projects are actually using it, since it is something of a power tool and since its main purpose is to support preservation of digital objects in a repository. You can download versions of it for all major platforms from SourceForge; the stats provided seem to indicate that it has been downloaded around 8,000 times (version 4.0 about 1,600 times).
Aside from its use for digital preservation, it can also be used when assessing files for potential accession. In the future, DROID (or an application like it) could be even more useful. When the UDFR proposal and resources such as the PLANETS Core Registry (PCR) come to fruition, particular file formats could be linked to lists of software that can render and/or undertake preservation actions for particular file types. When they are released in May, the PLANETS tools, such as PLATO and the Testbed, may include some of this expanded functionality.
In any case, my full ‘evaluation’ of DROID, which I used to ID my test records, is after the break.