Over the past day, I have been testing tools for appraisal, using records from the American Library Association Office of Intellectual Freedom (OIF), the Freedom to Read Foundation (FTRF), and the Leroy J. Merritt Humanitarian Fund. The files are particularly appropriate for this purpose for three reasons: they represent the complete functioning of related groups within a larger organization, no prior appraisal has been conducted on them, and they are likely to have continuing value to the organization, as well as future research value for students, scholars, and members of the public.

Under a research/nondisclosure agreement, I was supplied a snapshot of the office’s working files on July 28, 2009. Although the files were given to me for research purposes only, it is possible that the Office of Intellectual Freedom will decide to include some of the files in the American Library Association Archives at the end of the research project.

The files comprise a complete electronic record of the office since the time that office began storing files on a shared server.  The folders use a deep file structure and include a wide range of file formats.  In addition, some of the materials are sensitive and will need to either be removed from the archives or placed under a restriction policy. (This is particularly the case for Merritt Fund materials, which include case files.)  For this reason, it is important that potentially private materials be identified and then segregated and removed from materials to be deposited, or placed under appropriate restriction policies, in agreement with the creating office.

Obviously, one needs a semi-automated way to identify potential files for inclusion. Such work could be completed either by an archivist or by a records creator, but tools are needed to sort through these materials. As a result, I tested several approaches.

The information supplied by standard file managers under the ‘properties’ or ‘get info’ menu is only moderately helpful. According to the Mac Finder application, the test records comprise 25,619,159,326 bytes in 34,128 items. Windows Explorer reports 34,144 items (31,972 files and 2,172 subfolders), and Nautilus reports 34,083 items.
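The three file managers disagree on the item count, and none of them says how it counts. As a cross-check, here is a minimal Python sketch that walks a directory tree and tallies files, subfolders, and bytes itself. The function name `summarize_tree` and the demo tree are hypothetical, and treating dot-prefixed names as "hidden" is my assumption about one possible source of the discrepancy, not something I confirmed.

```python
import os
import tempfile

def summarize_tree(root):
    """Count files, subfolders, and bytes under root without following symlinks."""
    files = folders = total_bytes = hidden_files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            files += 1
            # lstat reports the size of the entry itself, so broken links do not raise.
            total_bytes += os.lstat(path).st_size
            # Assumption: dot-prefixed names are the kind of file some tools skip.
            if name.startswith("."):
                hidden_files += 1
    return {"files": files, "folders": folders,
            "bytes": total_bytes, "hidden_files": hidden_files}

# Demo on a tiny throwaway tree (hypothetical names, not the OIF files).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "cases", "1998"))
    with open(os.path.join(tmp, "cases", "memo.txt"), "w") as fh:
        fh.write("hello")
    with open(os.path.join(tmp, ".hidden"), "w") as fh:
        fh.write("x")
    demo = summarize_tree(tmp)

print(demo)
```

Reporting files and folders separately, as Windows Explorer does, makes it easier to see where two tools part ways.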

As I noted in my previous post, I had limited luck using standard file managers to appraise records or pare them down to a more manageable list by weeding duplicates, etc. This motivated me to take another look at using DROID for appraisal.

The information that DROID supplies is more useful, although as I previously noted, the output is not optimally organized for reuse. DROID located 140 unique formats (each corresponding to a PRONOM Unique ID, or PUID), although many of these were simply versions of a single master format, such as JPEG or Microsoft Word document. Nevertheless, the information provided on the DROID summary tab was useful, since it showed that the vast majority of files were in formats that are relatively simple to preserve and migrate, such as Microsoft Word 97-2003 documents.

By regularizing the DROID CSV output, I was able to make the information it provided sortable and therefore much more useful. I opened the file in a text editor and did a global find and replace to put all of the output for a single file on one line (by removing the end-of-line string “[line break],,”). Once this was completed, I used Microsoft Excel to sort the output by the primary PUID, then by file name. The entire operation took about 15 minutes, and when I was done, I was in a much better position to appraise the records and to use advanced file management tools to manipulate them and prepare them for submission and/or migration.
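The same find-and-replace and sort could be scripted rather than done by hand. The following is a rough Python sketch, not the procedure I actually used: it assumes continuation lines begin with two empty fields (",,"), it keeps one comma when folding so that fields do not run together (an assumption about the layout), and the column positions, function names, and sample rows are all made up for illustration.

```python
import csv
import io

def regularize(text):
    """Fold continuation lines (those starting with ',,') into the preceding line."""
    # Normalize Windows line endings so the pattern matches either way.
    text = text.replace("\r\n", "\n")
    # Keep a single comma so the folded fields stay separated.
    return text.replace("\n,,", ",")

def sort_rows(text, puid_col, name_col):
    """Parse the flattened CSV and sort by PUID, then by file name."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return sorted(rows, key=lambda r: (r[puid_col], r[name_col].lower()))

# Illustrative input only: file name, identification status, PUID.
sample = (
    "b.doc,Positive,fmt/40\n"
    ",,fmt/39\n"
    "a.wpd,Tentative,x-fmt/44\n"
    "c.doc,Positive,fmt/40\n"
)

flat = regularize(sample)
rows = sort_rows(flat, puid_col=2, name_col=0)
for row in rows:
    print(row)
```

The sort here is a plain string sort on the PUID, which groups identical PUIDs together but does not order them numerically. That is enough for browsing, and it matches what a default spreadsheet text sort would do.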

DROID was also useful in identifying files that did not use the standard file extension for an application, such as WordPerfect files created in the mid to late 1990s, in identifying backup file extensions, and in helping me to locate the specific directories in which such files were found. In addition, DROID helped me identify and locate files that were particularly problematic or likely to need additional attention, such as sets of Audacity files (Audacity splits a single recording into multiple files, which are linked via an .aup Audacity project file), so that these files could be converted to a non-proprietary format. Similarly, DROID helped me locate text files saved with non-standard extensions, which many computers would attempt to open in other applications, using the file extension associations. In some cases, I was able to quickly determine that the text files replicated information in more complex formats, such as WordPerfect.
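Once the output is one row per file, flagging extension mismatches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EXPECTED_EXTENSIONS` table, the function name, and the sample rows are hypothetical, and the PUID-to-extension pairs are placeholders that should be checked against the PRONOM registry before any real use.

```python
import os

# Hypothetical lookup of PUID -> extensions considered "standard".
# Placeholder values for illustration; verify against PRONOM.
EXPECTED_EXTENSIONS = {
    "fmt/40": {".doc"},
    "x-fmt/111": {".txt"},
}

def find_mismatches(rows, expected=EXPECTED_EXTENSIONS):
    """Return (path, puid, extension) for files whose extension is not expected."""
    mismatches = []
    for path, puid in rows:
        ext = os.path.splitext(path)[1].lower()
        allowed = expected.get(puid)
        # PUIDs missing from the table are skipped rather than flagged.
        if allowed is not None and ext not in allowed:
            mismatches.append((path, puid, ext))
    return mismatches

# Made-up rows standing in for (file path, primary PUID) pairs.
rows = [
    ("reports/annual.doc", "fmt/40"),
    ("letters/smith.ltr", "fmt/40"),
    ("notes/readme", "x-fmt/111"),
    ("misc/unknown.xyz", "fmt/999999"),
]

flagged = find_mismatches(rows)
# Directories that contain at least one flagged file.
directories = sorted({os.path.dirname(path) for path, _, _ in flagged})
print(flagged)
print(directories)
```

Listing the directories separately mirrors how I used the sorted output: the interesting question was usually where the odd files clustered, not just which files they were.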

Finally, DROID also proved very useful at later points in the appraisal process, such as when I was using a file manager to browse folders and files. If I came across an extension that my computer could not open, I often browsed the DROID output to get some information regarding the likely file type, at which point I was often able to find an application to view or open the file. For example, I located database files created with Paradox and was able to examine them with the Paradox Viewer application, quickly determining that they were contact lists that should likely be excluded from any publicly available records.

In the course of using DROID, I also noticed several instances in which the application misidentified files. WordPerfect macro files were identified as Word documents, and many Excel and Word documents created after 2008 were identified as OLE2 compound objects (PUID).

In short, after using the sorted list for an hour or so, I was able to identify the major migration issues I was likely to face with the files, as well as to gather some preliminary information that would help me in weeding out inappropriate, duplicate, or private content.

[updated with detailed score: April 22, 2010]

DROID: ‘Score’ for a Small Archives

  • Installation/configuration/supported platforms: 20/20
  • Functionality/Reliability: 15/20 Good at what it does, but would be more useful if integrated into an overall ‘ideal appraisal tool’
  • Usability: 8/10 would be more useful if the output for each file were all on one line
  • Scalability: 5/10 works very quickly through many files
  • Documentation: 10/10
  • Interoperability/Metadata support: 8/10
  • Flexibility/Customizability: 8/10
  • License/Support/Sustainability/Community: 10/10 looks like it will be basis for GDFR and Planets work.

Final: 84/100
