As I’ve noted in the past, it is vitally important that file formats be correctly identified if we wish to preserve files for future use. The best tool for this, DROID, relies on the PRONOM database.

For that reason, I was pleased to see that the National Archives (UK) just announced a major expansion of the database. This work should be immediately available to anyone using DROID, and I hope the results will also become available in other tools that use DROID, such as the Duke Data Accessioner and Archivematica, once their signature files are updated.

I also noted that the UDFR project, which seeks to provide a long-term replacement for PRONOM and the GDFR project, has now received funding from NDIIPP and is moving into an active development phase. Details are a bit sparse, but I hope to hear more at the 6th International Digital Curation Conference, where I’ll be presenting later this week. It looks like there may be a live feed of the conference (or at least some interviews, tweets, etc.) at an idcc Netvibes site.

Duke DataAccessioner: Review

October 26, 2010

As Chris has noted previously, it is important for archivists at ‘small’ repositories to be able to complete basic archival tasks rapidly, such as bulk file identification, transfer, and processing. For a long time, he had been meaning to test out the Duke Data Accessioner. Last week, he turned the task over to me, as part of a project to process the Ed Kieser Papers.

The “Duke DataAccessioner” is a free, open source program that can be downloaded to your desktop and used to migrate data from physical media or directories into a dedicated file server/directory structure for preservation, further appraisal, arrangement, and description; it also provides a way to integrate metadata tools at the time of migration.

Using DROID for Appraisal

February 17, 2010

DROID, developed by the UK National Archives, is a tool that can assist archivists in identifying file formats. It is sometimes used as part of processes to preserve electronic records. The FITS tools, for example, make use of it to extract information concerning the identity of the file type, and the proof of concept version of Archivematica stores some of the information that DROID extracts in the archival information package that it generates.

However, I think it may be equally valuable as part of an appraisal process, when an archivist is trying to understand the components of a particular series of records.

DROID reads internal header information from one or more files, then uses a sophisticated algorithm to compare that information to signature files stored in the PRONOM database. Based on the comparison, DROID declares whether a match is ‘positive,’ ‘tentative,’ or ‘unidentified’. For each positive or tentative match, DROID provides the PRONOM Unique ID (PUID), MIME type, format, and version. The exact process that the software uses is described in the technical manuals for the system, but obviously the success of the process depends largely on the completeness of the database/signature file to which DROID refers.
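To make the general idea concrete, here is a minimal sketch in Python of signature-based identification. It is not DROID's actual algorithm, and it does not read a real PRONOM signature file. The `Signature` class, the `identify` function, and the `demo/...` identifiers are all hypothetical names made up for illustration (they are not real PUIDs), and the sketch assumes that a 'tentative' result means a match on file extension only when no byte signature matches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """One entry in a toy signature table (hypothetical, not PRONOM's schema)."""
    format_id: str   # placeholder identifier, NOT a real PUID
    name: str        # human-readable format name
    magic: bytes     # byte sequence expected at the very start of the file
    extension: str   # conventional file extension, lower case


# A tiny illustrative table; a real registry holds far more formats.
SIGNATURES = [
    Signature("demo/png", "Portable Network Graphics", b"\x89PNG\r\n\x1a\n", ".png"),
    Signature("demo/pdf", "Portable Document Format", b"%PDF-", ".pdf"),
    Signature("demo/zip", "ZIP archive", b"PK\x03\x04", ".zip"),
]


def identify(header: bytes, filename: str = "") -> Tuple[str, Optional[Signature]]:
    """Classify a file from its leading bytes and, failing that, its name.

    Returns ("positive", sig) when the header starts with a known byte
    sequence, ("tentative", sig) when only the extension matches (an
    assumption of this sketch), and ("unidentified", None) otherwise.
    """
    # First pass: compare the header bytes against each stored signature.
    for sig in SIGNATURES:
        if header.startswith(sig.magic):
            return "positive", sig
    # Second pass: fall back to the file extension.
    lowered = filename.lower()
    for sig in SIGNATURES:
        if lowered.endswith(sig.extension):
            return "tentative", sig
    return "unidentified", None
```

In use, one would read the first few bytes of each file (for example `open(path, "rb").read(16)`) and pass them to `identify` along with the file name. Even in this toy version, the point made above holds: a format that is missing from the table can never be identified, however good the matching logic is.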

The tool is very helpful, but I don’t think many people outside of large scale digital preservation projects are actually using it, since it is something of a power tool and since its main purpose is to support preservation of digital objects in a repository. You can download versions of it for all major platforms from Sourceforge; the stats provided seem to indicate that it has been downloaded around 8,000 times (and version 4.0 around 1,600 times).

Aside from its use for digital preservation, it can also be used when assessing files for potential accession. In the future, DROID (or an application like it) could be even more useful. When the UDFR proposal and resources such as the PLANETS Core Registry (PCR) come to fruition, particular file formats could be linked to lists of software that can render and/or undertake preservation actions for particular file types. The PLANETS tools, such as PLATO and the Testbed, when they are released in May, may include some of this expanded functionality.

In any case, my full ‘evaluation’ of DROID, which I used to ID my test records, is after the break.

