Trustworthy Digital Objects

On December 21, 2009, in Methods, Software Reviews, by Chris Prom

In case you haven’t read it yet, I’d recommend taking a look at Henry Gladney’s article “Long-Term Preservation of Digital Records: Trustworthy Digital Objects,” which was published in the most recent issue of the American Archivist. I had read a manuscript version of it on Gladney’s website, but held off on commenting since it was not yet published when I read it.

It’s a good thing I held off, because, truth be told, I had to read it three times to make sense of it. The piece is very densely written, and the thesis is provocative: he proposes a unique solution to the problem of digital preservation, namely creating trustworthy digital objects (TDOs), which ‘self-contain’ all of the information needed to verify their authenticity, completeness, and provenance. They essentially consist of:

  • the original bit strings that represent an object of preservation
  • sufficient representation information to render and understand the records
  • a bundled version of software to render and display them (or a reference to an external entity that can do so)
  • enough contextual information to tell what the records are and how they relate to other records
  • fixity information to verify that the records are authentic and have not been modified or corrupted
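
To make the list above concrete for myself, here is a minimal sketch of the five parts as a single Python structure. It is not Gladney’s format (he describes an XML package); the class name, field names, and the "urn:example:" identifier are all my own assumptions, and SHA-256 simply stands in for whatever fixity mechanism a real system would use.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TrustworthyDigitalObject:
    """Hypothetical container mirroring the five parts listed above."""
    bit_string: bytes            # the original bits being preserved
    representation_info: dict    # what is needed to render and understand them
    renderer_ref: str            # bundled software, or a reference to it
    context: dict                # what the record is and what it relates to
    fixity_sha256: str = field(default="")

    def seal(self) -> None:
        """Record a SHA-256 digest of the original bits."""
        self.fixity_sha256 = hashlib.sha256(self.bit_string).hexdigest()

    def is_intact(self) -> bool:
        """Return True if the bits still match the recorded digest."""
        current = hashlib.sha256(self.bit_string).hexdigest()
        return current == self.fixity_sha256


tdo = TrustworthyDigitalObject(
    bit_string=b"Minutes of the board, 12 March 1998",
    representation_info={"format": "text/plain", "encoding": "ascii"},
    renderer_ref="urn:example:plain-text-viewer",  # made-up identifier
    context={"series": "Board minutes", "creator": "Example University"},
)
tdo.seal()
print(tdo.is_intact())   # checked right after sealing
tdo.bit_string += b"!"   # simulate corruption of the stored bits
print(tdo.is_intact())   # checked again after the change
```

A bare digest like this only detects accidental change; anyone who can alter the bits can also recompute the digest, which is presumably why the article goes on to talk about message authentication and published keys.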

The interesting and provocative part of this is that Gladney argues that it would be a ‘simple matter of programming’ to extend current content management systems to manage and provide access to trustworthy digital objects, by adding a TDO editor and emulators based on the Universal Virtual Computer concept. Thus, archivists would not be required to migrate records (to account for potential format obsolescence), although it would certainly help matters if we settled on a few non-proprietary formats for a base set of files to manage.

To lay the groundwork for this argument, Gladney first describes the status of current work to implement parts of the OAIS reference model, which he argues have been implemented in a way that will provide only near-term (i.e., less than 50 years) preservation. He argues that it would be easy to assemble a trustworthy digital repository from current systems (either open source or commercial), and that such systems should be adapted to local needs.

In the key passage of his article (pages 415-422) he describes the TDO architecture and structure.

“The idea is to package source information collections so that:

  • The bit-string set that represents a work is XML-packaged with registered schema.
  • Each bit-string that represents part of the work is encoded in a computing platform-independent representation or is accompanied by a bit-string encoded for everlasting intelligibility.
  • Integrity is assured by cryptographic message authentication.
  • The package includes provenance evidence, technical metadata, and one or more identifiers of the object itself.
  • Links to contextual information are secured by cryptographic message authentication codes of the linked entities.
  • Information loss is minimized by replication in mutually independent repositories.
  • Cryptographic signatures are grounded in keys that widely-trusted institutions publish periodically.”
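
The point about securing links with message authentication codes can be sketched roughly as follows. This is only my reading of the idea: HMAC-SHA256 with a key held by the repository is a simplification I am assuming for illustration, whereas the last bullet in the quotation points toward signatures grounded in published keys. The key, the function names, and the "urn:example:" identifier are all made up.

```python
import hashlib
import hmac

# Assumption: a secret key held by the repository (illustrative value only).
REPOSITORY_KEY = b"example-key-not-for-real-use"


def mac_for(content: bytes, key: bytes = REPOSITORY_KEY) -> str:
    """Return an HMAC-SHA256 tag for the given content."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def link_is_valid(link: dict, fetched: bytes, key: bytes = REPOSITORY_KEY) -> bool:
    """Check that the fetched entity still matches the tag stored in the link."""
    return hmac.compare_digest(link["mac"], mac_for(fetched, key))


# A link stores the target's identifier together with a tag over its content.
finding_aid = b"<ead>Board of Trustees records, 1990-2000</ead>"
link = {"target": "urn:example:finding-aid-42", "mac": mac_for(finding_aid)}

print(link_is_valid(link, finding_aid))          # same content as when linked
print(link_is_valid(link, finding_aid + b" "))   # content changed after linking
```

The design idea, as I understand it, is that a link records what the linked entity looked like at the time of linking, so a later reader can tell whether the context has been changed.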

The discussion is very complex, and most archivists (myself included) will have trouble understanding the technical details. For example, he lays out a database structure, using a ‘manifest’ block to describe all of the essential elements of each TDO, and a ‘relations’ block to describe its relations to other objects, using a triple-store table to specify the object, subject, and nature of the relationship. Using his method, he argues, computer systems can guarantee the trustworthiness of digital objects, in such a way that users can verify their authenticity.
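
For readers who, like me, find the manifest and relations blocks easier to grasp with something to look at, here is a rough sketch using an in-memory SQLite database. The table layout, column names, identifiers, and relationship names are my own assumptions, not the schema from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'manifest': one row describing the essentials of each object
    CREATE TABLE manifest (
        object_id TEXT PRIMARY KEY,
        title     TEXT,
        format    TEXT,
        digest    TEXT
    );
    -- 'relations': triples of (subject, nature of relationship, object)
    CREATE TABLE relations (
        subject   TEXT,
        predicate TEXT,
        object    TEXT
    );
""")

conn.execute(
    "INSERT INTO manifest VALUES (?, ?, ?, ?)",
    ("tdo:1", "Board minutes, March 1998", "application/pdf", "placeholder-digest"),
)
conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?)",
    [
        ("tdo:1", "isPartOf", "tdo:series-9"),   # made-up identifiers
        ("tdo:1", "renderedBy", "uvc:pdf"),
    ],
)


def related_to(db: sqlite3.Connection, object_id: str) -> list:
    """Return (predicate, object) pairs for everything this object points to."""
    rows = db.execute(
        "SELECT predicate, object FROM relations "
        "WHERE subject = ? ORDER BY predicate",
        (object_id,),
    )
    return rows.fetchall()


for predicate, target in related_to(conn, "tdo:1"):
    print("tdo:1", predicate, target)
```

The appeal of the triple form is that new kinds of relationships can be recorded without changing the table structure.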

It is an interesting thesis, but at heart it would require a significant ‘about face’ in how current preservation projects are operating, if indeed it is practical. Near the end of the article, Gladney notes that simple information formats can be easily handled in the system, where there are accepted and/or open formats that can be easily rendered (he cites PDF and JPG), but “[f]or more complex digital objects, we create emulator programs that accompany today’s content to render it for our descendants” (page 424). In the rest of the article, he describes the theory of virtual machines, in particular the Church-Turing Thesis and Raymond Lorie’s “Universal Virtual Computer.”

However, using this strategy to preserve objects would require that a Turing machine program (UVC file) be written and tested for every file format we wish to preserve. Then, a copy of the UVC would need to be attached to the digital object (or referenced via the relation table). Each UVC file, in turn, would be linked to an emulator for the machine type on which it ran (Intel x86, etc.). In the future, the files could be restored and delivered to the user by a restore application which uses the original bit string, the UVC file for the program that ran it, and the hardware emulator.
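
The restore chain described above can be mimicked in a few lines, with the large caveat that this is a toy. A real UVC program would be low-level code for the virtual machine, not a Python function, and every name here (UVC_PROGRAMS, run_on_emulator, restore, the format labels) is hypothetical.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], str]


def uvc_decode_latin1(bits: bytes) -> str:
    """Stand-in for a per-format UVC program: turns raw bits into readable text."""
    return bits.decode("latin-1")


# One decoder has to be written and tested for each format to be preserved.
UVC_PROGRAMS: Dict[str, Decoder] = {"text/latin1": uvc_decode_latin1}


def run_on_emulator(program: Decoder, bits: bytes) -> str:
    """Stand-in for the emulator on some future machine; here it just calls the program."""
    return program(bits)


def restore(bits: bytes, fmt: str) -> str:
    """Restore application: original bits + format's UVC program + emulator."""
    program = UVC_PROGRAMS.get(fmt)
    if program is None:
        raise LookupError(f"no UVC program has been written for format {fmt!r}")
    return run_on_emulator(program, bits)


print(restore(b"caf\xe9", "text/latin1"))

try:
    restore(b"\x00\x01", "application/x-sheet-music")  # made-up format label
except LookupError as err:
    print(err)
```

Even in toy form, the second call shows where the practical difficulty lies: a format nobody has written a decoder for cannot be restored at all.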

The key role for archivists (today) would lie in using a TDO editor (basically, a special type of XML editor that would need to be written) to package up the essential information.

It all sounds very logical, and Gladney may indeed be right that such a system could work. I personally lack the ability to judge whether the UVC idea is possible, although it certainly sounds fantastic. Certainly, from a human point of view, I find it difficult to believe that there would be enough funding or technical expertise to write emulators for every file type that, theoretically, an archives could want to preserve.1

In the end, I think it is well worth reading the article, because even if the ideas sound wildly impractical, the very act of grappling with them forced me to learn a lot about current standards for managing data. As I’ve been examining TDR-like systems (such as RODA, Islandora, and Archivematica), it has given me a better theoretical and technical understanding of how they work.

Finally, I think it would be very interesting to see whether or not even a portion of what he suggests is possible.  I am reminded of my Illinois colleague Bill Mischo’s comment that “Sometimes we need to have the ability to fail.”  Would anyone be so daring as to try to implement a TDO system based on the UVC?  Would any funding agency go near it?

1. For an interesting discussion of the problems related to migrating (much less emulating) non-standard formats, see Gap Analysis: a Survey of Preservation Action Tool Provision, a PLANETS project report by the Dutch National Library. The case study in the analysis is sheet music, a content type dominated by two competing proprietary formats.
