For the past week, I was in Copenhagen, hosted by Finn Aaserud at the Niels Bohr Archive, located at the Niels Bohr Institute. I gave Finn and his staff a hand with his installation of Archon and also spoke at the Bohr Institute’s history of science seminar.
In conjunction with my visit, Finn arranged for me to meet some of his archival colleagues from the Danish State Archives, the Aalborg Archives, and the Byhistorisk Samling og Arkiv Blaakildegaard, a historical museum and archives in the town of Taastrup, west of Copenhagen. In addition, several archivists from the Danish State Library attended my talk (“Preserving the ‘Papers’ of 21st Century Science”).
I learned quite a bit about digital preservation and archives work in Denmark, particularly at the state archives, which has developed an especially interesting approach to digital preservation.
As part of my visit, Peter Edelholt demonstrated four pieces of software that the state archives is developing as part of its overall program for the preservation of business records and private papers. The approach that the Danish State Archives is taking toward digital preservation is unique, so far as I am aware, since all of the materials to be deposited by individuals (with the exception of audio and video) will be converted to tiff format for inclusion in the archives.
The program for private papers and business records is based on an established method that they developed over the past ten years for preserving government records, the basis of which is described in the proceedings of a symposium that they hosted in 2008. The Danish legal context for archives is unique, in that government bodies are legally required to supply digital records in a format mandated by the archives. Documents, whatever their format, are converted to tiff format and supplied with a series of tabular and xml files providing metadata, which can be read by a database application that the archives developed. All materials, including the tiff files, text files for searching, and the tab files/metadata, are stored within one folder and are deposited as an archival information packet into a repository system based on redundant offline storage (2 DVDs in separate physical locations and backup magnetic tapes). All of the media are on a continuous, monitored refreshment schedule.
The state archives chose tiff as a preservation format since they feel it has the most long-term stability and least likelihood of changes. Compliance by state agencies is required under law, and documents must be supplied in tiff format. The existence of such a requirement has provided a certain amount of stability so that a market for conversion tools and services has developed. Peter said most state agencies now understand the requirements, and budget for them.
The Danish State Archives is beginning to develop a programmatic basis for the deposit of electronic ‘papers’ from individuals, businesses, and associations. These files (with the exception of audio, video, and other special formats) will also be converted to tiff for permanent retention. Obviously, the archives cannot require that individuals deposit materials in this particular format, so the files will have to be converted by the archives.
To help people deposit their materials, they are in the process of developing four software tools. The tools will be used by the state archives and will also be provided to other Danish archives that are granted legal authority to accept electronic papers. (I am not sure I understood Peter correctly, but he seemed to suggest that Danish law mandates that only certain archives, including the Danish State Archives and some local authority archives, can accept electronic materials from private donors.) They are developing an appraisal tool, a file conversion tool, a validation tool, and an access tool. All of the tools are written in C# on the .NET Framework, so they will work only with files that are accessible via a computer running Microsoft Windows.
The appraisal tool is quite interesting, and I think it could be very useful outside of Denmark. It provides a method for donors or archivists to browse, select, and copy files for deposit. That might not sound all that innovative, since someone could use a file browser to do the same thing. However, the tool is very simple to use (even with a Danish language interface), and I think it would help donors take an active, engaged role in determining the shape of their archives. It also includes a simple method for supplying basic descriptive information regarding both the materials being transferred and the archival creator. In addition, files that were not selected by the user are also listed, in effect providing a list of files that were ‘weeded’ by the user. Information about weeded files can be suppressed from the dissemination packet that is provided to end users. All of this information is saved by the appraisal tool to a USB key or other location as a submission information packet.
Peter has sent me the appraisal tool, and as I work with it more, I may provide some more information about it.
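To make the selected/weeded idea concrete, here is a minimal Python sketch of how such a listing could be built. It is an illustration only, not the archives' tool (which is written in C#); the function name `build_manifest` and the dictionary layout are hypothetical.

```python
from pathlib import Path


def build_manifest(root, selected):
    """Split every file under `root` into 'selected' and 'weeded' lists.

    `selected` holds paths, relative to `root`, that the donor chose to
    deposit. Everything else found under `root` is recorded as weeded.
    (Hypothetical layout; the real tool's packet format is not shown here.)
    """
    root = Path(root)
    chosen = {Path(p).as_posix() for p in selected}
    manifest = {"selected": [], "weeded": []}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            key = "selected" if rel in chosen else "weeded"
            manifest[key].append(rel)
    return manifest
```

Keeping the weeded list separate from the selected list is what would allow it to be left out of whatever is later handed to end users.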
The conversion tool is a bit more complex and is in the final stages of development. Although it is mainly intended for archivists, it is possible that experienced computer users could make use of it. Ideally, it would be run on the computer that created the files, since every file to be deposited that can be rendered graphically without animation or sound is converted to tiff format. The archives also plans to use it on some dedicated servers in its Copenhagen location.
The conversion process uses a proprietary application that installs as a standard printer driver. The license allows the archives to integrate the driver directly into the tool itself, so users will not need to pay a separate license fee. The conversion application reads file extensions (defined in an editable .ini file) to open the application that is associated with the extension. It then prints the output to a tiff file. If necessary, complex formats such as email are unpacked by the tool prior to conversion.
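The extension lookup described above can be sketched in a few lines of Python. The section name `[extensions]`, the sample entries, and both function names are assumptions made for illustration; the actual layout of the tool's .ini file was not shown.

```python
import configparser
from pathlib import Path

# Hypothetical .ini content: one line per extension, naming the program
# that should open (and then print) files of that type.
SAMPLE_INI = """
[extensions]
.doc = winword.exe
.xls = excel.exe
.txt = notepad.exe
"""


def load_extension_map(ini_text):
    """Parse the editable .ini text into an {extension: application} dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {ext.lower(): app for ext, app in parser["extensions"].items()}


def application_for(filename, ext_map):
    """Return the application mapped to the file's extension, or None."""
    return ext_map.get(Path(filename).suffix.lower())
```

A file whose extension is missing from the map returns `None`, which corresponds to the case where the user has to correct the extension or exclude the file.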
Each tiff file is provided a numeric identifier, and a series of .tab (text) files are generated, along with an xml file, to provide structural metadata and to record the object’s original file name. If a file cannot be converted, the user can try again (sometimes conversions fail for unexplained reasons, probably due to resource conflicts in the printer queue). If necessary, the user can also attempt to correct problems with the file extensions or choose to exclude certain files from the conversion process. If the user chooses to exclude files, they will be listed as excluded in the ‘archival information packet’ that the application produces. Since the creating or viewing application must be opened, the conversion process can be quite lengthy if there are many files to manipulate.
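As a rough illustration of that bookkeeping, the sketch below assigns sequential numeric identifiers, records original file names and excluded files in a tab-delimited table, and emits a small xml description of the table's columns. The column names, the `status` values, and the xml element names are all hypothetical and do not reproduce the standard the state archives defined.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLUMNS = ("id", "original_name", "status")  # hypothetical column names


def build_metadata(original_names, excluded=()):
    """Return (tab_text, xml_text) describing a batch of files.

    Converted files get sequential numeric identifiers; excluded files
    are still listed, but with an empty identifier.
    """
    rows = []
    next_id = 1
    for name in original_names:
        if name in excluded:
            rows.append(("", name, "excluded"))
        else:
            rows.append((str(next_id), name, "converted"))
            next_id += 1

    # Tab-delimited table: one header row, then one row per file.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)

    # Small xml 'key' naming the columns of the table above.
    root = ET.Element("table", name="files")
    for col in COLUMNS:
        ET.SubElement(root, "column", name=col)
    return buf.getvalue(), ET.tostring(root, encoding="unicode")
```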
For each file that is converted, a text file is made. In addition, certain metadata fields are extracted from particular document types, such as email, so that the end user can search on access points such as the to, subject, and cc: fields. In addition, the full structure of the documents, as they existed on the user’s hard drive, is retained. Similarly, the tab files (and their xml ‘key’) provide structural metadata to allow for the reassembly of multi-page documents or complex objects, such as email with embedded attachments. When all conversions have been accomplished, the converted files and metadata are copied to another location for validation.
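Pulling searchable access points out of an email message can be illustrated with Python's standard `email` package. This is only a sketch of the general technique, assuming a plain RFC 822 style message; the function name and the choice of fields are illustrative, and the archives' tool may work quite differently.

```python
from email import message_from_string

# Header fields to expose as search access points (illustrative choice).
SEARCH_FIELDS = ("To", "Cc", "Subject")


def extract_access_points(raw_message):
    """Return a dict of lower-cased field name -> header value.

    Fields that are absent from the message map to an empty string.
    """
    msg = message_from_string(raw_message)
    return {field.lower(): msg.get(field, "") for field in SEARCH_FIELDS}
```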
The validation tool is used to ensure that the files to be deposited are in the correct formats and that adequate metadata is supplied, according to the standard which the state archives defined. The testing tool includes several options to examine files and metadata, so a visual verification of conversion may be performed, at least on a sample of items (there is no automated method to ensure that the print driver did not garble the output). Once the testing has been completed, the files are written to storage media and placed into secure storage locations.
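A very small part of such a check can be sketched in Python: confirming that every expected file is present and begins with a TIFF header. The function name and the problem messages are hypothetical, and a check like this says nothing about whether the page image itself is legible, which is why a visual inspection is still needed.

```python
from pathlib import Path

# A TIFF file starts with a byte-order mark ("II" or "MM") followed by 42.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def validate_package(folder, expected_names):
    """Return a list of problems found; an empty list means none were found."""
    folder = Path(folder)
    problems = []
    for name in expected_names:
        path = folder / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        with path.open("rb") as handle:
            if handle.read(4) not in TIFF_MAGIC:
                problems.append(f"{name}: not a TIFF header")
    return problems
```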
The archives provides access copies of the DVDs, along with a copy of the access software, for use either in the search room or on users’ home computers.
Peter did not demonstrate the access tool to me, and I believe it is still under development. In any case, Peter said that all of the tools will be complete by Mor or June, at which point they will be distributed to institutions in Denmark and other interested individuals.
In choosing tiff as its preservation format, the Danish State Archives has chosen an approach that is quite different from that of other repositories. But I think it is important to note that the program has been very successful for government records, and the archives will be building on its strengths. Obviously, the decision to take only tiff files leads to a potential loss of functionality, but to a certain extent, this has been accommodated in the access tool which the archives is developing.
The archives is also developing conversion tools for other formats such as audio and video, but we did not have time to discuss these.
All in all, I was very impressed with the project, since it has a clear vision, achievable objectives, and a good likelihood of long-term sustainability. The solution works within the confines of the resources the repository has available, and leaves the Danish State Archives with a clear migration and maintenance path for the future, since only one format is being used for the representation of static documents, photographs, and other items that can be transferred. It remains to be seen how widely used the method will be in the partner institutions, but it is definitely worth following. Above all else, it shows that there are many potential approaches to digital preservation, and that whatever system one chooses, it will be successful to the extent that it is attuned to local needs and available resources and expertise.