Migration Pathways Table
One of the most important decisions when starting an electronic records program is choosing the list of file formats that your repository will regard as ‘supported’ for preservation and current access purposes. It is an inherently complex decision. Here is a table to get you started. It may form the basis for a local preservation and access plan and should be used in conjunction with a deposit policy and transfer guidelines that you can give to records producers.
|Genre||Supported Formats||Preservation formats||Supported Access Format||Supported Access Tool||Migration Path/Notes|
|Text, source code||.txt, or files of any extension or no extension containing ASCII or MIME data||Original format||.txt||Notepad++ or jEdit||To facilitate rendering, use Renamer or Thunar to append .txt without removing the original extension.|
|Word Processed Documents||.doc, .wpd, .odt||.odt||PDF/A||Adobe Reader, Open Office||Use Adobe Acrobat or Open Office to migrate .doc, .wpd and .odt to PDF/A, where appropriate. [1]|
|Raster Images||.jpg, .png||.tif||.jpg||GIMP||Use ImageMagick or Adobe to migrate.|
|Vector Images||.ai||.svg||.svg||Inkscape; any modern browser will open .svg files, but Inkscape imports .ai files.||Convert to .svg by importing into Inkscape. [2]|
|Audio||.mp3||PCM stored as WAV file||.mp3||VLC Media Player||ffmpeg or Audacity|
|Video||.avi, .mov, .wmv||mpeg2||.ogv||VLC Media Player||ffmpeg|
|Email accounts||None||Mbox (embedded attachments)||Mbox||Thunderbird 3 (place in local folders to import)||Use Aid4Mail to migrate to mbox; cannot guarantee accessibility of attachments that are not a preferred file type.|
|Databases||.mdb||.siard, MySQL||Original format||SIARD networked MySQL server or local XAMPP||Use SiardEdit to migrate content; cannot guarantee database look and feel or access to supporting scripts or applications.|
|Spreadsheets||.xls||.ods||.ods||Open Office Calc||Open Office; cannot guarantee access to all significant properties of migrated files.|
|Presentations||PDF, .odp||PDF, .odp||PDF, .odp||Adobe Reader||Adobe Acrobat [3]|
|Application files/Executables (e.g., any compiled format intended to be run as a program)||None||Original format||Original format||None|
1. Some significant properties may be lost on migration, particularly if Open Office Writer is used for migration. A GNU converter for WordStar to RTF is available; a copy is also mirrored on my site.
2. Further investigation is necessary to see if this process can be automated.
3. There are too many significant rendering issues with PowerPoint presentations converted to .odp format, so producers should be encouraged to produce PDF versions for deposit.
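For a repository that wants to apply the table in a script, a few of its rows can be written down as a simple lookup. This is only a minimal sketch: the dictionary layout and the names `PRESERVATION_TARGETS`, `lookup` and `access_name_for_text` are hypothetical, and only a handful of rows are transcribed.

```python
from pathlib import PurePath

# A few rows of the table, keyed by lowercase extension.
# Values are (genre, preservation format). The structure is an
# illustrative assumption, not part of any preservation tool.
PRESERVATION_TARGETS = {
    ".doc": ("Word Processed Documents", ".odt"),
    ".wpd": ("Word Processed Documents", ".odt"),
    ".odt": ("Word Processed Documents", ".odt"),
    ".jpg": ("Raster Images", ".tif"),
    ".png": ("Raster Images", ".tif"),
    ".mp3": ("Audio", "PCM stored as WAV"),
    ".xls": ("Spreadsheets", ".ods"),
    ".mdb": ("Databases", ".siard"),
}


def lookup(filename):
    """Return (genre, preservation format), or None if the extension is not listed."""
    ext = PurePath(filename).suffix.lower()
    return PRESERVATION_TARGETS.get(ext)


def access_name_for_text(filename):
    """Append .txt without removing the original extension (text/source code row)."""
    if filename.lower().endswith(".txt"):
        return filename
    return filename + ".txt"
```

A file whose extension is not in the dictionary returns `None`, which a local plan might treat as "keep in original format and review by hand".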
I provide this table with a number of provisos.
First, the migration pathways column lists tools that can be used to migrate these file types. However, it should be noted carefully that most of these tools are difficult to use as stand-alone, native applications. In many cases, a repository will be better off using a general-purpose migration tool, such as Xena, or an all-in-one digital preservation tool, such as RODA or Archivematica, which bundle several of these tools together.
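To give a sense of what driving one of these tools directly involves, here is a rough sketch of wrapping ffmpeg for the audio row of the table (MP3 to PCM in a WAV file). The function names are hypothetical, and the choice of 16-bit PCM (`pcm_s16le`) is my assumption, since the table does not specify a bit depth.

```python
import shutil
import subprocess


def wav_command(source, target):
    """Build an ffmpeg command that decodes `source` to 16-bit PCM in a WAV file.

    -n tells ffmpeg to refuse to overwrite an existing output file.
    """
    return ["ffmpeg", "-n", "-i", source, "-c:a", "pcm_s16le", target]


def migrate_audio(source, target):
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(wav_command(source, target), check=True)
```

Even this small wrapper leaves out logging, quality checking and error recovery, which is part of the reason a bundled tool may be the better choice.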
Second, the table simply represents my own opinion regarding a reasonable strategy that most small archives can put in place, given limits on available resources. The only currently available tool that is intended to provide comprehensive information about file formats is the PRONOM file registry. However, it contains only skeletal information for most formats. In the absence of hard, experimental data (which may or may not emerge from those using the Planets testbed), we are left to do the best we can with the information at hand. I’ve tried to abstract that information into the table above, based on my experiences working with the OIF and Lauterbur files. It is simply my attempt to boil other people’s good work down to a digestible format.
Third (and with respect to my second proviso), there are much better and more complete resources available. I found two resources in particular very helpful when constructing my own recommendations in the table above, but there are many other such resources available.
- UK National Archives list of considerations you should take into account when deciding which formats to support.
- Archivematica’s Media Type Preservation Plans (which are currently a work in progress and subject to change.)
Fourth, I’ve tried to recommend some access tools that will provide good access not only to the preservation formats that I am recommending, but also to many other formats as well, and I’ve concentrated on open source or freely available tools. Once the existence of these tools is factored in, there would appear to be much less need to undertake migration for a majority of formats, since a repository can either install these tools locally or provide links so that end users can install them to view access copies of the files. [4]
4. Many authors have noted that digital formats do not go obsolete quickly, and any decision to migrate a format carries an inherent risk of degradation or loss. Furthermore, migration is a time- and resource-intensive process. It may involve quite a bit of planning, testing, and software selection. Files may need to be segregated from those of a different format, a particularly difficult problem when a group of heterogeneous files is spread through many folders or subfolders. Ideally, one would migrate to file formats that are 1) widespread, 2) open, 3) well documented, 4) well supported by freely available or low-cost software, and 5) supported by conversion tools. However, these criteria may conflict with each other.
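The segregation problem can at least be sized up with a short script. The sketch below walks a folder tree and groups files by extension; the function name `inventory_by_extension` is hypothetical, and an extension is only a rough proxy for the actual format of a file.

```python
import os
from collections import defaultdict


def inventory_by_extension(root):
    """Map each lowercase extension ('' if none) to the sorted paths found under root."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            groups[ext].append(os.path.join(dirpath, name))
    return {ext: sorted(paths) for ext, paths in groups.items()}
```

The resulting inventory shows how many files of each apparent type exist and where they sit, before any decision is made about moving or converting them.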
In addition, one must have in place tools that can migrate easily from the source to the target format. Once a decision to migrate is made, conversion software must be selected. Quite possibly, custom scripts or conversion routines will need to be written, and the files that are converted will need to be quality checked. In addition, external files (such as email messages) that reference a converted file will need to be updated to reflect the new file name.
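As an illustration of the last point, the sketch below rewrites references to renamed files inside a piece of text, given a mapping from old to new file names. The function name `update_references` and the matching rules are my own assumptions; real messages would need more careful handling of paths, encodings and attachments.

```python
import re


def update_references(text, renames):
    """Replace stand-alone occurrences of old file names with their new names.

    `renames` maps old file name -> new file name. A name is left alone when
    it is only part of a longer name (e.g. "old_report.doc" or "report.doc.bak").
    """
    if not renames:
        return text
    alternatives = "|".join(re.escape(name) for name in renames)
    pattern = re.compile(
        r"(?<![\w.-])(" + alternatives + r")(?![\w-]|\.\w)"
    )
    return pattern.sub(lambda match: renames[match.group(1)], text)
```

Any such rewrite changes the referencing file itself, so it should be recorded and checked like any other migration step.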
Finally, software may be readily available to read proprietary formats, so there may be no immediate need to migrate, and migration pathways may be unclear. For these reasons, your repository should seriously consider preserving files in their proprietary formats, at least for the short term, unless software is not readily available to interpret them. For the same reasons, authors such as Adrian Brown have argued that the community should develop and support file format registries that include lists of tools and migration pathways. However, such resources can only be based on the hard experience of testing.