Last week, I reviewed the Planets PLATO preservation planning tool. The Testbed is another Planets web service that can be used in planning preservation services and actions. Its purpose is to allow users to locate, select, and test services that can be used to undertake preservation actions, such as identification, characterization, and migration. It is part of the Planets Interoperability Framework and should be available for download and local installation after the end of the Planets project on May 30th. In the meantime, users can register for an account on the public site.
The testbed includes areas to browse services, browse previous experiments, and conduct new experiments.
The public testbed includes a method to browse services that can be used to undertake preservation actions; 45 such services are currently installed. The services listed include many of the identification, characterization, and migration tools that I have been assessing, such as JHOVE, DROID, and the New Zealand Metadata Extractor. All of the tools included on the public website are open source, due to licensing restrictions. While the list of included tools is impressive, it is far from complete. For example, neither Open Office nor ffmpeg has been included.
PLANETS has released a detailed specification describing how additional services can be ‘wrapped’ (i.e., installed into the testbed) so that they too can be used to conduct experiments. A developer can use the information in this document to develop a Java wrapper for a service so that it can be used for experimental purposes. I do not think that most archivists, if any, would be interested in or capable of using this document to add services to the Testbed or to other Planets services (for example, in order to implement a preservation workflow within a production system). The process is very complex, and the document is targeted at ‘Planets Experts’.
In any case, viewing the detailed information about a service provides a variety of useful information, most notably the formats (listed by PUID, or PRONOM Unique Identifier) that it can accept as input, as well as the results of any previous experiments that have been run.
While the ability to browse the services is very useful, the list of services is very incomplete; therefore, any archivist looking for a tool to convert records would also need to use other resources to identify potential tools. One solution to this problem would be to browse the Planets Service Registry directly, but it does not seem to be accessible at any public location. Perhaps the work to develop the UDFR will lead to a robust method to browse formats, associated software, and services, but very little information about that project is available, except in a restricted wiki.
For the time being, a user of the testbed can browse by file format to see which tools take certain formats as input or provide them as output (again, noting that the list of tools is incomplete). The list of formats/services can be filtered using checkboxes. Most of the image formats have several tools listed, but other formats are less well covered. For example, the testbed lists only JHOVE as a tool that can be used with the Audio Interchange File Format; for WordPerfect files, only the New Zealand Metadata Extractor Service and the DIaLOGIKa MSRC PLANETS Migration Web Service are listed. If people use the tool over time, and the central instance of the testbed is further developed, the number of services that it includes may grow, but for now it is of limited utility.
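The format-to-tool lookup described above can be sketched as a simple mapping. This is an illustration only: the registry structure is my assumption, and the PUIDs below are placeholders rather than verified PRONOM identifiers.

```python
# Illustrative sketch of the lookup behind the testbed's "browse by
# format" view. The registry structure and PUIDs are assumptions for
# illustration, not the actual Planets data model.

# Map PRONOM Unique Identifiers (PUIDs) to services accepting them as input.
REGISTRY = {
    "fmt/414": ["JHOVE"],  # placeholder PUID for AIFF
    "x-fmt/44": [          # placeholder PUID for WordPerfect
        "New Zealand Metadata Extractor Service",
        "DIaLOGIKa MSRC PLANETS Migration Web Service",
    ],
}

def services_for_format(puid):
    """Return the services registered for a given input PUID."""
    return REGISTRY.get(puid, [])

print(services_for_format("fmt/414"))  # ['JHOVE']
```

A real lookup would query the Planets Service Registry rather than a hard-coded dictionary, but the shape of the question, "which tools accept this format?", is the same.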
It is also possible to browse pathways (i.e. to select an input format, output format, or service), but the interface was very slow when I attempted to use it. After clicking the checkbox next to the format ‘AVI’ to filter the list, the page began reloading and, when it finished, had filtered down to a blank list, perhaps because no results were found. The same thing happened when I tried to filter for ‘GIF’ as the input format, so it appears that this feature was broken at the time I attempted to use it.
In short, the ‘browse services’ portion of the testbed provides a tool that is much needed and, in general, well designed. However, the lack of relevant information that it provides at this time makes it much less useful in practice than in theory. It needs support, use, and, in the long term, integration with something like the UDFR.
The public testbed also allows a user to browse the results of experiments that have been previously conducted by other users. The interface is well designed, so that you can easily locate relevant experiments; for example, the list can be filtered rapidly.
In theory, one of the best parts of the Testbed experiments is that they allow a user to make an objective determination as to whether an experiment succeeded or failed. All experiments are run in such a fashion that the server is locked, so that time to process, processor load, etc. are measured consistently. In the case of experiments that include more than one file, time to migrate is presented in a convenient bar graph.
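The per-file timing the testbed charts could be gathered along these lines. This is a minimal sketch; `migrate` is a hypothetical stand-in for whatever migration service an experiment actually invokes.

```python
import time

def migrate(path):
    """Hypothetical stand-in for a migration service call."""
    time.sleep(0.01)  # simulate the work of converting one file
    return path + ".migrated"

def timed_batch(paths):
    """Record per-file wall-clock migration time, the figure the testbed
    charts when an experiment covers several files."""
    timings = {}
    for path in paths:
        start = time.perf_counter()
        migrate(path)
        timings[path] = time.perf_counter() - start
    return timings

timings = timed_batch(["a.tif", "b.tif"])
```

Because the testbed locks the server while an experiment runs, such timings can be compared across runs; on a shared machine they would be much noisier.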
However, most of the included experiments are incomplete: only 42 of 407 had been completed at the time I used the tool, and most of those completed showed little hard information that would be of use in actually selecting a preservation tool. Many seem to have been conducted in order to test whether the testbed was working, rather than to conduct an actual experiment.
In other words, the information in the experiment report is only as good as the input data, files, and final comments/evaluation entered by the experimenter. For this reason, my conclusion regarding the ‘Browse Experiments’ area is similar to that for the ‘Browse Services’ area: a great idea, but at this time not all that useful to an archivist at a smaller repository, due to a lack of relevant data.
The testbed also allows an authenticated user to conduct their own experiments. The process is simple. First, you define some basic properties of the experiment, including its name and summary and, if desired, a few optional fields such as the scope and purpose of the experiment. You can even add literature references.
In step two, you design the experiment by a) stating the basic type of experiment (in this case, migration), b) designating the input format, migration tool, and output format, and c) selecting/uploading files on which to conduct the experiment. After I submitted the experiment for approval, it was automatically approved, and I ran it. The process took about one minute. In order to measure whether the experiment succeeded, you can then run various identification and characterization services on the two objects (by selecting an item from a drop-down list). For example, characterizing both objects with the New Zealand Metadata Extractor showed that the objects were the same in most of the significant properties measured by that tool.
Running the XCDL Comparator instead (which, in theory, would provide a more complete comparison) returned mostly fields showing missing data. The evaluation page also allows you to download each of the files. I did so, and they seemed identical to the naked eye (although the TIFF file was 14MB larger). You can also manually select properties to compare (for images, a typical property is image_height) and then enter the results into the interface; however, this would be a very time-consuming process.
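Judging success by comparing significant properties of the source and migrated objects could be automated along these lines. A minimal sketch: the property names and values below are illustrative, not actual output from the New Zealand Metadata Extractor or the XCDL Comparator.

```python
def compare_properties(source, target, properties):
    """Compare selected significant properties from two characterization
    results; return only the properties whose values differ."""
    return {p: (source.get(p), target.get(p))
            for p in properties
            if source.get(p) != target.get(p)}

# Illustrative characterization output for a TIFF migration.
source = {"image_height": 2400, "image_width": 1800, "bit_depth": 8}
target = {"image_height": 2400, "image_width": 1800, "bit_depth": 16}

diffs = compare_properties(source, target,
                           ["image_height", "image_width", "bit_depth"])
print(diffs)  # {'bit_depth': (8, 16)}
```

The hard part, of course, is not the comparison itself but deciding which properties are significant for a given format and whether a difference constitutes failure, which is why a manual judgment is usually still needed.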
On the last page, you ‘finalize the experiment’ after (optionally) entering a few comments and rating both the experiment (was the experiment itself useful?) and the service (did it do what you wanted it to?).
Several things impress me about the experiment interface and process. First, it is very well designed and ‘clean’. Any archivist would feel at home using it, and the entire process of setting up an experiment, uploading a file, running the experiment, and evaluating the results took about an hour, even though I was taking screenshots and writing this blog entry while doing it. Second, the tool does what it says it will. At the end, you know whether the experiment succeeded or failed.
However, there are a few caveats. First, since the testbed depends on the service registry, a very limited number of preservation/migration actions are supported; I could not find one, for example, for WordPerfect files. If the service registry is added to, and in particular if the Planets follow-on organization finds some way to include proprietary or other tools, the testbed would be much, much more useful. Second, the characterization services do not provide enough information to make an automated evaluation possible. This is not a criticism of the tools so much as an acknowledgment that what they do is very complex. Therefore, in the end, most users will need to make a manual determination as to an experiment's success or failure. Finally, it would be very helpful if it were possible to directly compare the effect of several tools within one experiment. As the application is structured now, only one migration tool may be used in each experiment. In order to do a comparison, you need to copy an existing experiment, change its parameters, rerun it, and so on. And, after you are done, you need to browse several pages to review the results.
Nevertheless, I think that the Testbed is an extremely useful tool. It is very much needed, and in order to make it optimally useful, more tools and services will need to be integrated into it. Let’s hope that is part of the long-term plans for the Open Planets Foundation.
My “Score” for a Small Archives:
- Installation/configuration/supported platforms: 18/20–web service available; can also be installed locally, but no instructions at this time
- Functionality/Reliability: 18/20–a few usability problems with the comparator and with browsing services
- Usability: 9/10–very good interface
- Scalability: 9/10–runs very smoothly now, seems scalable if other services added
- Documentation: 8/10–Integrated help is very useful; interface is mostly self-explanatory
- Interoperability/Metadata support: 8/10–can export results to spreadsheet or XML
- Flexibility/Customizability: 4/10–Looks difficult to add services
- License/Support/Sustainability/Community: 9/10
Final Score: 83/100
Bottom Line: this tool is much more useful, and has more long-term potential for development as a tool to support preservation planning in small archives, than PLATO, because it is so much easier to use and provides such immediate results.