At least two people have asked me whether I have been using the File Information Tool Set (FITS), developed by Harvard and available on Google Code. I hadn't used it directly, although I was aware of what it did, so I decided to download it and give it a try.
FITS does something potentially very useful, provided you are able to take some additional steps to integrate it into an e-records processing workflow.
Essentially, FITS wraps together output from a variety of other open source tools that can be used to identify and characterize the contents of files (such as DROID, JHOVE, and the New Zealand Metadata Extractor). The output is written as XML to the screen or to an output file.
FITS is a command line tool that runs in any Windows or Unix/Linux environment, and typical usage is shown in the attached screenshot.
The amount of metadata that FITS wraps together from each tool is quite extensive. At a minimum, it produces some metadata for every file: at least an MD5 checksum and the DROID output, for instance, as well as the output of ffident and the File Utility Tool. Where a file type is supported by other tools (such as JHOVE or the New Zealand Metadata Extractor), additional metadata is supplied.
Summary metadata, including the basic file type and an MD5 checksum, is extracted into a few header elements at the top: <identification>, <fileinfo>, <filestatus> and <metadata>. In addition, the output from each specific tool is written into a wrapper element, <tooloutput>, in whatever format is native to that tool.
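If you want to spot-check the MD5 checksum FITS reports for a file, Python's standard `hashlib` module computes the same digest independently. This is a minimal sketch; the function names `md5_of_file` and `matches_reported` are my own and are not part of FITS.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported(path, reported_md5):
    """Compare a locally computed digest with one copied from FITS output (case and whitespace ignored)."""
    return md5_of_file(path) == reported_md5.strip().lower()
```

A mismatch would suggest either that the file changed after FITS ran or that the checksum was copied from the wrong record.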
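Because the summary elements sit at the top of the XML, pulling the basic file type and checksum out of a FITS output file takes only a few lines of Python. Treat this as a sketch under assumptions: the layout it expects (an `identity` element with `format` and `mimetype` attributes inside `<identification>`, and an `md5checksum` element inside `<fileinfo>`) is my understanding of the FITS output, and the sample XML is a trimmed, made-up example rather than real FITS output, so check it against a file FITS actually produces. Tags are matched without their XML namespace so the code does not depend on the exact namespace URI.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so tags can be matched regardless of namespace."""
    return tag.rsplit("}", 1)[-1]


def find_child(parent, name):
    """Return the first direct child whose local tag name is `name`, or None."""
    for child in parent:
        if local_name(child.tag) == name:
            return child
    return None


def summarize_fits_xml(xml_text):
    """Pull format, MIME type and MD5 checksum from the (assumed) FITS summary elements.

    Anything that is missing comes back as None.
    """
    root = ET.fromstring(xml_text)
    summary = {"format": None, "mimetype": None, "md5": None}

    identification = find_child(root, "identification")
    identity = find_child(identification, "identity") if identification is not None else None
    if identity is not None:
        summary["format"] = identity.get("format")
        summary["mimetype"] = identity.get("mimetype")

    fileinfo = find_child(root, "fileinfo")
    md5 = find_child(fileinfo, "md5checksum") if fileinfo is not None else None
    if md5 is not None and md5.text:
        summary["md5"] = md5.text.strip()
    return summary


# Hypothetical, trimmed sample in the shape described above (not real FITS output).
SAMPLE = """<fits xmlns="urn:example:fits-output">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf"/>
  </identification>
  <fileinfo><md5checksum>900150983cd24fb0d6963f7d28e17f72</md5checksum></fileinfo>
  <filestatus/>
  <metadata/>
</fits>"""

print(summarize_fits_xml(SAMPLE))
```

The per-tool detail under the tool output wrapper is left alone here, since each tool writes it in its own native format.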
One of the things I like best about FITS is its relatively simple interface, which would make it easy to incorporate into other tools. I also really like that the Harvard developers are providing a clear roadmap for tool development and have designed an extensible framework, so that additional tools can be added. All in all, FITS is a very useful tool. Because it provides some metadata for every file it encounters, you can run it against any set of files, not just those of a particular type, and expect to get some good preservation metadata in response.
There are a couple of downsides. First, FITS can only process one file at a time; there is currently no automated way to run it against an entire folder. However, it would be fairly straightforward to automate it using something as simple as NoteTab, or to write a simple GUI for it in the programming language of your choice. More likely, developers will integrate it into other workflows for processing and storing archival information. For example, the Archival Information Package (AIP) that Archivematica produces includes a FITS file record for each file in the package.

Second, FITS will require a lot of maintenance to ensure that the latest version of each wrapped tool is being used. While Harvard is releasing updates fairly often, I noticed, for instance, that the DROID signature file used by FITS is version 13 (the latest one is 32).
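As a rough illustration of how small the missing batch step could be, here is a Python sketch that walks a folder and runs FITS once per file, writing one XML file per input. It rests on assumptions: the launcher names (`fits.sh` on Unix/Linux, `fits.bat` on Windows) and the `-i`/`-o` flags for input and output reflect my understanding of the FITS command line, so check them against the usage for your release; `build_commands` and `run_batch` are hypothetical names. Building the commands is kept separate from running them so the list can be inspected without FITS installed.

```python
import os
import subprocess
from pathlib import Path


def fits_launcher():
    """Assumed launcher script name; pass a full path instead if FITS is not on your PATH."""
    return "fits.bat" if os.name == "nt" else "fits.sh"


def build_commands(input_dir, output_dir, launcher="fits.sh"):
    """Return one (assumed) FITS command per file found under input_dir.

    Each output mirrors the input's relative path with '.fits.xml' appended, so
    same-named files in different subfolders do not collide. Keep output_dir
    outside input_dir so later runs do not pick up earlier results.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    for path in sorted(p for p in input_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(input_dir)
        out_path = output_dir / relative.parent / (relative.name + ".fits.xml")
        commands.append([launcher, "-i", str(path), "-o", str(out_path)])
    return commands


def run_batch(input_dir, output_dir, launcher=None):
    """Run FITS on every file and return the inputs whose run exited with a non-zero status."""
    failures = []
    for cmd in build_commands(input_dir, output_dir, launcher or fits_launcher()):
        Path(cmd[4]).parent.mkdir(parents=True, exist_ok=True)  # create mirrored subfolders
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(cmd[2])
    return failures
```

Collecting failures rather than stopping at the first error means one unreadable file does not halt a whole accession.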
If someone were to write a GUI for it, it would be easy to see FITS being incorporated into a processing workflow. In the meantime, it will remain a nice tool for geeks like me to experiment with, or for developers working on much more ambitious projects to incorporate into their own designs.
[updated April 22, 2010]:
FITS: ‘Score’ for a Small Archives
- Installation/configuration/supported platforms: 20/20
- Functionality/Reliability: 10/20. Would be higher but for the lack of a batch processor.
- Usability: 4/10. No GUI, but the command line is easy to use.
- Scalability: 10/10. It works very quickly and would likely scale well given the ability to run in batch mode.
- Documentation: 9/10. Excellent, concise user and developer documentation.
- Interoperability/Metadata support: 9/10. The information it outputs is essential preservation description information for many common file types.
- Flexibility/Customizability: 5/10
- License/Support/Sustainability/Community: 4/10