Using SIARD for Database Migration

On March 19, 2010, in Research, by Chris Prom

Using the appraisal tools I discussed last week, I discovered the the OIF files I am working with contain about 150 database files.  Most of these are in Microsoft Access format, although a few are Paradox files.  Using a free Paradox file viewer, I was able to quickly determine that the latter were contact databases, and decided not to undertake an migration or preservation work on them.

Similarly, I examined the 84 access databases included in the accession record and quickly determined that many of them held duplicate information.  Based on an examination of each database, I determined that the vast majority of them containted transactional information (such as order of merchandise relating to Banned Books Week or conference registrations), and did not meet appraisal criteria for permenent retention.  I therefore deleted those.

But one database in particular, had enough evidential and informational value to suggest that it should be prseserved permenantly: a comprehensive database tracking book challenges that have been reported by librarians to the OIF.  While the file is certainly readable using current versions of Microsoft Access, and while I will certainly retain a copy among the final SIP that I am preparing, prudence suggested that a copy should also be generated in a non-proprietary format so that the data at least, if not the look and feel are preserved outside of a depedency on proprietary software.

As I noted in a previous post, SIARD, developed by the Swiss Federal Archives, is one tool that can be used for database normalizaton.  It is platform-independent java tool.  After spending a bit of time working with it, I am impressed by its capabilities, but unfortunately, I ran into repeated and intractible problems in using the program with some large Microsoft Access Databases that used poorly defined schemas and or badly structured data.  In the end, I could not get the software to create a normalized database for the Challenged Books Database. (More on the after the jump.)

SIARD Overview

SIARDEdit is the main SIARD program.  By defining a few simple paramters (shown below), you can connect to a database and make a copy of it into a zipped .siard folder, which places the entire structure and metadata into a self describing xml format (large text fields and binary objects are saved as separate files and referenced in the xml metadata.)

SIARD: Define Database Connection

Actually, using SIARD to archive a database is  not quite that simple, since the databases need to be ‘prepared’ before SIARD is able to migrate them.  In the case of an MS Access database, the ‘Admin’ user needs to be provided read access to three hidden system tables and an ODBC connection needs to be configured using the Administrative Tools under Control Panel (in order to use SIARD on Linux, in order to convert MS Access databases,  you would need to purchase an ODBC driver for Access.  While it is not difficult to prepare the database, it took me quite a bit of time before I determined that I needed to do so.

SIARD converts small databases relatively quickly, and in the end, the tables and queries of the original database can be browsed in the SIARDEdit interface.  To the extent that they were formally defined in the database structure, relationships and foriegn keys are retained, although there is not convenient way to view related data as it may have been presented in forms or purpose built websites.  (For example, in the Challenged Books database, a key linked challenged books to institutions, and these relationship can be constructed manually, but are not presented to the user.)  Similarly, description fields for column names are placed into the description metadata for each column.  Unforutnatley, the Properites.  Here is a screenshot showing a table view, with the user editing metadata describing the purpose of the table:

SIARD Metadata Editor

The normalized database is saved to a zip file, with the exension .siard.  The metadata from the zip files can be exported into xml files directly from the SIARD edit program, and you can also browse the files directly, but only with the PKZIP compression software, since the alogrithm that SIARD uses to compress the folders is only supported by PKZIP at present, in order to accomodate zip files greater than 4 GB in size.

Using SIARD

I tested the use of SIARD on a few MS Access databases (currently it can only be used with Access, Oracle, and MSSQL databases, although there are plans to add support for other forms.   Although the program was easy to get running (once I read the manual), I was utlimately not able to get it to work with several large Access databases.  Incases where it failed, the program went through the entire process of copying tables and generating metadata, but would abruptly terminiate at the end of the process when trying to write results to the .siard file.  The error messages provided both in the interface and in the log output (which I manually enabled using the instructions in the excellent user manual) did not provide me much information to help me diagnose the problems.

However, in one case, I noticed that the process terminated when SIARD was attempting to save the metadata for an empty table.  After I removed the table from the source database, I was able to complete the operation without any trouble.  In the case of two other large databases, the operation failed during large tables, and the log output does not specify a specific row where the error occured.  Strangely, in one case, the application seems to mysteriously delete the file it is creating when I ran it on the Challenged Books database that I was trying to preserve, so the process failed when it attempts to write to the SIARD file.

In any case, it is important to note that when SIARD fails, the entire process terminates. Whether this was be design (so that it would be impossible to archive an incomplete database) or accident is hard to say.

Nevertheless, I think think that for many institutions, SIARD may offer the only realistic option normalize databases that have permanent value, where data preservaton is paramount to issues of look and feel.  The Swiss Federal Archives are actively soliciting feedback on it usage, so, hopefully some of the types of problems I encountered are elimiated, or at least mitigated over time.

One hope that the program continues to be developed, and let’s hope these problems can be worked out over time.

Tagged with:  
  • Pingback: More on SIARD « Practical E-Records()

  • Liesl

    Hi Chris, I am currently evaluating SIARD but encountered some error:

    SQL Exception SQLState: 25000 Message:
    [Microsoft][ODBC Microsoft Access Driver]Invalid transaction state ErrorCode 44

    I am wondering if you have encountered the same issue when you were evaluating SIARD?  If you did, could you let me know the resolution? My email is lieslw@gmail.com.  Thanks~

  • Chris Prom

    What version are you using? I believe that they have released a new version lately.

  • Liesl

    Thomas, nope, i have sent email to the Swiss Federal Archives 3 months ago and have not heard from them either 🙁