More on SIARD

On March 26, 2010, in Research, by Chris Prom

After my posting regarding SIARD last week, Hartwig Thomas, the developer at the Swiss Federal Archives who is most closely associated with SIARD, contacted me to see if he could diagnose the problems I ran into while converting two Access databases to the .siard format.  As I suspected, the problems the application ran into were due to irregularities in the source Access databases. (For example, the Challenged Books database that I was trying to convert had a field marked as #Deleted and thus was, in one sense, corrupt, even though it still opened in Access.)

Hartwig modified SIARD so that it would insert a null record into the siard file for that row and made a few other fixes before providing me a link to version 1.20 of SIARD.  Using the new version, I was able to successfully convert all of the databases that had previously given me trouble.

In the course of our emails, Hartwig also noted that it is possible to view the exported metadata from SIARD in a more user-friendly format by referencing an XSL stylesheet that is found in the doc folder.  Here is a screenshot of that output from the OIF Challenged Books database.

SIARD Metadata Output

Of course, one could write other xsl stylesheets to view the actual data or to reformat it in whatever way necessary.
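
As a rough sketch of what "referencing" a stylesheet involves: a browser applies an XSL stylesheet when the XML file carries an xml-stylesheet processing instruction right after its XML declaration. The Python below adds such an instruction to an XML document held as a string. The stylesheet name `metadata.xsl` and the sample XML are placeholders chosen for illustration, not names taken from the SIARD distribution.

```python
def add_stylesheet_reference(xml_text: str, href: str) -> str:
    """Return xml_text with an xml-stylesheet processing instruction added.

    The instruction is placed directly after the XML declaration if there
    is one, otherwise at the very top of the document.
    """
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    stripped = xml_text.lstrip()
    # "<?xml " (with the space) matches the declaration but not an
    # existing "<?xml-stylesheet" instruction.
    if stripped.startswith("<?xml "):
        end = stripped.index("?>") + 2
        return stripped[:end] + "\n" + pi + stripped[end:]
    return pi + "\n" + stripped


# Placeholder document standing in for an exported metadata file.
sample = '<?xml version="1.0" encoding="UTF-8"?>\n<metadata><dbname>example</dbname></metadata>'
result = add_stylesheet_reference(sample, "metadata.xsl")
print(result)
```

Opening the modified file in a browser, with the stylesheet sitting at the path given in `href`, should then show the formatted view rather than raw XML.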

While looking for the stylesheet, I also noticed that SIARD includes an XSD schema, so the project has established a good basis for building a viewer for SIARD data. This makes me wonder, for instance, how much work would need to be done to build a content model for Fedora, in order to have a generic method for presenting any database content to end users.  (It is worth noting that RODA has a built-in database viewer, which I believe is MySQL-based, for displaying data from normalized databases that have been ingested.  I have not had time to investigate the technology used by RODA in regard to databases.)

In the meantime, SIARD does provide a nice mechanism for making data available to end users: the “Load Database” function.  I easily used it to recreate the Challenged Books database and a few other databases in new Access files.  It worked flawlessly to reimport the table data.  Unfortunately, as the documentation states, the function does not currently load any of the queries, although I believe a future version of SIARD will support that. Even so, this feature provides a very good method to move data from one database system to another.

Hartwig also provided me a bit more information about how the zip feature works:

SIARD uses the ZIP64-Standard which currently is only supported by PKZIP because it needs to handle files larger than 4GB with more than 65’000 entries. It uses the ZIP64-Standard only as a container, however, and does not apply any compression. (You can see the XML contents quite clearly in a Hex Editor.) We are convinced that compression issues should be handled at a lower device-driver level of the storage media. Also the absence of compression increases the accessibility at a later time, because the compression algorithm does not need to be preserved.

And, after I asked about a tool other than PKZIP to view the files:

When developing SIARD I bet on a development when more and more of these other tools would start supporting the ZIP 6.x standard.  I myself have started a SourceForge project publishing my ZIP64 JAVA library but I have just copied the sources to it and have not yet “cleaned it up” and made it a nice project for others to use [ . . . ] I plan to use that repository to implement at least command-line utilities for zipping/unzipping files according to the ZIP 64 standard soon.

A tool like that will be necessary if an archives wishes to get direct access to large text fields or BLOBs, which SIARD stores as separate files in the archival information package.
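
To illustrate the kind of direct access meant here, the following is a minimal sketch using Python's standard `zipfile` module, which can read ZIP64 archives and uncompressed ("stored") entries. Whether it opens a given SIARD file is an assumption I have not tested, and the entry names in the demo archive are invented for the example rather than taken from the SIARD format documentation.

```python
import io
import zipfile


def list_entries(archive) -> list:
    """Return (name, size in bytes, stored without compression?) per entry."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size, info.compress_type == zipfile.ZIP_STORED)
            for info in zf.infolist()
        ]


def read_entry(archive, name: str) -> bytes:
    """Return the raw bytes of one entry, e.g. a BLOB stored as its own file."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


# Build a small in-memory archive that mimics the container style described
# above: ZIP64 allowed, no compression. The entry names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED, allowZip64=True) as zf:
    zf.writestr("header/metadata.xml", "<metadata/>")
    zf.writestr("content/blob0.bin", b"\x00\x01\x02")

entries = list_entries(buffer)
for name, size, stored in entries:
    print(name, size, "stored" if stored else "compressed")
```

With a real file, one would pass its path in place of `buffer` and call `read_entry` with the entry name of the field of interest.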

All in all, I’m getting more and more impressed with the SIARD project and software as I spend more time with it.   The Swiss Federal Archives is continuing to refine the software and is taking bug reports seriously.  Hartwig registered me as a SIARD user for an instance of Bugzilla, which is being used as part of the development process, and I believe he would also add anyone else who contacts the Swiss Federal Archives regarding SIARD.

In any case, it would not take any archivist (even one with little technical knowledge) long to get up to speed with the software, which is one of my criteria for a successful project.  I certainly intend to use this software frequently myself when I return from sabbatical.

[updated April 22, 2010]:

Hartwig Thomas has posted a SourceForge site for a ZIP64 program.

‘Score’ for SIARD:

  • Installation/configuration/supported platforms: 20/20
  • Functionality/Reliability: 15/20. Would be higher, but it does not support MySQL or other databases.
  • Usability: 8/10
  • Scalability: 8/10
  • Documentation: 10/10
  • Interoperability/Metadata support: 8/10. Extensive use of XML, but I am not sure how it would play with other metadata formats such as EAD.
  • Flexibility/Customizability: 5/10
  • License/Support/Sustainability/Community: 5/10

Final: 79/100

