SABA Copying Program

On April 1, 2010, in Research, by Chris Prom

Over the past day, I spent a good deal of time investigating the SABA-Copying program, which was developed by the Danish State Archives and which I briefly mentioned here.

In short, I find the program really, really useful, even with its Danish interface.  Since this program is not at all known outside of Denmark, I’d like to use this post to:

  • describe what it does and how to use it
  • explain why I like it
  • list some issues that would need to be addressed if it were to be used outside of Denmark.

Description of SABA Copying Program (and how to use it):

Essentially, the tool was developed as something that either archivists or donors can use to conduct file appraisal on records and to generate a copy of the appraised records for submission to the archives, where they will undergo additional migration/preservation and cataloging work.

The screenshot below illustrates the basic interface, which is divided into three sections: one to supply information about the archives, one to select folders and files to copy into the Submission Packet, and a third to specify the target location to which the files should be copied. I’ve supplied rudimentary English translations, thanks to Google Translator.

Initial Interface w/ English Labels

If you click on “Enter Information” in the top section, the screen below pops up. I am not sure I have the translations correct, but it provides a very simple means to supply basic identifying information and provenance about the materials being submitted; information recorded here is included later as metadata in a file that accompanies the copied records.

Supplying information about the archives

Back on the main screen, once you load one or more directories into SABA-Copying, some basic descriptive information is loaded concerning the file types that the program recognizes. (The types of files that can be copied are limited to some common formats; the list cannot be extended so far as I can tell. I tried adding new extensions to the .ini file for the application, but they were not recognized after I restarted the application.) The files that SABA can copy are colored green by default after the drive is mapped.

Working with the file list

Individual files or folders can be excluded from the list to be copied by pressing ‘delete.’ This removes them from the list to be copied, but a record that they were not included will be written into the metadata. To reinclude them, press insert while the greyed-out file is highlighted.

If you wish to view a file to see what its contents are, simply click it and it opens in whatever application is associated with the extension.

If you wish to remove a file or folder completely from the list, and NOT record this fact in the metadata, press the ‘end’ key while the file or folder is highlighted.

Pressing CTRL highlights a selected file or folder (i.e., marks the background yellow), so you can track what has already been reviewed.

All of these actions can only be executed on one folder or file at a time; it is not possible to select multiple files or folders using the standard CTRL-click, Shift click, or window drawing operations.

Manually Excluding Files or Folders

By clicking on ‘Vis liste’ (View list) at the bottom of the second section, you can filter the list of files to those of a particular extension; you can do this either for files that are currently marked as included (green) or excluded (grey). A pane on the left-hand side provides useful summary information for file types, and by clicking on the name of an extension you can view the full path and file names of all files of that type. By pressing the end key, files of a particular extension can be excluded from the list to be copied. This is shown in the next screenshot. Individual files can also be excluded, but the files are immediately removed from the list being viewed, not greyed out. If you would like to reinclude them, you have to use another method. First close the green ‘to be copied’ file type list pane, then either:

  • find the file in the hierarchy and reselect it with the insert key, or
  • open the grey ‘to be excluded’ file type list (see below), navigate to the extension for the type you deleted, and reselect the file with the insert key.

Bulk Excluding from Copy by File Type
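
As a rough illustration of the per-extension summary and bulk exclusion described above, here is a minimal Python sketch. It is not how SABA-Copying itself is implemented. The function name and the SUPPORTED set are hypothetical, since the program's real list of supported formats is not reproduced in this post.

```python
from collections import Counter
from pathlib import Path

# Hypothetical list for illustration only; the real list of supported
# formats is fixed inside SABA-Copying and is not reproduced here.
SUPPORTED = {".txt", ".doc", ".pdf"}


def summarize_by_extension(root, supported=SUPPORTED):
    """Count files per extension, split into included and excluded groups."""
    included, excluded = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            # Files with a supported extension would be "green",
            # everything else "grey".
            (included if ext in supported else excluded)[ext] += 1
    return included, excluded
```

Calling summarize_by_extension on a folder returns two counters, one for extensions that would be copied and one for those that would be left out, which is roughly the information shown in the left-hand pane.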

It is also possible to view the entire list of excluded files, sorted by file type, by pressing the grey ‘vis liste’ button.  As this dialog shows, it is not possible to include files that are not of a supported type (e.g. those checked in the right-hand pane) in the list to be exported.

Trying to include an excluded file type

When you are ready to copy the files to a new folder (or any location mapped as a Windows drive), you simply select the location in the bottom window, then click “Start Kopiering”. None of the original files are modified in any way; they are simply copied to a folder “SABA_Kopiering” on the target drive. This worked well for me with most folders, but in one case the application crashed with a cryptic error message.

After the process completes, you can browse the files and metadata.  Everything is stored in three subfolders:

  • DOKINFO: Metadata regarding the copied files (their original paths and file names)
  • GENINFO: Information submitted in the first section regarding the archives, in the form of two RTF documents
  • DOKUMENTER: The actual files, which, as the screenshot below shows, are collapsed into one folder and renamed using an eight-digit number and the original extension.

SABA-Copying Output
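
The flattening step can be sketched in a few lines of Python. This is only an illustration of the idea (copy every file into one folder, name it with an eight-digit number plus the original extension, and keep a record of where it came from), not SABA-Copying's actual code. The function name is hypothetical, and numbering the files in sorted path order is an assumption; I do not know in what order the program itself assigns numbers.

```python
import shutil
from pathlib import Path


def flatten_copy(source_dir, target_dir):
    """Copy every file under source_dir into one flat DOKUMENTER folder.

    Each copy is named with an eight-digit counter plus the original
    extension. The originals are only read, never modified.
    Returns a list of (new_name, original_path) pairs.
    """
    source = Path(source_dir)
    documents = Path(target_dir) / "DOKUMENTER"
    documents.mkdir(parents=True, exist_ok=True)

    manifest = []
    # Assumption: files are numbered in sorted path order.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    for counter, original in enumerate(files, start=1):
        new_name = f"{counter:08d}{original.suffix}"
        shutil.copy2(original, documents / new_name)
        manifest.append((new_name, str(original)))
    return manifest
```

The returned list plays the role of the metadata trail: without it, nothing in the flat folder says which directory a file originally lived in.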

The DOKINFO folder contains a configuration file (in binary format) and an .ini file, as well as two metadata files:

  • MAPPE.ARK: provides a unique code for each of the folders/directories
    • first eight digits: order in which folder was processed by SABA-Copying
    • second eight digits: unique id assigned to folder (also referenced in DOKUMENT.ARK, column one, for each file found in that folder)
  • DOKUMENT.ARK: provides information concerning each of the files:

Coded File Name | Original Path and File Name | File Size in Bytes | Extension | ???
000000040000000300000003000000001$report.TXT | 2010033020070420C:\Documents and Settings\manager\Judy's C Drive\docs\$report.TXT | 14739 | .txt | 12
0000007900000005000000000000000023455D284.XYI | 2010033020030206C:\Documents and Settings\manager\Judy's C Drive\WordPerfectCorrespondence\3455D284.XYI | 6509 | .xyi | 12

As far as I can tell the syntax used in the first two columns is:

Column one:

  • 1st eight digit code: order in which file processed.
  • 2nd eight digit code: unique id assigned by program to original directory name (as per MAPPE.ARK)
  • 3rd eight digit code: new file name in the flattened folder in the SIP, which is created during the copying operation
  • 4th eight digit code:  00000001=file was copied; 00000002=file was not copied and is not found in the SIP

Second column:

  • code: unknown function, perhaps a checksum value?
  • original file name/path
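
To show how the first column might be read by a script, here is a small Python sketch. It rests on guesses, just as the description above does. In particular, the digits in front of the file name in the two sample rows appear to run to 33 characters, so the sketch reads three eight-digit codes and treats everything after them as the copied/not-copied value. That width, the class name and the field names are my assumptions, not documented behavior.

```python
from dataclasses import dataclass

CODE_WIDTH = 8
# Assumption: the coded prefix is 33 characters wide, as it appears to be
# in the two sample rows. This is inferred, not documented.
PREFIX_WIDTH = 33


@dataclass
class CodedPrefix:
    sequence: int   # order in which the file was processed
    folder_id: int  # id of the original directory (as per MAPPE.ARK)
    new_name: int   # number used as the file name in the flattened folder
    status: int     # 1 = copied, 2 = not copied (as far as I can tell)
    rest: str       # everything after the prefix, left unparsed


def parse_prefix(line):
    """Split the leading digits of a DOKUMENT.ARK row into its codes."""
    prefix, rest = line[:PREFIX_WIDTH], line[PREFIX_WIDTH:]
    if len(prefix) < PREFIX_WIDTH or not prefix.isdigit():
        raise ValueError("line does not start with a 33-digit coded prefix")
    codes = [int(prefix[i:i + CODE_WIDTH]) for i in (0, 8, 16)]
    return CodedPrefix(codes[0], codes[1], codes[2], int(prefix[24:]), rest)
```

The remainder of each row (file name, the unknown 16-digit code, original path, size and extension) is deliberately left unparsed in `rest`, since the field boundaries there are even less certain.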

The MAPPE.ARK and DOKUMENT.ARK syntax is not documented in the user manual, and it seems that other tools developed by the Danish State Archives use these files to process and convert the files.  I believe that the viewer application that is under development will reassemble the original directory structure for the end user.

Why I like the Program

  • It is unique in offering a way to select and copy electronic records to an archives, without any possibility of altering the original files in any way.  I think such a tool is very much needed, as I explain more in this post.
  • It provides a way to create SIPs for a large number of files, with the minimal amount of data needed to facilitate authentication of the records, as well as to facilitate additional processing/migration work.
  • The interface is well designed and generally easy to use; it needs little explanation.
  • It offers an easy way to collapse the folder structure into one directory, while maintaining a trail of the original order in the metadata files.  Once collapsed in this way, the original files would be much easier to migrate to another format by file type.  The syntax is easy to understand and it would be relatively easy to write additional tools to process the information in the metadata files in conjunction with migration operations.
  • The list of files excluded from the copy is recorded; in other words, the application automatically records appraisal decisions.

SABA-Copying as a Model for a general purpose Appraisal Tool

In short, this application may offer a model for future software development outside of Denmark, for a general purpose appraisal tool.  However, the following issues would need to be addressed to make the application more usable outside the Danish context:

  • The program is not generally available, so it may be necessary to develop a tool like this from scratch, unless the Danish National Archives provides a copy of the source code under an open source license.  (The copy I am using was generously provided to me by Peter Edelholt for testing.)
  • The program may need to be modified to make it possible to copy file types not included in the Danish State Archives’ list of supported file formats.
  • Obviously, the interface and user manual would need to be translated to English and potentially other languages.
  • Some usability improvements could be made, for example to allow inclusion/exclusion of multiple files with CTRL and SHIFT-click selection.
  • Additional export options would be needed. The syntax used is unique to the Danish State Archives, and it would be best if some additional options were supplied for the output syntax (e.g. file naming conventions, ability to export all files of a particular type of a named folder, ability to include additional information in the metadata, such as date last modified).  Possibly use a PREMIS-based syntax.
  • An associated suite of tools would need to be developed to manipulate files and metadata after deposit.  The Danes are converting everything to TIFF format, but such a course is unlikely to be pursued elsewhere; therefore, it would be necessary to have tools to do tasks like:
    • Migrating files post conversion, and recording migration data in metadata
    • reassembling original order either in the AIP to be stored, or in the DIP
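
As a sketch of what the last of these tools might involve, the Python below rebuilds a folder tree from a flattened folder plus a list of (flat file name, original path) pairs. The function name and the record format are hypothetical. The sketch assumes that the original paths are Windows-style, as in the DOKUMENT.ARK sample rows, and it does not read the .ARK files itself.

```python
import shutil
from pathlib import Path, PureWindowsPath


def restore_tree(flat_dir, output_dir, records):
    """Rebuild the original folder layout from a flattened folder.

    records: iterable of (flat_name, original_windows_path) pairs.
    Files are copied, so the flattened folder is left untouched.
    """
    flat = Path(flat_dir)
    out = Path(output_dir)
    restored = []
    for flat_name, original in records:
        win_path = PureWindowsPath(original)
        # Drop the drive and root ("C:\") so the rest can be recreated
        # under output_dir.
        parts = win_path.parts[1:] if win_path.anchor else win_path.parts
        destination = out.joinpath(*parts)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(flat / flat_name, destination)
        restored.append(destination)
    return restored
```

A real tool would also need to decide what to do with files that were recorded as excluded, since they have no copy in the flattened folder to restore.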

In short, I’ve really enjoyed using this application, and see it as a potential model for the development of a similar application elsewhere, particularly if additional features such as those on my ‘appraisal tool wish list’ could be included.

[updated April 22, 2010]:

SABA copying ‘score’ for a small archives:

  • Installation/configuration/supported platforms: 15/20
  • Functionality/Reliability: 12/20
  • Usability: 8/20
  • Scalability: 5/10
  • Documentation: 8/10
  • Interoperability/Metadata support: 5/10
  • Flexibility/Customizability: 3/10
  • License/Support/Sustainability/Community: 2/10 (program not currently available via public website; contact me if you would like a copy.)

In general, the ‘score’ would be higher if an English-language version were available.

Final: 58/100



  • jochen_V

    Chris,

    I am searching for a solution for submitting documents to the archive. Can I receive a copy of the program?

    Kind Regards
    xxx.xxx@xxx.xx