iRODS

On April 8, 2010, in Research, by Chris Prom

A few days ago, Mark Conrad noted on the Archives and Archivists listserv that iRODS is being used by NARA for its data grid, and recommended that others consider using it as part of a procedure for monitoring and verifying checksums.  I’ve been aware of this project for a while, and finally decided to try downloading the basic software and installing it.

iRODS is a tool that, like Archivematica or RODA, could be used to implement many of the technical components of a trusted digital repository conforming to the OAIS Reference Model.  Previously I had not assessed it because I regarded it only as a tool for large, distributed data grids, and I thought it would be hard for most archivists to implement.  However, after working with it for a while, I can imagine a scenario in which someone at a ‘small’ institution could set it up and use it for a storage repository–but only if they have considerable IT help.  In the rest of the post, I provide an overview of iRODS and detail my experiences trying to understand, install, and use it.

What is iRODS?

As the documentation explains, iRODS is software that can be used to establish a ‘data grid’ and to define and enforce rules regarding how digital objects are managed in that grid, in order to ensure their preservation as a persistent digital archives.  Objects (files or folders) can be moved into storage locations and managed using Unix-like “icommands” or client software (including a web browser, if some additional software is installed on the server).  It includes a built-in catalog (iCAT) for managing entries; iCAT is set up as a PostgreSQL database during installation, or you can connect to a preexisting database installation if some additional configuration is completed during and after installation.

Two critical components of iRODS are its rule engine and its microservices.  Rules define operations that will be undertaken under a set of circumstances outlined in a rule definition, which is found in a core file.  Within a rule, any action whose name begins with msi calls a so-called ‘microservice,’ which is basically a function that can perform other tasks that extend iRODS.  Rules and microservices can be chained together to execute complex action sequences.  The microservices are written in the “C” language, while the rules that call them use iRODS’s own rule syntax.  Rules can be seen as somewhat akin to the ‘scheduler’ option in the RODA interface, which allows a system administrator to establish tasks such as converting images.  The difference is that in iRODS the rules are not executed on a ‘schedule’ but in response to a particular event.  For example, a rule might fire when a file is ‘iput’ into the system, or it might verify file integrity (via checksum comparison) and then replace any bad files from an alternate location in the grid.
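To make the event-and-chain idea a bit more concrete, here is a minimal sketch in Python.  It is not iRODS code and does not use any iRODS API: the function names (msi_verify_checksum, msi_restore_from_replica, fire), the RULES table, and the choice of SHA-256 are all my own assumptions for illustration.  It only shows the general pattern of an event triggering a chain of small functions that verify a checksum and, if the file is bad, restore it from a second copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical stand-ins for microservices: plain functions that read and
# update a shared context dictionary.
def msi_verify_checksum(ctx):
    ctx["ok"] = sha256_of(ctx["primary"]) == ctx["expected"]
    ctx["repaired"] = False


def msi_restore_from_replica(ctx):
    if ctx["ok"]:
        return  # nothing to repair
    if sha256_of(ctx["replica"]) != ctx["expected"]:
        return  # the replica is bad too; leave this for a human
    shutil.copyfile(ctx["replica"], ctx["primary"])
    ctx["ok"] = sha256_of(ctx["primary"]) == ctx["expected"]
    ctx["repaired"] = ctx["ok"]


# Hypothetical rule table: an event name mapped to a chain of functions.
RULES = {"verify": [msi_verify_checksum, msi_restore_from_replica]}


def fire(event, ctx):
    """Run every function chained to the event, in order."""
    for microservice in RULES.get(event, []):
        microservice(ctx)
    return ctx
```

Calling fire("verify", {...}) with the paths of a primary copy and a replica plus the expected digest returns the context, with "ok" and "repaired" showing what happened.  A real iRODS rule would express the same chain in its own rule syntax rather than in Python.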

iRODS is most often used for large data grids, such as those used to store scientific data, or for large archives, such as NARA’s Transcontinental Persistent Archives Prototype.  Can it be implemented by archivists without access to a similar level of technical support?  How easy is it to set up and use?

Working with iRODS

Basic installation is relatively straightforward–if you have some good UNIX skills and can make sense of the system requirements.  The documentation states it can be installed on a Windows computer, but there are no instructions provided for meeting those requirements, and as far as I can tell, it is most often implemented on UNIX-based systems.  At first, I downloaded and tried to install a Windows binary, but the binary can only be used if an iRODS server is already running on another Windows box.  Then I downloaded the main install packet and unzipped it to a folder.  However, I did not get very far with installing it because there is no batch file to run.  It looked like I would need to install Perl and/or something else, but I did not have time to experiment.  I may try again later.

Next, I tried a Mac installation.  I downloaded the Developer Tools (i.e. Xcode, etc.) for Mac OS X, as the requirements page states, then downloaded the main program packet.  I put the iRODS folder in my Applications directory, opened a terminal, and typed ./irodssetup.

From there, I simply accepted all of the defaults as shown in this example install on the iRODS site.  Once it was installed, I typed ./irodsctl start to start the iRODS service on my computer.  In any kind of production system, iRODS would need to be started as a service when the computer boots, but manually starting and stopping the server was sufficient for testing.  Once iRODS was running, I was easily able to use the iCommands to upload a test file to the iRODS storage Vault on the local computer–although it did take me a bit of time to figure out that I needed to add the folder clients/icommands/bin to my $PATH environment variable in order to get it to work.
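For anyone scripting this step rather than editing a shell profile, the $PATH fix can be sketched in Python roughly as follows.  This is only an illustration under my own assumptions: the helper names (env_with_icommands, find_icommand) are made up, and the install location is whatever folder you unpacked iRODS into.  The only detail taken from my install is the clients/icommands/bin subfolder.

```python
import os
import shutil
from pathlib import Path


def env_with_icommands(irods_home, base_env=None):
    """Return a copy of the environment with clients/icommands/bin first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = str(Path(irods_home) / "clients" / "icommands" / "bin")
    # Keep the existing entries, dropping blanks and any earlier copy of bin_dir.
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    env["PATH"] = os.pathsep.join([bin_dir] + parts)
    return env


def find_icommand(name, env):
    """Return the full path of an icommand on the given PATH, or None if absent."""
    return shutil.which(name, path=env["PATH"])
```

If find_icommand("iput", env) returns None, the folder is wrong or the icommands were never built.  Otherwise the returned path can be handed to subprocess.run along with env to run the command from a script.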

I experimented a bit with the software from the terminal where I had it installed, using the tutorial.  It is easy enough to use direct commands to learn about the system, but a system developer would need to do a lot of configuration work to set it up for an archivist to use.  For example, based on a JISC evaluation of iRODS (see section 6 of the final report PDF), it appears that the system can be configured to automatically execute image and other types of conversion work, using rules and microservices to call another application, such as ImageMagick or OpenOffice.
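I did not set up any conversion rules myself, but the general shape of ‘call another application when a file arrives’ can be sketched in Python like this.  Everything here is an assumption for illustration: the function names are invented, the TIFF-to-JPEG choice is arbitrary, and the sketch calls ImageMagick’s convert command directly instead of going through an iRODS microservice.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_suffix=".jpg", tool="convert"):
    """Build an ImageMagick-style command line: convert <input> <output>."""
    source = Path(source)
    target = source.with_suffix(target_suffix)
    return [tool, str(source), str(target)]


def on_ingest(source, runner=subprocess.run, which=shutil.which):
    """Hypothetical 'rule': when a TIFF arrives, derive a JPEG access copy.

    Returns the path of the derived file, or None if nothing was done.
    """
    if Path(source).suffix.lower() not in {".tif", ".tiff"}:
        return None  # this rule only applies to TIFF images
    command = build_convert_command(source)
    if which(command[0]) is None:
        return None  # converter not installed; skip rather than fail
    runner(command, check=True)
    return command[-1]
```

The runner and which parameters exist only so that the logic can be exercised without ImageMagick installed; in normal use the defaults would be left alone.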

I got curious: could I connect to it with a browser?  The iRODS wiki didn’t help me figure out what I needed to do, but I eventually found a very useful evaluation of iRODS on the JISC site.  It detailed a very complicated process for installing Apache, PHP, etc., so that a browser can connect to iRODS.  I did not have time to do that as part of this short evaluation.  Based on other evaluations, it looks like the web client can only execute operations on single files at a time, although more robust functionality may be developed in the future.

My impressions of iRODS

All in all, I feel like iRODS could play a very important role in a digital preservation system for smaller archives, but the iRODS website, documentation, and publications are very complex, and any archivist would need significant technical support to get it configured.  I have heard several presentations related to it over the years, and the content of those presentations as well as my own experience indicate that the software is very powerful, but to date it has been implemented only with significant involvement from computer scientists or those who developed the system at UCSD and North Carolina.  When I opened the text file that defines the rules (irods/server/config/reConfigs.core.irg), I understood why.  The rule syntax alone is very complex, and any significant implementation would also involve bundling other programs as part of a microservice.  (It is unclear to me at this point if there is a way to define rules and microservices that does not involve manually editing code; I think one of the big advantages of RODA is that a large number of actions can be scheduled in the graphical interface.)

Over time, it is possible that a customized version of iRODS, services developed on top of it (for example, to bundle it with other open source software in the way that RODA and Archivematica do), or the development of a graphical ‘front end’ to administer the server could make it more user friendly for those without a high level of technical support.

One or more of these things may be an outcome of the NHPRC-funded DCAPE project, although very little information about the project is currently available online.  I believe that Michigan State University’s NHPRC-funded electronic records project is also using iRODS, so the outcome of these projects may provide more clarity regarding the future utility of the software outside of NASA, NARA, etc.  However, my general impression after working a bit with it is that a system like Archivematica, which uses simple file-based storage, would be much simpler for most archivists to implement.

[updated April 22, 2010]:

iRODS: ‘Score’ for a Small Archives

  • Installation/configuration/supported platforms: 5/20
  • Functionality/Reliability: 15/20
  • Usability: 2/10
  • Scalability: 5/10
  • Documentation: 3/10
  • Interoperability/Metadata support: 5/10
  • Flexibility/Customizability: 5/10
  • License/Support/Sustainability/Community: 4/10

Final: 44/100

 