In the third presentation at the DPC Preserving Email Seminar, Susan Thomas of the Bodleian Library shared case studies on the acquisition and preservation workflows in use at the Bodleian Libraries.

Units in the Bodleian Library, the main research library at the University of Oxford, have been receiving a wide range of digital manuscripts, personal papers, and organizational records. With funding from the Andrew W. Mellon Foundation, the Bodleian’s FutureArch project is developing methods to deal systematically with hybrid (analog and born-digital) collections. This work is documented through a very useful blog.

The approach that the Bodleian is taking toward email can best be thought of as an example of the “sweeping up crumbs” approach that I discussed in my remarks. Susan’s presentation (Powerpoint here) emphasized the ways in which personal mailboxes can be saved and converted to a system-neutral format that conforms to the RFC standards. The Bodleian Library is developing processes to access and process ‘dead’ email collections, as well as those that continue to grow through active use.

In each instance that Susan discussed, curatorial staff harvested emails from local folders or an email server, often with the help of the donor (or a colleague/heir). Records accessioned and converted include those from an email account of a former member of parliament/MEP (in Exchange/Outlook format), the professional email of an academic (in CompuServe 4.0 format!), and files concerning a completed book project (in Gmail). In addition, the Bodleian is developing processes to manage incremental accessions, working currently with the email records of a small press (organized by publication) and of an administrator who used Lotus Notes/Domino.

In each case, curatorial staff identified the email as part of a broader records survey, recommending that the entire email account or a selected portion of it be transferred to the archives. In most instances, staff completed background research to locate the files and to understand their structure. Files were then copied to a portable hard drive or DVD in whatever source format could be provided by the email client or server. Once the files had been received, FutureArch staff identified software to read and/or migrate the files into EML format.

In the course of migrating these files, staff faced several challenges, but none have so far proven to be insurmountable.

In one case, over 200,000 messages were included in only two folders: sentmail and inbox. The migration program they used copied the messages without problem, even though they were deposited in a single .pst file, split over 5 DVDs.  While staff are reasonably confident that the messages migrated correctly, verification cannot be completed because the original file was so large that it could not be opened in versions of Outlook accessible to the staff.

In another case, staff faced a significant challenge in segregating personal/private emails from professional correspondence (a donation requirement). In the end, FutureArch staff used a forensics tool, FTK, to present a set of emails to the donor for review. In this same instance, staff needed to migrate emails from the obsolete CompuServe format. This became a multi-step process. First they located and installed a copy of the CompuServe client application. The messages could not be viewed in the client because access required an active account, but it proved fortunate that the program could be found: the migration tool that FutureArch staff had identified (CS2Eudora) required it. After finding an old version of Eudora on an abandoned website, staff members were able to view the emails in Eudora, then migrate them into a format that could, in turn, be migrated to EML files.

Finally, Susan cited the case of email stored in a Lotus Notes local folder (.nsf files) or on Lotus Domino servers as particularly problematic. Notes has a very poor end-user export facility. In theory, FutureArch staff could have established an IMAP connection to the Domino server, then used a client like Thunderbird to migrate the messages. Unfortunately, the Domino server is not configured to support IMAP connections. Therefore, staff will negotiate with system administrators to see if email can be exported using server tools. From that point, they will likely use a two-step migration process to get to EML.
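
For readers curious what the IMAP route might look like when a server does allow it, here is a minimal Python sketch. It is not the Bodleian’s method (Susan described using a client such as Thunderbird); it swaps in the standard-library `imaplib` module to do the same job in a script. The function name `export_mailbox_to_eml`, the host, the credentials, and the output folder are all hypothetical.

```python
import imaplib
from pathlib import Path


def export_mailbox_to_eml(conn, mailbox, out_dir):
    """Save each message in `mailbox` as a numbered .eml file; return the count.

    `conn` is assumed to be an already logged-in imaplib connection.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Open the mailbox read-only so the export does not alter message flags.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot open mailbox {mailbox!r}")

    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    saved = 0
    for num in data[0].split():
        # RFC822 asks the server for the complete raw message.
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK" or not parts or not isinstance(parts[0], tuple):
            continue  # skip anything the server did not return in full
        raw_message = parts[0][1]
        (out / f"{int(num):06d}.eml").write_bytes(raw_message)
        saved += 1
    return saved


# Hypothetical usage (host and credentials are placeholders):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "password")
#   print(export_mailbox_to_eml(conn, "INBOX", "export/INBOX"))
#   conn.logout()
```

Writing the raw bytes unchanged keeps each file as close as possible to what the server delivered, which matters if the files are later checked against the source.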

In summary, Oxford’s experience drives four points home:

  1. There are many variables to attend to when capturing email.  Each case required different tools.
  2. We need characterization tools.  For example, it would be highly desirable to complete audit verification, even if it was as simple as counting the number of emails and attachments before and after conversion.
  3. Researchers will need visualization tools and client-like interfaces to use email, but in practical terms, files in EML format can be imported to most current clients.
  4. Finally (and perhaps most importantly), successful email preservation projects are built on trusting relationships and professional competence.  People are understandably nervous about donating semi-structured information like email.  Curators must take those concerns seriously and provide effective mechanisms to address them.
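
The simple audit mentioned in the second point, counting emails and attachments before and after conversion, could be sketched roughly as follows. This is an illustration under stated assumptions, not a tool the Bodleian described: it assumes the intermediate format is an mbox file and the target is a folder of .eml files, and the function names (`count_attachments`, `tally_mbox`, `tally_eml_dir`) are my own.

```python
import mailbox
from email import policy
from email.parser import BytesParser
from pathlib import Path


def count_attachments(msg):
    """Count MIME parts explicitly marked with an 'attachment' disposition."""
    return sum(
        1 for part in msg.walk()
        if part.get_content_disposition() == "attachment"
    )


def tally_mbox(path):
    """Return (message count, attachment count) for an mbox file."""
    box = mailbox.mbox(str(path), create=False)
    try:
        messages = list(box)
    finally:
        box.close()
    return len(messages), sum(count_attachments(m) for m in messages)


def tally_eml_dir(directory):
    """Return (message count, attachment count) for a folder of .eml files."""
    parser = BytesParser(policy=policy.default)
    n_messages = 0
    n_attachments = 0
    for eml_path in sorted(Path(directory).glob("*.eml")):
        with open(eml_path, "rb") as fh:
            msg = parser.parse(fh)
        n_messages += 1
        n_attachments += count_attachments(msg)
    return n_messages, n_attachments


# Hypothetical usage: the two tallies should match after a clean migration.
#   before = tally_mbox("source.mbox")
#   after = tally_eml_dir("converted/")
#   print("OK" if before == after else f"MISMATCH: {before} vs {after}")
```

Matching counts do not prove that the content survived intact, and inline images or parts without a disposition header are not counted here, but a mismatch is a cheap early warning that something was dropped.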

All in all, Susan’s presentation offered a practical and sustainable approach to dealing with email that might be accessioned as part of a broader records transfer. The techniques the Bodleian is developing will be applicable at many institutions, no matter their size or funding level, and will prepare them well for managing email within future digital repository settings.

  • Peter Chan

     Susan’s presentation (Powerpoint here) needs login to access. Is it possible to post it to a public place? Thanks.
    Peter Chan
    Digital Archivist, Stanford University Libraries

  • Peter, I have the wrong link in blog, will fix it shortly.