Selected Email Preservation Resources

On April 4, 2014, in Research, by Chris Prom

This past Monday, I spoke at the Museums and the Web “Deep Dive” on email preservation. At the session, I distributed the following handout, which draws largely on my Digital Preservation Coalition Technology Watch Report. I am posting it here in response to a request made at the seminar.


Key Readings

David Bearman, “Managing Electronic Mail,” Archives and Manuscripts 22/1 (1994), pp. 28–50: Outlines the major social, technical, and legal issues that an email preservation project must address. It is particularly useful for its suggestions on how system design can support the effective implementation of policies.

Maureen Pennock, “Curating E-Mails: A Life-cycle Approach to the Management and Preservation of E-mail Messages,” 2006: Reviews the major challenges to email preservation and summarises some prospective approaches, with particular emphasis on the need to manage email effectively while it is being created and actively used. Also outlines the major conceptual approaches that can be used to preserve email, with somewhat less description of particular tools or services.

Richard Cox, “Electronic Mail and Personal Recordkeeping,” in Personal Archives and a New Archival Calling: Readings, Reflections and Ruminations (Duluth, Minnesota: Litwin Books), pp. 201–42: Reviews the archival profession’s attempts to preserve email messages and their content, suggesting that the best approaches will understand and preserve them as the organic outcome of our professional and personal lives. Cox suggests that those wishing to preserve email draw on concepts and procedures from both the records management and manuscript archives traditions, but the chapter offers relatively little direct implementation advice.

Gareth Knight, InSPECT: Investigating Significant Properties of Electronic Content, 2009: A report on email migration tools, completed for the InSPECT project. It describes and analyses the structure of an email message, identifying 14 properties of the message header and 50 properties of the message body that must be maintained during migration if an email is to be considered authentic and complete. The report also outlines a procedure for testing whether particular email migration tools preserve those properties and applies that procedure to three specific tools.

Christopher Prom, Preserving Email, Digital Preservation Coalition Technology Watch Report: Provides a summary of the social, legal, and technical challenges and opportunities for email preservation; reviews and explains Internet standards and technologies for email exchange and storage; and recommends particular approaches to consider in an email preservation project.
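Knight’s InSPECT report frames migration testing as checking whether specified properties survive a conversion. A minimal Python sketch of that idea for header fields is below, assuming the original and migrated copies of a message are both available as raw message bytes. The `HEADER_FIELDS` tuple and the `compare_headers` helper are hypothetical illustrations, not the 14 header properties or the test procedure defined in the report, and body properties are not checked at all.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical subset of header fields; NOT the 14 properties listed in Knight's report.
HEADER_FIELDS = ("From", "To", "Cc", "Date", "Subject",
                 "Message-ID", "In-Reply-To", "References")


def compare_headers(original: bytes, migrated: bytes) -> dict[str, tuple[str, str]]:
    """Return {field: (original value, migrated value)} for every field that differs."""
    before = message_from_bytes(original, policy=default)
    after = message_from_bytes(migrated, policy=default)
    differences = {}
    for field in HEADER_FIELDS:
        # policy.default decodes RFC 2047 encoded words, so a tool that only
        # changes how a non-ASCII header is encoded should not be flagged.
        old = str(before.get(field, ""))
        new = str(after.get(field, ""))
        if old != new:
            differences[field] = (old, new)
    return differences
```

An empty result means only that the chosen fields match exactly; a fuller test along the lines Knight describes would also cover body structure, attachments, and encodings.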

Useful Tools


Exchange Server: A proprietary application developed and licensed by Microsoft Corporation, providing server-based email, calendar, contact and task management features. Exchange servers are typically used in conjunction with the Microsoft Outlook desktop client or the Outlook Web App (formerly Outlook Web Access) browser client. Exchange stores messages in a proprietary format, and messages sent through Exchange typically carry extensive additions and changes to the message header. Calendar entries, contacts, and tasks are managed through extensions to the same storage format used for email. Depending on local system configuration, users may be able to connect to a specific Exchange server using an IMAP-aware client application.

Internet Message Access Protocol (IMAP): A protocol defining one method by which email clients (user agents) may connect to the mail servers that store their messages, allowing a user to view, create, transfer, manage and delete messages. Typically contrasted with the POP3 protocol, IMAP is defined in the IETF’s RFC 3501. Email clients connecting to a server using IMAP usually leave a copy of each message on the server unless the user explicitly deletes it or has configured the client software with rules that automatically delete messages meeting defined criteria.
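Because IMAP leaves messages on the server, it is a common route for capturing an account’s contents. A minimal Python sketch using the standard-library `imaplib` and `mailbox` modules follows. It assumes a server that accepts IMAP over SSL and a valid account; the helper names, host, and credentials are hypothetical. Opening the folder read-only and fetching with `BODY.PEEK[]` is intended to leave the server copy untouched (for example, no `\Seen` flag is set), and the raw messages are appended to an mbox file, subject to mbox’s own convention of escaping body lines that begin with “From ”.

```python
import imaplib
import mailbox


def fetch_raw_messages(host: str, user: str, password: str,
                       folder: str = "INBOX") -> list[bytes]:
    """Download every message in one IMAP folder as raw bytes (hypothetical helper)."""
    raw_messages = []
    # The context manager sends LOGOUT when the block exits.
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True issues EXAMINE instead of SELECT, so flags are not changed.
        status, data = conn.select(folder, readonly=True)
        if status != "OK":
            raise RuntimeError(f"cannot open {folder!r}: {data!r}")
        status, data = conn.uid("SEARCH", None, "ALL")
        if status != "OK":
            raise RuntimeError(f"UID SEARCH failed: {data!r}")
        for uid in data[0].split():
            # BODY.PEEK[] returns the full message without setting \Seen.
            status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
            if status != "OK":
                raise RuntimeError(f"UID FETCH {uid!r} failed: {parts!r}")
            # The first item is a (response metadata, message bytes) tuple.
            raw_messages.append(parts[0][1])
    return raw_messages


def save_to_mbox(raw_messages: list[bytes], path: str) -> int:
    """Append raw messages to an mbox file (hypothetical helper)."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    box.lock()
    try:
        for raw in raw_messages:
            box.add(raw)
    finally:
        box.close()  # flushes to disk and releases the lock
    return len(raw_messages)
```

For example, `save_to_mbox(fetch_raw_messages("imap.example.org", "user", "secret"), "inbox.mbox")` would copy one folder. A real capture would also need to walk every folder, record IMAP flags and internal dates, and cope with very large mailboxes.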

Multipurpose Internet Mail Extensions (MIME): A protocol for including non-ASCII content in email messages. Specified in IETF RFCs 2045, 2046, 2047, 4288 (since obsoleted by RFC 6838), 4289 and 2049, MIME defines the precise methods by which non-ASCII characters, multipart bodies, attachments and inline images may be included in email messages. MIME is necessary because Internet email, as originally defined, carries only seven-bit US-ASCII text, so eight-bit characters and binary data must be encoded (for example, as quoted-printable or base64) before transmission. MIME is also used in other communication mechanisms, such as HTTP, which labels content with MIME media types. Software such as message transfer agents, email clients, and web browsers typically includes parsers that convert MIME content to and from its native format, as needed.
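To make the encoding step concrete, here is a small Python sketch using only the standard `email` package. It builds a message with a non-ASCII subject, accented body text, and a binary attachment, then parses the result and lists each part’s content type and transfer encoding. The addresses, function names, and attachment bytes are placeholders; quoted-printable is requested explicitly for the body, while the package base64-encodes binary attachments by default.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

# Placeholder bytes standing in for a scanned image; not a valid PNG file.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + bytes(range(16))


def build_message() -> bytes:
    """Build a multipart message with non-ASCII text and a binary attachment."""
    msg = EmailMessage()
    msg["From"] = "Archivist <archivist@example.org>"
    msg["To"] = "donor@example.org"
    # Non-ASCII header text is serialized as an RFC 2047 encoded word.
    msg["Subject"] = "Café correspondence: transfer notes"
    # Quoted-printable keeps the accented body text within seven-bit ASCII.
    msg.set_content("Résumé and inventory attached.\n", cte="quoted-printable")
    # Adding an attachment converts the message to multipart/mixed;
    # binary data is base64-encoded by default.
    msg.add_attachment(FAKE_PNG, maintype="image", subtype="png",
                       filename="scan.png")
    return msg.as_bytes()


def describe_parts(raw: bytes) -> list[tuple[str, str]]:
    """List (content type, transfer encoding) for every MIME part."""
    msg = message_from_bytes(raw, policy=default)
    return [(part.get_content_type(),
             str(part.get("Content-Transfer-Encoding", "")))
            for part in msg.walk()]


raw = build_message()
print("seven-bit clean:", raw.isascii())
for content_type, encoding in describe_parts(raw):
    print(content_type, encoding or "(none)")
```

The serialized bytes are pure ASCII even though the message carries accented text and binary data, which is exactly the problem MIME was designed to solve; parsing reverses each encoding.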

PST: .pst is the file extension for the local ‘personal folders’ files (Personal Storage Table) written by Microsoft Outlook. PST files contain email messages, calendar entries, contacts, and tasks in a proprietary format whose specification Microsoft has published ([MS-PST]), and they may be found on the local or networked drives of email end users. Several tools can read PST files and migrate their contents to other formats.

Simple Mail Transfer Protocol (SMTP): A set of rules defining how outgoing email messages are transmitted from one message transfer agent (MTA) to another across the Internet until they reach their final destination. Most recently defined in IETF RFC 5321.
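SMTP leaves a record of this relay path in the message itself: RFC 5321 requires each receiving SMTP server to add trace information as a `Received:` header at the top of the message. The Python sketch below reads those headers in reverse to list the hops, earliest first. The sample message, host names, helper name, and the simple regular expressions are assumptions for illustration; real `Received:` headers vary widely in layout, and any added before the message reached a server you trust may be forged.

```python
import re
from email import message_from_string

# Simplified patterns; real Received headers vary widely in layout.
FROM_RE = re.compile(r"\bfrom\s+(\S+)", re.IGNORECASE)
BY_RE = re.compile(r"\bby\s+(\S+)", re.IGNORECASE)

# Hypothetical message relayed laptop -> mx.example.net -> mail.example.org.
SAMPLE = (
    "Received: from mx.example.net (mx.example.net [192.0.2.20])\n"
    "\tby mail.example.org with ESMTPS id 12345\n"
    "\tfor <archivist@example.org>; Mon, 31 Mar 2014 10:02:05 -0500\n"
    "Received: from laptop.example.net (laptop.example.net [192.0.2.10])\n"
    "\tby mx.example.net with ESMTPSA id 67890; Mon, 31 Mar 2014 10:02:01 -0500\n"
    "From: donor@example.net\n"
    "To: archivist@example.org\n"
    "Subject: Transfer test\n"
    "\n"
    "Body text.\n"
)


def relay_path(raw: str) -> list[tuple[str, str]]:
    """Return (sending host, receiving host) pairs, earliest hop first."""
    msg = message_from_string(raw)
    hops = []
    # Each server prepends its Received header, so the bottom one is the first hop.
    for header in reversed(msg.get_all("Received", [])):
        sender = FROM_RE.search(header)
        receiver = BY_RE.search(header)
        hops.append((sender.group(1) if sender else "?",
                     receiver.group(1) if receiver else "?"))
    return hops


for sender, receiver in relay_path(SAMPLE):
    print(f"{sender} -> {receiver}")
```

Preserving these headers intact matters for exactly this reason: once a migration tool drops or rewrites them, the transmission history they record cannot be recovered.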
