This page describes my research project as I originally conceived it.
Fulbright Distinguished Scholar Award Project Statement: Practical Methods to Identify, Preserve, and Provide Access to Electronic Records
‘Does the past exist concretely, in space? Is there somewhere or other a place, a world of solid objects, where the past is still happening?’
‘Then where does the past exist, if at all?’
‘In records. It is written down.’
‘In records. And — ?’
‘In the mind. In human memories.’
‘In memory. Very well, then. We, the Party, control all records, and we control all memories.’
– George Orwell, 1984, cited in Jimerson, 2007a
As an archivist, I have dedicated my life to identifying, preserving, and providing access to a representative, authentic, and usable record of the human experience. Like Orwell, I believe that without an accurate and faithful record, it is impossible to construct accurate histories or to form faithful human memories. This is particularly true in the digital age, where not only the past, but even records themselves, lack a concrete existence.
People from all walks of life use archives to generate new ideas, to confirm rights, to hold others accountable for their actions, and to perform many other functions that are fundamental elements of democratic governance. In addition, many people use archives while pursuing personal interests, such as constructing family and community histories. Archivists explicitly commit themselves to managing one-of-a-kind documentary resources because such resources document human activity and opinions in ways that other materials do not and cannot.
The archivist’s charge was difficult enough to fulfill before the advent of networked computing technologies. Today it seems overwhelming because most records of permanent value are created and used in ephemeral electronic formats. Both archival and non-archival records exist as magnetic or electrical impulses and can be used only when machines and software interpret them. Email, websites, blog entries, digital photographs, and electronic documents are highly susceptible to loss, deletion, tampering, de-contextualization, and misinterpretation. Furthermore, records of permanent archival value typically exist alongside records that have only fleeting utility. The individuals or institutions that created these records as part of their daily activities have little knowledge of archival principles and practices, although they often have a strong interest in permanently preserving their records for future personal or research use.
How does one identify, capture, preserve, and provide access to the right set of electronic records—a portion that may comprise only 2-5% of the entirety of records created by society? This perplexing issue faces many archivists, but it is particularly acute at ‘smaller’ institutions, by which I mean those without access to computing resources typically available at large national, state, or university archives.
Project Goals and Rationale
In this project, I propose to undertake initial research that will allow me to develop a suite of methods and tools. The toolset will facilitate day-to-day work with electronic records and will make it feasible for archivists and manuscript curators at ‘small’ repositories in the United States, the United Kingdom, and elsewhere to fulfill their professional responsibilities for identifying, capturing, preserving, and providing access to electronic records that fall within their repositories’ documentary mandates. Beginning during my 2009-10 sabbatical and continuing in the succeeding years, I will work with programmers at the University of Illinois to build open source, Internet-based tools that put the resulting specifications and standards into practice. The specifications and software will be made freely available via the Internet, making it possible for many archivists to identify, capture, preserve, and provide access to electronic records in multiple formats.
The suite of methods and tools developed during this research project and over the following years will incorporate three main elements: 1) adaptations of existing tools, 2) new tools, developed as needed, and 3) a web-accessible guide to the toolset, with specific implementation advice for working with electronic records. The entire package will provide archivists and archival researchers with an easy-to-use, easy-to-implement ‘cookbook’ for performing archival functions such as appraisal, selection, preservation, description, and user access. It will incorporate many features that are now hard to access or use outside of large research institutions and national or state archives.
The project builds upon the past work I have conducted in designing and building software for managing descriptive information about archives and manuscript collections. The Archon software, described fully at http://www.archon.org, is an innovative, easy-to-implement tool for providing access to archival descriptive information and records. It is available under an open source license, has been widely implemented, and has been nominated for several awards. It lacks one essential element—a means to ingest, preserve, and provide access to large volumes of electronic records—a flaw I hope to begin rectifying with this research project.
The project also seeks to build upon prior electronic records work conducted by many individuals and groups. Over the past 15 years, researchers and institutions have developed a rich theoretical framework for electronic records systems, along with some tools to help manage electronic records. Much of this work is cited in the bibliography. Taken as a whole, this work has been tremendously valuable to the archival profession, but it is voluminous and difficult for practicing archivists to assimilate, and it has resulted in relatively few tools that the typical archives can implement. For example:
- The most widely cited framework for submitting, managing, and providing access to electronic records is the Open Archival Information System (OAIS) Reference Model, a standard developed for space information systems by the Consultative Committee for Space Data Systems under NASA's leadership. The framework’s recommendations roughly parallel those for analog records found in standard archival texts and best-practice guides. While the OAIS recommendation is very detailed, it offers little practical guidance. Archivists in many countries aspire to adopt the framework, but the tools that implement its recommendations are difficult to put into practice and typically do not support all archival functions without extensive customization and the integration of multiple systems into a suite of services.
- The library and information science community has done much to provide a sound intellectual framework and training materials for issues related to the long-term preservation of digital materials. Resources such as the Digital Preservation Tutorial, the Standards for Trustworthy Digital Repositories, and the reports of the InterPARES 1 and 2 Projects are helpful, but their recommendations cannot be implemented without access to significant administrative, technical, and financial resources.
- A number of complex metadata standards, such as the PREMIS (Preservation Metadata Implementation Strategies) Data Dictionary and METS (Metadata Encoding and Transmission Standard), are available. Again, the tools to exploit them are difficult for the practicing archivist to implement.
The aforementioned resources (and others cited in my bibliography) are tremendously valuable, but much of the research has been targeted inordinately toward transactional records, such as datasets, created by large bureaucracies and corporate bodies. Other studies concentrate on documents available publicly through the Internet. Authors outside of the library community have recently noted that much of the work to date has not resulted in practical, affordable methods that can be used to preserve personal archives or non-corporate records held outside of a large institutional setting (Gladney, 2008; Marshall, 2008).
It is time to synthesize practical advice from this research. It must be condensed into a set of methods and software that those working in ‘small’ archives can use to capture and preserve electronic records. Such work will make it possible for actual archivists to work with actual electronic records created by actual individuals and institutions.
Several recent developments suggest that this is a good time to conduct such a research and development project:
- Only a handful of practicing archivists have embraced digital preservation initiatives, perhaps because of the complexity of the problem or because the resources, time, and expertise to conduct such work are scarce. For example, my recent study of processing practices at college and university archives in the United States found that only four of twenty-nine archives held more than a token amount of electronic records (Prom, 2008). Only one had established procedures and tools for acquiring electronic records, and other research also indicates that few archives in the United States are ready to collect electronic records in a systematic way (Davis, 2008).
- Library tools related to digital preservation and access (e.g., open-source software such as DSpace, Fedora, and Omeka), as well as commercial products and services (such as Archive-It and DigiTool), include about 90% of the functionality that archives need, but they are difficult to implement without significant technical and/or financial support.
- Demonstration projects, such as the September 11 Digital Archive, have been implemented by large national and state libraries or as part of grant-funded work, such as that of the Center for History and New Media at George Mason University. Such projects aim to preserve a select body of electronic information related to issues of national and international importance, but their practical lessons and tools must be adapted so that smaller archives can develop similar resources for the materials that fall within their repositories’ documentary purview.
- Recent work in Europe, and in the United Kingdom in particular, has provided some practical guidance for implementing electronic records work. Recommendations such as those in Oxford University’s Paradigm Workbook, as well as the reports of the Digital Curation Center and the Joint Information Systems Committee, summarize standards and tools that can be used to accession, describe, and provide access to electronic records. In addition, the InterPARES 3 project, which focuses on training for electronic records work, includes a substantial UK and Irish component, to which I could make a meaningful contribution.
- The broader library and information management community is showing increased interest in using archival principles and practices to manage digital assets. Such themes resonated loudly during the recent American Library Association Rare Book and Manuscript Section Meeting (Proffitt, 2008). Recommendations to streamline processing and digitization practices for paper-based records (Greene and Meissner, 2005; CLIR, 2008) must also be applied to records that exist solely in electronic formats.
In brief, all of the pieces are in place to develop a suite of specifications and software that will help ‘smaller’ archives in the US, UK, and worldwide capture, manage, preserve and provide access to the electronic records that will serve as the basis of future history and memory.
The project that I am proposing will be undertaken during a ten-month sabbatical at the Center for Archival and Information Studies (CAIS) at the University of Dundee. CAIS will serve as the base for conducting research and for visits to other institutions in the United Kingdom and Europe. I will use these visits to gather first-hand information about state-of-the-art digital archives and preservation projects. I will abstract the findings into a set of functional specifications and will begin programming work on a toolset that will make it easier for archivists to identify, preserve, describe, and provide access to electronic records.
The project will have several immediate outcomes in the form of peer-reviewed research papers and a project website, hosted by CAIS and/or the University of Illinois. The project website will include reports and open source tools developed during the nine-month project and will grow with additional content over time. In addition, I will provide project updates and presentations in both the United Kingdom and the United States at appropriate institutions and conferences.
The specific tool or tools to be developed or extended from current applications may include, but are not limited to:
- A web application for browsing personal electronic records (such as PDF files).
- Scripts to regularize disparate email formats into a standard XML (Extensible Markup Language) format.
- An email viewer application that uses the aforementioned XML format.
- An extension or toolbar for the popular Firefox browser that will allow archivists to easily capture blogs and other ephemeral web-based materials to a local network or electronic repository.
- A toolkit of open-source/free utilities to facilitate file management (such as bulk file re-namers and metadata editors).
This project must be undertaken in the United Kingdom because most of the practical work that would be immediately applicable to the project is taking place at repositories and research institutions in the United Kingdom or in nearby European Union nations. (The other major centers for such work are in Australia, New Zealand, and East Asia.) Institutions to be visited may include:
- Digital Curation Center, Universities of Edinburgh and Glasgow
- Humanities Advanced Technology and Information Institute, University of Glasgow
- Digital Preservation Coalition, University of York
- PARADIGM and CAIRO projects, Oxford University
- Joint Information Systems Committee, Bristol (Digital Preservation Initiative, Repositories Programs)
- UK Web Archiving Consortium and PLANETS Digital Preservation Project, British Library
- DROID/PRONOM Projects in Digital Preservation Department, UK National Archives
- University of London Computer Center (database curation)
- Center for e-Research, King’s College, London
- Web Archiving Consortium members, such as the Wellcome Library
- University of Wales Aberystwyth, Department of Information Studies (Jaqueline Spence)
- Aalborg University, Denmark (new program in digital archives directed by Jens Topholm)
- Bibliothèque Nationale de France (E-records management tool developed by Catherine Dhéhent and John Thompson)
CAIS at the University of Dundee provides an ideal base of operations because it has an active archival research program and an online student body. Many of the students already work in smaller repositories or will do so after graduation. They will test, examine, critique, and help me implement my research findings, and they will bring a unique perspective to electronic records tools. Along with CAIS faculty members Patricia Whatley, Caroline Brown, and Alan Bell, they will help me refine a set of practical standards and tools that are useful to archivists in the United Kingdom, the United States, and worldwide.
The Fulbright experience played a fundamental role in my development as a historian and archivist in 1997 and 1998, when I held a student fellowship to study mutual aid societies in the United Kingdom. The fellowship allowed me to locate and use unique archival materials, many still in the hands of local aid associations or their members. Throughout the process, I met many people who taught me how important and difficult it can be to preserve unique materials for future research. As I encouraged them to deposit these materials at county record offices and local libraries, I was taking my first steps toward becoming an archivist.
In this research project, I hope to develop and disseminate essential methods and tools that will help all archivists fulfill that same role more easily in the digital age. I would not be so foolhardy as to suggest that I or a small group of people can tackle the problems and opportunities presented by electronic records in isolation from other research and development projects. In my work, I have one foot firmly planted in the camp of daily archival work and the other in the theoretical and technical underpinnings of electronic records and networked computing technologies. Given the opportunity to pursue these interests in the rich environment of electronic records work that currently exists in the United Kingdom, I hope to play an important role in helping other archivists and curators preserve and provide perpetual access to the raw materials of future history and memory.