Re-imagining Academic Archives

On May 27, 2010, in Research, by Chris Prom

A few days ago, Tom Scheinfeldt and Dan Cohen announced a unique call for papers for a volume to be titled Hacking the University. Here is my submission, which summarizes parts of my work as a Fulbright Scholar at the Centre for Archive and Information Studies, University of Dundee. This version has been stripped of footnotes, which I can add back in if it is accepted; however, I have added a few hyperlinks. Although I take issue with certain elements of this exercise (doing anything in a week can easily lead to sloppy thinking), I think the forum might be a good place to get broader exposure for the important role that archives must continue to play in the academic community. So, here goes nothing!


‘Does the past exist concretely, in space? Is there somewhere or other a place, a world of solid objects, where the past is still happening?’
‘No.’
‘Then where does the past exist, if at all?’
‘In records. It is written down.’
‘In records. And — ?’
‘In the mind. In human memories.’
‘In memory. Very well, then. We, the Party, control all records, and we control all memories.’

–  George Orwell, 1984

Archives are rarely created for the express purpose of being preserved but develop organically as people live their (typically chaotic) lives.  Archivists—many of whom serve in university archives and manuscript libraries—are dedicated to identifying, preserving, and providing access to a selective, authentic, and usable record of that messy human experience.  People from all walks of life use archives to generate new ideas (or test existing ones), to confirm rights, to hold others accountable for their actions, to gain personal depth of understanding, to establish a connection with society or to the past, and to perform functions that help preserve democratic institutions, sustain civil society, or ensure social justice.

The archivist’s charge was difficult enough to fulfill before the advent of networked computing technologies. Many people make overblown claims that a ‘digital dark age’ is now upon us: that all of the electronic files we are creating will someday vanish. At first blush, we instinctively wonder how this could be possible. If there is one thing our lives do not lack, it is access to information. People demand, and are constantly developing, better ways to control, index, and sort massive stores of information; few believe that it will all someday vanish or slowly rot away.

It is trite to say that email, websites, blog entries, digital photographs, textual records, database files, and other electronic records are very susceptible to accidental loss, deletion, or de-contextualization, even if we do not accept the premises of dystopian predictions [1, 2] that civilization will collapse after the oil runs out or a catastrophe besets humanity. Nevertheless, records become more fragile and vulnerable as individuals, businesses, and even governments outsource data storage and management to the warm embrace of [insert the name of the vendor your university is working with here], ostensibly under the rubric of cost cutting and efficiency. And most private individuals now create records using a wide range of tools, services, and hardware, leaving interrelated records strewn across hard drives, shared servers, social networking sites, and cloud applications. These documents reside under the care, custody, and control of many different people and organizations, not simply the person or organization that created them and has a vested interest in their content.

Leaving the factors I just mentioned aside, every set of electronic records is itself a constructed and contested entity. The person who creates or assembles the documents and other materials molds them into an archives through his or her activities, interests, and sometimes malfeasance, subterfuge, or inertia. And those who control access to the record also have a chilling ability to shape how it is presented to the public, as certain citizens of the People’s Republic of China know all too well.

However one wishes to slice or dice technical issues related to the creation and management of records, we know for certain that it is impossible to construct accurate histories without accurate and faithful evidence of people’s actions.  Those who use archives can reconstruct or understand those actions only when records are maintained in an intellectually coherent fashion.  The contextual relationships between the individual documents that comprise an individual or corporate entity’s intellectual output must be preserved.  Similarly, future users of archives need to know how the records they are using are related to records produced by other records creators.  Given these facts, what types of organizations are best placed to serve as the long-term, trusted custodian of authentic, verifiable, and accurate electronic records?

It is tempting to think that the preservation of digital heritage can be left to those who provide the service of storing and disseminating the thoughts that we distill using keyboards, video cameras, or other digital devices. But to do this would leave the records at extreme risk of loss. At the 8th European Conference on Digital Archiving, Steve Bailey described this problem using an apt metaphor: Imagine if we had entrusted the preservation of the records left by Samuel Pepys (the 17th-century London diarist) to those who produced his communication media: the stationer who sold him his notebooks, the tanner who sold him his vellum, and the cartographer who sold him the maps he carefully annotated.

Of course, each of the businesses Pepys patronized has long since passed gently into the night. We believe that the same fate will not await Google, Facebook, or Twitter, but even if they manage to survive, what will happen to content stored by minor services or on contracted webhosts? Tellingly, the terms of service for nearly every free platform or low-cost webhost make absolutely no promises regarding digital preservation, or even the return of content to users in the event of business failure or a decision to eliminate the service. But catastrophic business failure is hardly beyond the realm of possibility, as the collapse of Arthur Andersen showed, and Google’s revenue stream is highly reliant on a single source of income: advertising sales. Over a 50-year period, Google is as vulnerable to social or economic change as the newspaper industry, and a revolt over its privacy policies could mortally wound it.

The recent deal announced between Twitter and the Library of Congress may or may not portend a partial solution to the problem of relying on commercial entities to preserve information needed for historical research. But let’s not kid ourselves: the Library of Congress is extremely unlikely to strike deals with every commercial entity providing social media services in the country, much less every webhost. Other factors will undermine the effectiveness of mass archives. Users, quite understandably and predictably, have already begun to assert a (self-declared) right to remove content from the Library. (The Twitter terms of service in effect since Sept. 10, 2009 give Twitter express permission to make tweets available to anyone it chooses; the disposition of public tweets made prior to this date, as well as of all private tweets, should be an interesting issue for the California judicial system to resolve.)

Even if mass ‘archiving’ of materials from millions of records creators did not face significant legal hurdles, the methods that libraries use to catalog and make information available are not well-suited to preserving the full context necessary to make individual records understandable. To oversimplify at the risk of stereotyping: libraries deal well with items (such as books) or consistent runs of uniform media (such as serials); archives deal well with aggregations of mixed media and with preserving the contextual information that makes them understandable. While large repositories such as the Library of Congress can use cutting-edge tools to mine and re-purpose large volumes of data, most tweets cannot be understood without extensive recourse to other materials created by the same individual, such as email, blog entries, videos, or other media.

Archives and manuscript repositories based at colleges and universities have traditionally served as trusted repositories for the analog records of individuals and organizations. These repositories typically include not only the administrative records of the college or university itself (which may seem rather prosaic) but also the records or papers of prominent faculty, alumni, students, community and professional organizations, and businesses. For example, Princeton University holds the records of the American Civil Liberties Union. My own parent institution is currently in negotiations to acquire the archives of a defunct business of international importance and already holds the records of over 60 professional and educational associations. And repositories large and small preserve the papers of prominent and not-so-prominent individuals.

Using their professional principles of provenance, sanctity of original order, collective appraisal, and active custodianship, archivists possess the conceptual tools to preserve and make accessible the raw materials of future history: email, blog postings, digital photographs, and other electronic records. Unfortunately, most archives have made little systematic progress in identifying, preserving, and providing access to electronic records. A forthcoming American Archivist article by Lizl Zach and Marci Peri concerning archivists at United States colleges and universities concludes bluntly (if a bit tortuously): “No comprehensive programs exist for managing e-records to use as models for the field.” The authors go on to note a “[g]eneral lack of interest on the part of administrators to invest a significant commitment into managing institutional records of any kind.” The recently released program for the joint annual meeting of the Society of American Archivists also illustrates a shocking lack of engagement with electronic records: only four of sixty-five sessions directly address electronic records issues.

This is not to suggest that archives are making no progress in preserving digital heritage. One can certainly cite many examples of successful pilot programs to manage born-digital information, and some of them even capture the attention of the media, as Emory University recently did with its successful effort to emulate the complete Macintosh desktop environment from one of Salman Rushdie’s computers. My blog includes long lists of other such programs. But if we are honest with ourselves, many of us would have to quietly admit that too many institutions have followed policies similar to those described by a curator from Harvard University’s Houghton Library, who was quoted in the New York Times article concerning Emory: “We don’t really have any methodology as of yet . . . . We just store the disks in our climate-controlled stacks, and we’re hoping for some kind of universal Harvard guidelines.”

Why have most archives failed to address electronic records issues effectively? The reasons are many, but the typical answers are that “digital preservation is hard” and “we don’t have enough money to do it properly.”

There is a certain amount of truth to these complaints. Many high-profile research projects, such as InterPARES, PLANETS, and NDIIPP, have developed frameworks, standards, and tools that can be used to preserve and render records in a way that lets users judge their authenticity and integrity. A wide range of software development projects also exist. Most of these implement one or more parts of a complicated framework for digital preservation known as the Open Archival Information System Reference Model. However, as William Kilbride, executive director of the UK’s Digital Preservation Coalition, noted at a recent seminar dedicated to digital preservation, “these solutions scale up a lot better than they scale down,” and he asked, “why would you want to use something that was so complicated that NASA had to call in their friends to help them design it?”

Finally, the approaches that archives have taken toward analog records (arranging and describing them in detail, digitizing them, and using statistics about them as a justification for our programs) are themselves very time- and resource-intensive. Work with analog records has been rewarded significantly by funding agencies and donors, and metadata standards and descriptive technologies for analog records and digitization have been the focus of considerable technical work. Making our analog holdings available in the digital age has provided unquestioned benefits to scholars, but putting so many resources into this area has probably detracted from efforts to identify, preserve, and provide access to records being created today.

It may seem that archivists risk ceding ground in digital preservation to others. My fellow Fulbright grantee Michael Trice has been exploring the use of wiki software to document community oral histories; many people use software like Omeka for similar purposes, and most of them are not archivists or even librarians. In one sense, these projects are not all that different from traditional faculty efforts that result in rich subject documentation, such as the Doris Duke Indian Oral History Archives, which is held at my home institution and includes 1960s-era oral histories for many Native American tribes. Certainly, many of the records being generated today raise confidentiality concerns similar to those posed by such oral histories. But there is a critical difference: the Duke files sat in a closet for 20 years before being transferred to the University Archives. If current electronic records projects are implemented without the direct involvement of an archives, they are extremely susceptible to loss.

This was demonstrated most clearly to me by a recent incident at the University of Illinois. Kalev Leetaru, a former student and current member of the University of Illinois academic staff, had developed the content-rich and technologically innovative UIHistories and UIPhotos sites. Initially developed as a proof-of-concept student project in computer technology, the sites grew over the subsequent five years as Leetaru painstakingly added historical content and new access features. They included tens of thousands of documents that Leetaru had digitized, as well as hundreds of pages of his own commentary and historical analysis, and they had been heavily used by faculty, staff, and members of the public. Even the University Archives had come to rely on them for ready access to information otherwise buried in print volumes. Leetaru had even managed to reconstruct the sites after accidental deletions and major hardware failures.

Unfortunately, aspects of the sites ran afoul of some campus units’ interest in “presenting a consistent University of Illinois brand,” and on October 23, 2009 Leetaru received a letter saying that the major campus technology unit that had been hosting the sites would remove all of their content from its servers in thirty days’ time. Although the sites were used by a diverse community, they now stood on the brink of disappearance.

Because the archives had found the sites to be invaluable as part of its information services infrastructure, University Archivist William Maher was able to secure a reconsideration of the shutdown long enough to allow Leetaru to move the files to different servers under a campus unit dedicated to the digital humanities. Meanwhile, the archivist worked with the Vice-President’s office to reach an arrangement under which Leetaru made edits to remove branding confusion and added the disclaimer shown on the current site, and under which oversight of the site moved to the University Archives.

While this move solved the immediate problem, the long-term preservation of the site will still depend on its close affiliation with the University Archives. The site could easily be lost if we are unable to maintain active custodianship of the records, or if the host program were to run out of space for the site or were itself eliminated (a move that is not currently anticipated, but long experience has taught the archives that campus programs come and go all the time). It is likely that only the continued involvement of Leetaru and his collaboration with the University Archives will ensure the site’s survival as a record of his activities and as a source of reference information.

This example illustrates a simple point: working closely with university faculty, staff, and students, archivists must reorient archival programs toward electronic records and appropriate a set of low- or no-cost tools and services to preserve digital information in a trustworthy fashion. The goal is to save the information in a way that does no harm and allows for future re-purposing using more sophisticated methods that, after ten to twenty years, may be more readily accessible to academic archives of all shapes and sizes.

The exact way in which a local archives may choose to re-think, re-conceptualize, re-construct, or re-create itself will vary and must be shaped by local context.  However, the general elements are simple: 

  • Assess available resources (e.g., staff, technology, IT support, institutional commitment, budget).
  • Write an electronic records program statement.
  • Directly engage faculty, staff, and student groups in the electronic records program. Learn from them.
  • Develop policies and procedures concerning submission of content from creators/producers.
  • Advocate for, fund, and implement a trustworthy digital repository.
  • Develop preservation and access action plans.
  • Develop processing, preservation, and storage workflows.
  • Develop an access system for resources ingested into your trustworthy storage environment.

Through the links supplied in the Recommendations section of this blog, and others to come, archivists and those who collaborate with them can develop a blueprint for an effective e-records program, select tools, and build a trustworthy repository. Almost any institution can cobble this together with existing open-source software.

While the methods suggested are exploratory, and while I have identified gaps in software offerings, the program is achievable at most academic institutions within an existing resource base. The program will build trust and partnerships on campus, as well as confidence in archival staff. Will the method work? Over the next year, we plan to test that hypothesis at the University of Illinois Archives, the University of Dundee, and elsewhere. Whatever the results of my particular experiment, traditional archives must be re-imagined in an act of constructive transformation.

Because I will revise and submit this article for publication elsewhere if it is not accepted for the Hacking the University volume, it is not provided under the Creative Commons license that applies to the majority of items on this site.


This post is © Copyright 2010, Christopher J. Prom.  All rights reserved.  This draft is made available for private study and comment in anticipation of possible future publication. You may print one copy for personal use, but do not redistribute or quote extensively without written permission of the author.

Special thanks for assistance in preparing this post go to William Maher, who supplied incisive but constructive criticism and provided information concerning an example cited near the end of the post.