Email Archiving Comes of Age
This session was composed of lightning talks about various email archiving projects, including the first NHPRC electronic records case studies focused on email archiving.
Chris Prom from the University of Illinois at Urbana-Champaign reported on the CLIR Report on Technical Approach on Email Archiving (CLIR 175). The report is available here. It documents how archivists are currently preserving email and frames email preservation in terms of what technology is available and how it can be used. The report covers why email matters, technical definitions, lifecycle models, and tool workflows, and it sets out an agenda for email archiving moving forward.
Katherine O’Neil from the Manuscript Division of the Library of Congress described the process of identifying email within their collections. They have located email in a number of collections, including the Lieberman papers, the Pelosi papers, and the American Lands Alliance collection. They use Bagger tools and analyze the email using FTK to identify personally identifiable information. They have discovered a wide range of email file formats, including the unusual pfc documents.
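As a rough illustration of what a first-pass format inventory could look like, the sketch below walks a directory tree and counts files whose extension suggests an email format. It is not the Library of Congress workflow, which relies on Bagger and FTK; the function name, the extension list, and the labels are all assumptions made for this example, and extensions alone are only a hint at the real format.

```python
from collections import Counter
from pathlib import Path

# Assumed, non-exhaustive mapping of extensions to email formats.
# Only "pfc" is mentioned in the talk; the others are common formats
# added here for illustration.
EMAIL_EXTENSIONS = {
    ".mbox": "MBOX",
    ".eml": "EML",
    ".msg": "Outlook message",
    ".pst": "Outlook data file",
    ".pfc": "PFC",
}


def inventory_email_files(root):
    """Count files under `root` whose extension suggests an email format."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Compare case-insensitively, since transferred files vary in case.
        label = EMAIL_EXTENSIONS.get(path.suffix.lower())
        if label is not None:
            counts[label] += 1
    return counts
```

Calling inventory_email_files on a transferred collection's top-level folder returns a Counter keyed by format label, which gives a quick sense of which formats need attention before deeper analysis.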
Kevin De Vorsey from NARA spoke on the National Archives ERM Policy guidance. The National Archives has recognized that email is important to the historical record and is currently encouraging a move away from a “print and file” preservation strategy. They are working on new guidance for the preservation of email. They have noticed a trend where offices are combining the Capstone approach (identifying senior official email accounts and targeting them for preservation) with traditional records management to form a hybrid strategy.
Roger Christman from the Library of Virginia spoke about the Tim Kaine email collection. The collection includes 1.3 million records. This project focuses on access as preservation as the end goal. They provide online access by converting emails, with their attachments, to PDFs, uploading them to the content management system, and making them keyword searchable. They are currently focused on documenting how the collection is being used, to provide examples of how email is critical to scholarship.
Katherine Martinez from the Trisha Brown Dance Company presented on two case studies of museum organizations, looking at records created during the exhibition process, where email was considered a problem record. Katherine focused on the role of appraisal, including defining routine correspondence, as well as ensuring that appraisal was not limited to executive accounts. She also created a survey and used social network analysis to set parameters for email collecting.
Camille Tyndall Watson and Jeremy Gibson from the State Archives of North Carolina spoke on the TOMES project. This grant project uses a Capstone approach to collecting state government email in North Carolina. They have created a series of forms to identify who has archival email, based on position, function, and projects. An individual’s position number is then flagged in the HR system, and the account is tagged and placed on legal hold. They also discussed the TOMES tool, which they hope can be used to process email by both power users and those with fewer resources.
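The selection logic described here (flag a position number when the position, its functions, or its projects meet archival criteria) can be sketched in a few lines. This is only an illustration of the general Capstone idea, not the TOMES forms or tool; the Position class, the criteria sets, and every value in them are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    position_number: str
    title: str
    functions: set = field(default_factory=set)
    projects: set = field(default_factory=set)


# Hypothetical criteria; a real program would derive these from its own forms.
ARCHIVAL_TITLES = {"Secretary", "Deputy Secretary"}
ARCHIVAL_FUNCTIONS = {"policy"}
ARCHIVAL_PROJECTS = {"disaster-recovery"}


def is_archival(position):
    """True if the position qualifies by title, function, or project."""
    return (
        position.title in ARCHIVAL_TITLES
        or bool(position.functions & ARCHIVAL_FUNCTIONS)
        or bool(position.projects & ARCHIVAL_PROJECTS)
    )


def flag_positions(positions):
    """Return the sorted position numbers that should be flagged."""
    return sorted(p.position_number for p in positions if is_archival(p))
```

Keying the result on position number rather than on a person's name mirrors the approach described in the talk: the flag follows the position, so it still applies when a new person takes over the role.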
Wendy Marcus Gogel from Harvard University spoke on the EAS tool, which is intended to be an open source tool for email archiving. Harvard currently uses EASi to manage email processing and to automate the transfer of email to a repository. The current EASi tool is integrated with Harvard’s other home-grown systems, so they are working on providing an open source version for other institutions.
Glynn Edwards from Stanford University provided an update on ePADD. ePADD is currently in its last quarter of funding. The goal is to make ePADD a more reliable tool and to increase its functionality. The latest release, in July, includes new import and export options. ePADD has also modeled integration with Archivematica via a partnership with New York Public Library. The ePADD project is looking for future funding.
Brent M. West from the University of Illinois spoke on email automation. He implemented a Capstone model on the governor’s office email. He found that the best opportunity to collect email was as soon as possible after an individual leaves. Appraisal was a key element of the collecting process, including using concept and term analysis to review email. They currently provide access to emails via Outlook on an offline terminal.
Joel Simpson from Artefactual Systems spoke on emerging trends for email. He described the importance of reviewing tools from other industries for their email capabilities, including tools that use header fields to help prove the authenticity of emails. E-discovery tools could also be of use for future email archiving and processing, but many are out of the price range of most organizations.
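To give a sense of what looking at header fields involves, here is a minimal sketch using Python's standard email package. It only surfaces a few headers that are commonly consulted when assessing a message (Message-ID, Date, the Received chain, and the presence of a DKIM-Signature); it does not verify anything, and it is not one of the tools mentioned in the talk. The function name and the choice of fields are assumptions made for this example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def summarize_headers(raw_message):
    """Pull out header fields often consulted when assessing a message."""
    msg = message_from_string(raw_message)

    date_value = msg.get("Date")
    try:
        sent = parsedate_to_datetime(date_value) if date_value else None
    except (TypeError, ValueError):
        # Malformed Date header: record it as missing rather than fail.
        sent = None

    return {
        "message_id": msg.get("Message-ID"),
        "date": sent,
        # Each mail server that handled the message adds a Received header.
        "received_hops": len(msg.get_all("Received", [])),
        # Presence only; validating the signature needs a dedicated library.
        "has_dkim_signature": msg.get("DKIM-Signature") is not None,
    }
```

A summary like this could be recorded alongside each message at ingest, so that later reviewers can see which headers were present without reopening the original file.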