Author: Steven Gentry
“The speakers consider innovative collections of born-digital materials from both the fringe and mainstream related to current events that contain controversial or sensitive materials. They address challenges related to collection scope, ethics of collection and access, liability, contexts for the collection, appraisal, access, technology, and staff safety. This session is relevant to any archivist who is considering web archiving or social media collection of current events.”
Recap: Introduction
Session 702 focused on web archiving efforts related to content that simultaneously documents recent events and–by its very nature–could be considered sensitive and/or controversial. In addition to discussing their specific case studies, the three panelists–Jennifer Weintraub (Librarian/Archivist for Digital Initiatives, Schlesinger Library, Radcliffe Institute, Harvard University), Jane Kelly (Web Archiving Assistant, #metoo Digital Media Collection, Schlesinger Library, Radcliffe Institute, Harvard University), and Samantha Abrams (Web Resources Collection Librarian, Ivy Plus Libraries Confederation)–also spoke of issues and questions that had to be addressed when collecting these challenging web resources.
Part 1: Documenting Current Events and Controversial Topics (Jennifer Weintraub)
Jennifer Weintraub began the session by introducing the panelists and highlighting some of its major themes and points:
- “Controversial material”: Weintraub defined “controversial material” broadly–“it depends”–and noted that the Schlesinger Library, along with many other institutions, already collects this kind of content. She further emphasized the ubiquity of this kind of content in archivists’ professional lives.
- “Current events vs. regular collections”: Weintraub further emphasized that archivists who may work with digital collections–especially those that have sensitive/controversial content–must act quickly and intelligently to preserve these resources. Archivists may need to conduct research, deploy novel tools, and implement imperfect, yet efficient, solutions–while also forming collaborative relationships with fellow professionals who are also engaged with this kind of work (e.g. other archivists and information technology staff).
- “Ethics”: Because of the nature of this content, ethical concerns frequently arise when working with these archives. At the Schlesinger Library, an ethics statement guides archivists’ endeavors as they collect content related to the #metoo movement.
- “Emotion”: In addition to various ethical considerations, collecting controversial web archives can be an emotionally charged, difficult endeavor. Although these kinds of projects can be difficult to accomplish, Weintraub emphasized that emotions can, ultimately, help us better understand and embrace this kind of work.
- “Some Caveats”: All of the panelists come from well-funded institutions, for example, and the panel itself only discusses two web archiving efforts. Therefore, the panel cannot claim to be comprehensively describing these kinds of web archiving efforts.
Weintraub then introduced the Me Too movement and the #metoo Digital Media Collection project, “a large scale project to comprehensively document the #metoo movement and the accompanying political, legal, and social battles”. She highlighted the necessity of collecting this exceptionally ephemeral, at-risk content owned by external parties–especially given the focus of the Schlesinger Library and the overall importance of the Me Too movement–while also crediting Documenting the Now as one of the project’s major influences. Other topics briefly addressed included the initial steps involved in this project, such as developing a grant and forming relationships with other Harvard University staff and faculty members; the Schlesinger Library’s previous experiences doing web archiving work; and the project’s collecting scope.
Part 2: Jane Kelly: Collecting Material about the #metoo Movement
Jane Kelly continued the conversation about the #metoo Digital Media Collection project by discussing the effort in greater detail. Kelly addressed ethical considerations first, noting that the team’s research–which produced an ethics statement as well as a significant bibliography–resulted in the following ideas and questions:
- Legality and ethics are two very separate ideas. Therefore, Kelly emphasized approaching ethics from the position of individuals creating content.
- “Contextual approach to privacy,” which includes questions such as, “What do content creators believe about their right to privacy on the web? How does their personal context shape their understanding and expectations of privacy and anonymity?”
- “Social network theory of privacy”, which emphasizes that users’ “expectations regarding their privacy [is]…based on the number, depth, and breadth of connections that users have on the web”.
Next discussed were the tools and workflows used to capture relevant forms of content–all of which can be learned and implemented by archivists, as Kelly emphatically articulated. These tools included:
- Web content: Archive-It (the primary tool employed) and Webrecorder (content captured via Webrecorder was uploaded to Archive-It). Nearly 900 seeds have been collected, the bulk of which are single page crawls that have been crawled only once.
- Twitter content: Social Feed Manager, as well as Twitter’s Historical PowerTrack (to capture older tweets). Ultimately, about 19 million tweets will be archived as part of this project.
- News articles: Media Cloud. Ideally, Archive-It would be used to de-duplicate the approximately 384,000 resources captured via Media Cloud.
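The de-duplication step mentioned for the Media Cloud results can be illustrated with a small sketch. This is not how Archive-It de-duplicates content, and it is not the project's workflow; it is a simple URL-normalization pass, offered as one hedged way to spot obvious duplicates in a list of article URLs before capture. The function names `normalize` and `deduplicate`, and the normalization rules themselves (lowercasing the host, dropping a leading `www.`, trailing slashes, and fragments), are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Return a comparison key for a URL (hypothetical normalization rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Treat "www.example.org" and "example.org" as the same host.
    if host.startswith("www."):
        host = host[4:]
    # Ignore a trailing slash, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Keep the query string; drop the fragment, which does not change the page.
    return urlunsplit((scheme, host, path, parts.query, ""))


def deduplicate(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

A pass like this only catches duplicates that differ in URL spelling; the same article syndicated under different URLs would still need content-based comparison.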
After discussing tools, Kelly addressed the kinds of questions that guided her collecting efforts–especially given the somewhat limited resources that were devoted to the project. Some of these questions included the following:
- “Whose voices are represented?”: The content of those whose voices were less represented in this movement (e.g. non-white celebrities) could be considered more important to capture.
- “Is it technically possible (and reasonably easy) to capture?”: Content that proved more difficult–especially if it was deemed less important or already present in other collections–could be disregarded.
- “Is it valuable to have this content in this collection, even if it’s also captured elsewhere?”: Vitally important content that could be quickly obtained could be captured, even if such efforts were duplicative. This helped researchers understand the captured content, meaning that this non-unique content has value.
- “How much context do we need to provide and do we have the resources…to do so?”: How much additional, related content should be captured in order for researchers to understand the web content? And is this additional effort worth the cost?
Kelly also briefly discussed an experimental workflow that uses Zotero to create descriptive metadata for web archives. In essence, metadata about various web resources was captured in a Zotero library and exported to a CSV spreadsheet, where it could be cleaned up, mapped to Archive-It’s Dublin Core metadata elements, and uploaded to Archive-It.
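The clean-up and mapping step of that workflow might look something like the following minimal sketch. It is not the project's actual script: the Zotero column names, the choice of Dublin Core elements, the `ZOTERO_TO_DC` mapping, and the output layout are all assumptions made for illustration, and the file format Archive-It expects for uploads should be checked against its own documentation.

```python
import csv
import io

# Hypothetical mapping from Zotero CSV export columns (assumed names)
# to Dublin Core element names used as output column headers.
ZOTERO_TO_DC = {
    "Url": "URL",
    "Title": "Title",
    "Author": "Creator",
    "Date": "Date",
    "Abstract Note": "Description",
    "Publication Title": "Source",
}


def map_rows(zotero_csv_text):
    """Read Zotero CSV text and return rows keyed by Dublin Core element."""
    reader = csv.DictReader(io.StringIO(zotero_csv_text))
    mapped = []
    for row in reader:
        out = {}
        for src, dst in ZOTERO_TO_DC.items():
            value = (row.get(src) or "").strip()
            # Clean-up: collapse runs of whitespace into single spaces.
            out[dst] = " ".join(value.split())
        # Skip items without a URL, since they cannot be matched to a seed.
        if out["URL"]:
            mapped.append(out)
    return mapped


def to_csv(rows):
    """Serialize mapped rows back to CSV text with Dublin Core headers."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(ZOTERO_TO_DC.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the mapping in a single dictionary means the crosswalk can be reviewed and adjusted by an archivist without touching the rest of the script.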
In her conclusion, Kelly addressed the consequences of working with these controversial and/or sensitive materials. In addition to advocating that the archivist take frequent, necessary breaks, she mused on whether working with these resources impacts our professional capabilities (e.g. by desensitizing the archivist who is constantly exposed to this content). Ultimately, Kelly argued that archivists should seek out “empowerment through empathy”–or positive examples–while also asking two key questions as they work on these projects:
- “Does our work empower individuals and communities?”
- “How can we advocate for changing practices to ensure that this is possible?”
Part 3: Samantha Abrams: Ivy Plus Libraries Partnership Framework for Collection of Web Archives
For her portion, Samantha Abrams focused on the Ivy Plus Confederation, several of its web archiving projects, and challenges associated with those projects. Abrams began by introducing the Confederation–“a partnership between thirteen leading academic research libraries…that collectively provide access to a rich and unique record of human thought and creativity through resource sharing and collaboration”–and its Web Collecting Program, “a collaborative collection development effort to build curated, thematic collections of freely available, but at-risk, web content in order to support research at participating libraries and beyond”. She also briefly described the selection process for these web projects: for example, subject specialists and information professionals from the Confederation’s many institutions come together to determine which web archiving projects will be undertaken. Finally, she highlighted that projects brought forward by an individual from one institution must have support from someone affiliated with another institution–which, as noted later, can be a blessing and a curse.
After briefly introducing and discussing the Web Collecting Program’s evolution, Abrams discussed several web archives attempted or completed as part of this program (their descriptions from the Ivy Plus Confederation’s website are listed below):
- State Elections Web Archive: “Campaign websites of declared candidates running for state elective offices in 2018 in California, Connecticut, Illinois, Maryland, Massachusetts, New Hampshire, New York, North Carolina, Pennsylvania, and Rhode Island”.
- Brazilian Presidential Transition Web Archive: “Brazilian government websites in the areas of human rights, the environment, LGBTQ issues, and culture, for the period following the election of Jair Bolsonaro as president of Brazil on October 28, 2018, up to his inauguration of January 1, 2019”.
- Web content relating to immigrants trying to acquire political asylum in the United States: Ultimately, this collection was not created, as discussed later in the session.
- Extreme Right Movements in Europe Web Archive: “Documents the rise of extreme right movements in Europe in the twenty-first century. Access is restricted to on-campus use within the Ivy Plus Libraries Confederation”.
Key questions that Abrams found herself addressing while working on these projects included the following:
- Speed (“How fast is too fast?”): Web content–as exemplified by the State Elections Web Archive–can rapidly change, even as the organizations that comprise the Ivy Plus Confederation more slowly discuss enacting and supporting various projects. This prompted Abrams to consider how to implement imperfect solutions that most effectively and efficiently collect relevant content.
- Matter (“Who matters most?”): Although documenting recent events is vital, Abrams emphasized the necessity of considering the impact of archival efforts prior to engaging in a project. This question ultimately resulted in the rejection of one project related to undocumented immigrants seeking asylum, as there was concern that other institutions (e.g. the police) could use this web archive to the detriment of those individuals featured in it. Questions also arose concerning protecting staff members associated with the project as well as ethically acquiring content from creators.
- Context (“What’s important contextually?”): Abrams noted the importance of questioning the home of these particular archives. She asked, for example, should the Ivy Plus Confederation host these web resources–or would they be more useful/understandable if they were kept at another institution, such as the Southern Poverty Law Center?
- Access (“How do you provide access?”): Access to these web archives was occasionally a fraught question–for example, should the Ivy Plus Confederation provide unfettered access to the Extreme Right Movements in Europe Web Archive, and what are the consequences of doing so? Here, Abrams argued that archivists should draw upon their experience restricting access to physical collections to guide their decisions about restricting access to web archives.
Abrams concluded her portion of the session by asking critical questions about web archiving efforts and, importantly, whether they support our communities.
Part 4: Questions for Panelists
In this final section, the panelists opened up the floor to questions and comments from audience members. Some of these questions are noted below:
- How will the #metoo Digital Media Collection project be made available–and when? Response: Ideally by Fall 2019, although it depends on the kind of material (i.e. the web archives will likely be made available on time, while the availability of the Twitter data depends on when the data requested via Historical PowerTrack becomes accessible). Ultimately, web content will be made available via Archive-It and Twitter data will be made available via Harvard’s Dataverse.
- Have any of the panelists looked at commercial content moderation and its impact? Response: Not really. However, Kelly referenced an article recently published in the Journal of Contemporary Archival Studies concerning trauma. Response: An audience member also recommended that folks interested in this topic reach out to Wendy Duff, at the University of Toronto, who is currently researching this topic.
- What legal advice did Harvard’s counsel provide, especially concerning making Twitter feeds accessible as part of the #metoo Digital Media Collection project? Response: Ultimately, the archivists will hold to Twitter’s terms of service. Response: Additionally, Harvard’s counsel said there should be no issue concerning the collection of copyrighted content, since it will take researchers quite a while to find this data. This means that there will be no negative financial impact associated with this collecting effort.