Two Case Studies of Web Archiving for Records Management

This week, I’m building on two Schedule posts from last fall, Meg’s on websites as records and Matthew’s on social media content as records, by sharing two quick case studies in web archiving from my current work.

For the past year, I’ve been involved in a project to increase historic knowledge and current documentation of student life at my institution. This spring, that project has focused on establishing relationships with student groups, analyzing their functions, activities and records, then scheduling immediate and future transfers to the archives. In other words, records management.

I have found that most of the student groups I worked with did not generate traditional record types such as constitutions, meeting minutes or even newsletters. Instead, the recorded evidence of their activities — meetings, governance, publicity, events, etc. — exists fairly consistently in a complementary suite of traditional websites, social media sites and Google Drives. (Capturing records from Google Drive is beyond the scope of this post, but I wanted to mention it because of its perfect ubiquity among the student groups I worked with.)

While I expected websites and social media to be important to how the student groups operated, I was genuinely surprised by the extent to which they represented the only recorded evidence of basic facts about the groups, such as who their elected leaders were. In one memorable case, a group’s current leaders were attempting to create a list of their former leaders, but had nowhere to turn to find that information. Fortunately, the group had been publishing a list of its current leaders on its website for some years, and copies of those old websites were still available in the Internet Archive; they were thrilled when I shared this discovery with them.
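For readers who want to check whether old copies of a group’s site exist before starting a conversation like this one, here is a minimal sketch in Python. It assumes the Internet Archive’s public Wayback availability endpoint and its JSON response shape (`archived_snapshots` containing a `closest` entry); the function names and the example address are hypothetical and are not part of my actual workflow.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(site_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": site_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(site_url, timestamp=None):
    """Query the endpoint over the network and return the closest snapshot."""
    with urlopen(build_availability_url(site_url, timestamp)) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("studentgroup.example.edu", "20100901")` (a made-up address) would ask for the capture nearest to September 2010. A result of `None` only means no capture was found for that address, not that the group had no website.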

So for nearly every group that I worked with this spring, I ended up setting up a web collection in Archive-It that will continue to crawl their multiple websites at regular intervals. However, given the combined vagaries of the undergraduate population and the web, I’ll be interested to see exactly how long these schedules remain functional, and how soon I’ll have to revisit this strategy.

Simultaneously with this work with student groups, I’ve continued to do records management consultations with university units. In my investigation and interviews with the staff of one, a center from which the archives has never routinely received records, I realized that an immediate crawl of their website would be of particular value.

In this case, I identified a number of reports and other document types that I wanted to schedule for permanent retention, and when I asked where they were currently maintained, the same answer kept coming back: “Those are posted to the website.” Since the center’s website had not been redesigned or relaunched for the internet equivalent of an eon, I quickly realized that the transfer of previous years’ versions of these documents would be most easily accomplished by a thorough capture of their web pages and online documents.

However, the mature website is a two-edged sword: it is currently chock-full of important documentation, but it is also nearing the end of its life cycle. I will continue to capture new copies of the website in the future. Unlike with the student groups, though, the records I want to capture exist as documents independent of a web instance, so I am scheduling them for direct transfer to the archives in their native formats (PDF, .doc, etc.) rather than relying on the capture of a website that is likely to undergo major revisions.
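One way to prepare for that kind of direct transfer is to inventory the documents a page links to. The sketch below is an illustration only, written with Python’s standard library; the extension list, the function and class names, and the example address are my own assumptions, and it reads a single page’s HTML rather than crawling a whole site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: the native formats of interest; adjust for the unit in question.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx")


class DocumentLinkParser(HTMLParser):
    """Collect absolute URLs of links whose path ends in a document extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare only the path, so query strings do not hide the extension.
        if href and urlparse(href).path.lower().endswith(DOCUMENT_EXTENSIONS):
            self.documents.append(urljoin(self.base_url, href))


def list_documents(html, base_url):
    """Return document links found in one page's HTML, in page order."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    return parser.documents
```

Given the saved HTML of a publications page and its address, `list_documents(html, "https://center.example.edu/pubs/")` returns a list that can be compared against the document types named in the retention schedule.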

In short, my recent experiences have confirmed some of the fundamental things that Meg and Matthew discussed in their earlier posts: websites, including social media sites, can be records and should be scheduled as such. At the same time, though, I am wary of creating too many schedules that depend on the capture of websites for permanent retention of records when other options exist. Website structures, purposes and even URLs are all subject to change at a much faster rate than I hope to have to revise my retention schedules.


