Beth Cron and I had a good conversation with Seth Shaw and Jeremy Gibson about open source records management tools. If you didn’t have a chance to join us last week, you can still view it at https://www.youtube.com/watch?v=1GRQBUjtOT8.
Here are some of the highlights:
- Open source tools can be very valuable when they have a strong community around them. This support leads to active development, which makes for a sustainable tool. Being able to see other people’s work also provides more entry points to solving your own problem.
- The potential downside to open source tools is that they tend to have less documentation, and there’s the potential for projects to be abandoned.
- Seth mentioned an interview on The Signal about using open source tools in cultural heritage institutions. You can find it at https://blogs.loc.gov/thesignal/2013/01/when-is-open-source-software-the-right-choice-for-cultural-heritage-organizations-an-interview-with-peter-murray/.
- Seth has a report pending publication as part of OCLC’s Demystifying Born Digital series (http://www.oclc.org/research/publications/library/born-digital-reports.html).
- Jeremy pointed out that one of the benefits of open source tools is that you can more easily find tools that do one thing very well — and then stitch those solutions together to accomplish all of your RM needs.
- In answer to a query about extracting metadata from media files, Jeremy pointed to MediaInfo and JHOVE.
- One particular gap identified among existing open source tools is a tool for handling redactions.
- Before adopting a records management tool, it’s important to document your functional requirements and your organizational requirements (e.g., budget, IT support). Only then can you make sure you’re choosing the right tool for your purposes.
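For the metadata-extraction question mentioned above, here is a minimal Python sketch of how MediaInfo's command-line tool might be scripted. It assumes the `mediainfo` CLI is installed and on the PATH, and that its JSON report nests tracks under `media` and `track` with `@type` and `Format` keys, as recent versions do; the function names are hypothetical and this is not something shown in the Hangout.

```python
import json
import shutil
import subprocess


def build_mediainfo_command(path):
    """Return the argument list for a JSON-formatted MediaInfo report."""
    return ["mediainfo", "--Output=JSON", path]


def summarize_tracks(report):
    """Reduce a parsed MediaInfo JSON report to (track type, format) pairs.

    Assumes the report layout {"media": {"track": [...]}}.
    """
    tracks = (report.get("media") or {}).get("track", [])
    return [(track.get("@type"), track.get("Format")) for track in tracks]


def extract_metadata(path):
    """Run MediaInfo on one file; return None if the CLI is not installed."""
    if shutil.which("mediainfo") is None:
        return None
    result = subprocess.run(
        build_mediainfo_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_tracks(json.loads(result.stdout))
```

Keeping the parsing step separate from the subprocess call makes it easy to check the summary logic against a saved report before pointing the script at a whole accession.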
The Records Management Section is planning two upcoming Google Hangouts!
The first is on Thursday, February 2 at noon Eastern on Open Source Tools for Records Management. We will be joined by Seth Shaw from Clayton State University, who will discuss the Data Accessioner, and Jeremy Gibson from the State Archives of North Carolina, who will talk about renameIT as well as setting up an AXAEM system. They will discuss the pros and cons of using open source tools — both from the developer and the user perspectives — and provide some advice about implementing them.
The second is on Wednesday, February 8 at noon Eastern and will be on the topic of police body camera footage. Snowden Becker from the UCLA Department of Information Studies will discuss the following questions: Where is the point of intersection between the evidentiary *value* of records, as an archival concept, and the records that actually constitute criminal evidence? What does the legal duty to preserve have to do with the material preservation of records in new formats like digital video, or evidentiary traces created by Internet of Things-enabled (IOT) devices? This discussion will touch on some of the parallels between records management and evidence management in public agencies. We’ll explore how police body-worn camera programs are presenting new challenges to the very definition of public records, and shining light on the widely varying practices of records creation, collection, and retention in the criminal justice system.
Be sure to tune in live to ask questions or watch later at your convenience. You can view the Open Source Hangout here and the police body camera footage Hangout here.
For both Hangouts, we’ll be accepting questions for our speakers from you. If you have a question or topic for discussion please leave it as a comment here or use the #saarmrt hashtag on Twitter. We will also monitor the comments on the YouTube live streaming page.
To round out this year’s look at open source tools, I want to provide an overview that can serve as a primer to the Hangout we intend to host early in 2017. Open source software is “software with source code that anyone can inspect, modify, and enhance.” Opensource.com goes on to provide a more expansive explanation of the purpose of open source software:
“Open source projects, products, or initiatives embrace and celebrate principles of open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community-oriented development.”
As I noted in previous posts, NARA published a report in 2015 entitled “Open Source Tools for Records Management.” This report points to the generally free cost of open source software and the “very robust user and developer communities that are actively working to report bugs and improve the tools” as advantages of its use. However, this report also acknowledges (1) care must be taken to guarantee adequate security when deploying open source software and (2) customization may be required — which will probably also necessitate time and IT know-how.
Open source software differs from closed source, or proprietary, software because its code is open to all to see and change. Although open source software is often provided free of charge, it is not the same as freeware, which may be closed source software. Notable open source technologies include the Linux operating system and the Apache web server application. Linux is a good example of software that is open source while support comes at a price, such as the support provided by Red Hat. Opensource.com lists a number of reasons why developers prefer open source software:
- control over what the software does
- training by being able to see the source code
- greater security due to quicker updates to address vulnerabilities
- stability even after the original creators discontinue their work on the software
In August 2016, Wired released an article entitled “Open Source Won. So Now What?” This article points to the first official federal source code policy, which requires government agencies to release at least 20% of any new code as open source software. It also acknowledges that open source development can be challenging due to lack of funding and because it’s hard to break into the field, which is increasingly being dominated by big companies.
If you’re interested in open source software, stay tuned for more blog posts and our Hangout in 2017.
Today I offer my third post in a series based on NARA’s 2015 report “Open Source Tools for Records Management.” I investigated MUSE (Memories USing Email), which was developed at Stanford University. It is available for use in a Windows, Mac, or Linux environment. I conducted my tests using Windows.
This program is a visualization tool for analyzing emails. It is still in active development, and it currently incorporates six tools:
- tabulates topics
- charts sentiments
- tracks the ingress and egress of email correspondents and groups people who “go together” based on their receipt of messages
- allows you to browse attachments on a Piclens 3D photo wall
- offers the possibility of personalizing web pages by highlighting terms also found in your email (requires the use of a browser extension)
- creates crossword puzzles based on your email archive
Once you download the executable file from the above site, the program runs locally on your computer. MUSE can be deployed on a static archive of email messages (e.g., an mbox file) or it can fetch email from online accounts for which you have the email address, password, and server settings. It defaults to analyzing Sent mail, based on the principle that those messages more accurately reflect the topics and people with which the account owner is most engaged, but you can also include additional folders. You can then browse all messages in the embedded viewer, without having to open each message individually, or you can use any of the tools listed above.
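To give a sense of what working with a static mbox archive involves, here is a minimal Python sketch that tallies recipients in a Sent-mail mbox file using the standard library's `mailbox` module. It is only an illustration of the general idea of finding the people an account owner writes to most; it is not MUSE's implementation, and the function name and file path are hypothetical.

```python
import mailbox
from collections import Counter
from email.utils import getaddresses


def count_recipients(mbox_path):
    """Tally how often each address appears in To/Cc headers of an mbox file."""
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            headers = message.get_all("To", []) + message.get_all("Cc", [])
            # str() guards against header objects produced for unusual encodings.
            for _name, address in getaddresses([str(h) for h in headers]):
                if address:
                    counts[address.lower()] += 1
    finally:
        box.close()
    return counts
```

Calling `count_recipients("sent.mbox").most_common(10)` would then list the ten most frequent correspondents, assuming a file of that name exists.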
The sentiment analyzer uses natural language processing and a built-in lexicon, but the user can customize the lexicon to identify desired terms (via the Edit Lexicon option).
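As a rough illustration of how a customizable lexicon can drive this kind of tagging, here is a minimal Python sketch based on plain keyword matching. It is a deliberately simplified stand-in, not MUSE's natural language processing, and the categories, terms, and function name are all hypothetical.

```python
import re

# Hypothetical lexicon mapping a category to the terms that signal it.
LEXICON = {
    "gratitude": {"thanks", "grateful", "appreciate"},
    "urgency": {"urgent", "asap", "deadline"},
}


def tag_sentiments(text, lexicon=LEXICON):
    """Return the sorted lexicon categories whose terms appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(
        category for category, terms in lexicon.items() if words & terms
    )
```

Customizing the analysis then amounts to passing a different dictionary, for example `tag_sentiments(body, {"grants": {"proposal", "budget"}})`.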
According to their tip sheet for journalists, MUSE “was originally meant for people to browse their own long-term email archives. We have now started adapting it for journalists, archivists and researchers.” Due to the ease of use of this lightweight tool, this could be an easy way for repositories to provide an email analysis tool to researchers. This same tip sheet defines the “sweet spot” for the software as archives with about 50,000 messages.
If you’re interested in learning more about MUSE, a Ph.D. dissertation and a number of papers are available here. There’s also a video that argues for the value of analyzing personal digital archives. This project dovetails with the work being done at Stanford on ePADD — check out our Hangout for more information on that project.
This records management session featured participation by several RMRT steering committee members, with Alex Toner (University of Pittsburgh) moderating the session and Hillary Gatlin presenting.
Anita Vannucci of Emory University emphasized the importance of knowing open/public records laws. She suggested prioritizing work with the people who want to work with you and then leveraging that work to advocate for additional resources. She has found it useful to look to her state archives for resources that can be borrowed or adapted and to find out what peer institutions are doing.
Donna McCrea from the University of Montana looked to the American Association of Collegiate Registrars and Admissions Officers' Retention, Disposal, and Archive of Student Records (2013) for guidance. They created a records retention schedule (RRS) upon the directive of the Commissioner of Higher Education.
Hillary Gatlin from Michigan State University focused on records destruction. At MSU, the Director of Archives must approve records destructions, so they’ve developed a form that can be seen here.
Daniel Noonan from the Ohio State University reported on their general schedule and department-specific schedules. The Inter-University Council of Ohio developed a new schedule in 1992 after the universities were “liberated” from the state records management system.
Johna Von Behrens from Stephen F. Austin State University said an internal audit is a good means of identifying the risks of poor records management:
- records not appropriately classified and identified
- recordkeeping process not effective
- records (paper and electronic) not adequately safeguarded
- inadequate record retention management
- process not communicated
Mary McRobinson reported that Willamette University began a records management program in 2010, and because their archives staff had no bandwidth for this additional work, they brought in outside consultants to devise retention and disposition schedules. Their process was as follows:
- set up steering committee with stakeholders
- sent out RFP
- consultants toured campus, interviewed departments, developed retention and disposition schedules
- consultants also produced guidance report – current situation, implementation, etc.
- RM program is introduced at new employee orientation
- individual training of departmental liaisons is coordinated by RM program
Virginia Hunt from the Harvard University Archives said their RM program was established in 1995 by a corporation vote. They ultimately combined collection development and RM services. They’ve found web archiving to be an effective form of outreach.
This session featured a variety of archivists discussing the necessity of having a good working relationship with legal counsel.
Kathleen Roe, former SAA president and retired from the New York State Archives, noted two trends: an increasing professionalization of archives and an increasingly litigious society. She asserted all archivists need to know about FOIA, the PATRIOT Act, state public records laws, HIPAA, FERPA, and IP laws. She counseled that ignorance of the law will not stand up in court – even if it’s how your predecessors did it! She provided several words of wisdom:
- “archivists need to be proactive, not reactive”
- “everything’s an advocacy opportunity”
Roger Christman of the Library of Virginia explained that their processing guidelines haven’t been vetted by an attorney, so they err on the side of caution, and many items are restricted that probably only need to be redacted.
Samantha Cross works at CallisonRTKL, Inc. Their archives has been housed in IT and in Operations, and now resides in Legal. She contended that it’s vital to be assertive and to have an advocate. She suggested the importance of helping people understand that records management is a liability and risk management issue.
Javier Garza works at the Historical Resources Center, University of Texas MD Anderson (MDA) Cancer Center. They have conducted oral histories with MDA administrators, doctors, and nurses – some of whom were also patients. For that reason, they created a HIPAA decision tree to determine access to these oral histories. He clarified that any type of health information is protected if that person is a patient of MDA – even if MDA didn’t treat that particular issue.
Christina Zamon from Emerson College explained the copyright complications that arose when a musician/humorist wanted to donate works and make them freely available.
This session featured archivists from the American Heritage Center (AHC) at the University of Wyoming discussing their efforts to provide users access to born-digital materials.
Irlanda Jacinto described the AHC as an “access-driven institution” – fast, open, and responsive. They create a catalog record and trunk EAD, which make the records discoverable in the catalog.
Amanda Stow reported that the digital files aren’t indexed. If patrons want to download materials, they must purchase a flash drive from the AHC. The patron agreement specifies that users are responsible for abiding by any copyright restrictions.
Tyler Cline described their process of ingesting a backlog of 1.5TB from physical material. They developed a home-grown system because the vendor solutions they investigated seemed either incomplete or too expensive. They have a dark archive and also produce access copies. The in-house computer used by patrons is locked down with read-only access. The system requires active intervention by an archivist to map user access to particular folders; in the survey, patrons reported resentment of this process. Users also resented the limitation of only being able to access born-digital records in the reading room. In response to the survey, the AHC plans to move restricted files up one level in the file structure so an archivist doesn’t have to monitor access within a folder. He also contended that patrons need to be educated that access won’t be a Google-like search because the files aren’t indexed; instead, access looks more like a database, with a finding aid as an access point.