LOCKSS

Lunchtime listen: not your grandfather's web anymore

Not Your Grandfather's Web Any More, a project briefing from the Coalition for Networked Information (CNI) spring 2013 member meeting by David S.H. Rosenthal of LOCKSS and Kris Carpenter Negulescu of the Internet Archive, is now available on CNI's video channels:

YouTube: http://youtu.be/uIqU2Cr2Kjs
Vimeo: http://vimeo.com/66175352

What are the practical and theoretical archiving problems posed by the newer parts of the Web, like social media, scientific workflows and Web services? How can the challenges of these latest developments be met, if at all? This presentation reports on the results of a workshop held at the Library of Congress under the auspices of the International Internet Preservation Consortium, where practitioners of Web archiving reviewed these questions. More information about this talk, including presentation slides, is available on the CNI site.

LOCKSS and CLOCKSS: Interview

Here's a short, informative interview with Vicky Reich, director of the LOCKSS programme at Stanford University Libraries, and Randy Kiefer, executive director of the CLOCKSS archive:

Excerpts:

VR: If you don’t preserve digital content then it won’t exist. Most of society’s culture and commercial assets are now digital but, generally, the move from print to electronic is about access rather than preservation....

VR: The web as a publishing platform enables many things never envisaged in the print world. The web started with a document model, then evolved to include dynamic elements, such as advertisements and embedded videos. But first with AJAX and now with HTML5, the web is becoming a networked operating system inside the browser. It is no longer enough to parse content collected from the web to find the links and follow them; the content must be executed to discover the web resources from which it is composed. Some of these resources are web services, such as Google Maps. Preserving executable content and the services on which it depends is a major challenge that the LOCKSS programme is working to address.
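To make that point concrete, here is a minimal sketch (mine, not anything from the LOCKSS codebase) of the traditional "parse the page and follow the links" approach, using only Python's standard library. The sample page and the `LinkExtractor` and `static_links` names are made up for illustration. The parser finds the links written into the markup but never sees the URL that the script would build at run time.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href/src values that appear literally in the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


# Hypothetical page: two static links plus one URL assembled by a script.
PAGE = """
<html><body>
<a href="/articles/1.html">Article 1</a>
<img src="/img/logo.png">
<script>
  // This URL only exists once the script runs in a browser.
  var tile = "/tiles/" + 4 + "/" + 7 + ".png";
  fetch(tile);
</script>
</body></html>
"""


def static_links(html):
    """Return the links a parse-only crawler would discover."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


found = static_links(PAGE)
print(found)                      # the two static links only
print("/tiles/4/7.png" in found)  # the script-built URL is missed
```

A crawler that wants the tile image has to execute the script (for example in a headless browser) rather than just parse the page, which is the challenge described in the interview.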

The best way to preserve digital content...

Here is an interesting article that examines the use of criminal digital forensic tools to discover and repair corrupted digital information in digital archives, but there is another story here as well. Although the title doesn't tell you this, Fox actually looks at two alternatives for digital preservation: digital forensics and what he calls "the buddy system."

Fox describes the buddy system this way: "[W]hen more than one system is responsible for maintaining the integrity of any given digital object. If each system in question has a copy of the object, and they are verifying the integrity of that object against the objects that their "peers" possess, there is a much higher probability when they agree that the integrity of the object is intact. This is a "digital buddy system" of sorts, because each peer helps the other peers in its network maintain the integrity of commonly held digital objects. This is the principle behind the LOCKSS electronic resource preservation system (LOCKSS, n.d.; Rosenthal and Reich, 2000), which is a peer-to-peer preservation system now in wide use, and developed and maintained by Stanford University."

He notes further:

Studies over the last decade have indicated that digital preservation is most successful when the information "is best preserved by replicating it at multiple archives run by autonomous organizations".... These concepts have been in place for almost ten years, but it has only been in the last four-to-five years that libraries have attempted to preserve anything beyond e-journal content using P2P network systems. [emphasis added]
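For readers who like to see an idea in code, here is a rough sketch of the buddy system as Fox describes it: each peer holds a copy, the peers compare checksums, and a copy that disagrees with the majority is replaced. This is only an illustration under my own assumptions. The real LOCKSS polling protocol is considerably more elaborate, and the `audit_and_repair` function and the library names below are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Checksum used to compare copies without trusting any single peer."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare every peer's copy and repair the ones that disagree.

    `copies` maps a peer name to the bytes it currently holds.
    Returns the names of the peers whose copies were replaced.
    """
    digests = {peer: digest(data) for peer, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority agreement; cannot tell which copy is intact")

    # Any copy carrying the winning digest can serve as the repair source.
    good = next(data for peer, data in copies.items() if digests[peer] == winner)

    repaired = []
    for peer in list(copies):
        if digests[peer] != winner:
            copies[peer] = good
            repaired.append(peer)
    return repaired


# Hypothetical example: one of three libraries holds a damaged copy.
copies = {
    "library_a": b"Federal Register, vol. 75",
    "library_b": b"Federal Register, vol. 75",
    "library_c": b"Federal Register, vol. 7\x00",
}
print(audit_and_repair(copies))  # names of the peers that were repaired
```

The more independent peers hold a copy, the less likely it is that a majority of them are damaged in the same way, which is why replication across autonomous organizations matters.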

 

LOCKSS-USDOCS at Best Practices Exchange

I just got back from Best Practices Exchange 2010 (check out the growing list of available presentations and the Twitter back channel!). It was a really solid conference -- a healthy mix of archivists, documents and other librarians, and technologists giving project-oriented presentations with a healthy dose of discussion. The cherry on top was the engaging keynote by David Ferriero, the Archivist of the US (AOTUS) (here's a good summary of AOTUS' talk).

I was on a panel with Arlene Weible from OR State Library (Arlene gave a great talk on RAT, OSL's tool for collecting state documents -- I hope she posts her slides soon!) and presented about LOCKSS-USDOCS, the distributed documents preservation project. Take a look at the slides. We're looking for other participant libraries so email me if your library is interested (jrjacobs AT stanford DOT edu).

David Rosenthal: Stepping Twice Into The Same River

Last month, David Rosenthal, chief scientist on the LOCKSS Project, gave the keynote address entitled Stepping Twice Into The Same River to the ACM/IEEE Joint Conference on Digital Libraries (JCDL) and the annual International Conference on Asia-Pacific Digital Libraries (ICADL) (or just JCDL/ICADL!) in Queensland, Australia. It was wide-ranging, thoughtful and provocative -- in short, everything you'd want in a keynote to a major international digital library conference.

David hit on publishers and publishing industry practices, scholarly communication, digital preservation, the intersection between technology and economics, and the current state and future of libraries. He makes a great argument that the upheaval currently affecting the three parallel fields of publishing, libraries, and archives (what he terms "technological and economic discontinuity") creates the perfect opportunity for radical technological change toward a collaborative archival academic cloud. Such a cloud could define the future of information access and preservation (at least for universities and scholarly communication) in beneficial and sustainable ways.

Here are some main points that I gleaned from David's presentation:

  1. publishers are in a similar boat to news organizations and have sacrificed long-term viability for short-term economic gain -- and that's going to ultimately destroy them;
  2. libraries and archives need to focus their preservation goals on dynamic services rather than the static content:

    "...it's less about what we are preserving and more about how preserved information is accessed. Less about HTML and other formats, and more about HTTP and other protocols. The reason is that static information is a degenerate case of dynamic information; a system designed for dynamic information can easily handle static information. The converse isn't true."

  3. distributed digital preservation and archives offer the more economically and technologically sound opportunities in the long run;
  4. data preservation will take steady long-term funding;
  5. since ingest is a major cost for any digital preservation system, universities need to start seeing their Web space/infrastructure in terms of academic clouds rather than leasing from commercial cloud services like Amazon's Elastic Compute Cloud (Amazon EC2):

    "Unless something dramatic happens, scholars who want to publish services wrapped around their, or other people's, data will take the path of least resistance and use Amazon's services. Miss a credit card payment, your data and service are history. Worse, do we really want to end up with Amazon owning the world's science and culture?"...

    ...What Universities get for the extra cost is the permanence they need. The permanence comes from the fact that the University already has its hands on the data and the services in which it is wrapped, instantiated in highly robust and preservable hardware. Thus, no ingest costs and very low preservation costs. With the model of Amazon and a separate archiving service, as well as paying Amazon, Universities have to pay the archiving service, and pay the ingest costs. When these extra costs are taken in to account, because the ingest costs dominate, it is likely that Amazon would be more expensive.

I highly recommend that folks read David's keynote at least twice. There are a lot of pearls of wisdom in there. I think he makes a compelling case for a viable digital future for scholarly communication, one in which libraries and archives can play a vital role.

This is BIG: GPO + LOCKSS (update)

Last week, James made a modest announcement of the biggest development in digital deposit in decades.

This means that GPO is assisting the LOCKSS-USDOCS project in preserving content harvested from fdsys.gov. That means we are developing a geographically distributed network of digital archives. There are already 18 libraries participating, including 4 regionals. As James pointed out, this "replicates key aspects of the FDLP in the digital environment and furthers the concept of 'digital deposit,' an essential component of the digital FDLP."

One indicator of the importance of this project in the world of digital preservation is that the Association for Computing Machinery's technology newsletter, ACM TechNews, lists the project today.

Although LOCKSS-USDOCS is still essentially a backup of FDsys (the content only gets made accessible if the live content goes away), this is still an enormous step in the right direction for digital preservation, both technically and politically. It was fairly recently that GPO seemed to want nothing to do with LOCKSS (See: GPO LOCKSS report: Why LOCKSS vs. FDsys? and GPO, LOCKSS, IP Authentication, and the future of FDLP -- more clarification needed.) Now, GPO is actively collaborating with depository libraries by putting LOCKSS permission statements throughout the FDsys.gov site so that LOCKSS-USDOCS can harvest GPO content. This is a huge change in GPO's attitude from 3 years ago!

Now that we are beginning to have a distributed digital backup of FDsys, we can begin to look forward to the next steps of digital deposit in which documents and data will be deposited into live digital library collections for active retrieval and use.

Congratulations go to James, Stanford, LOCKSS, and GPO!!

GPO joins LOCKSS: digital deposit a reality

According to yesterday's press release, GPO has joined the LOCKSS Alliance! The Stanford News Service also wrote a story about this historic event, complete with a goofy picture of yours truly :-)

But what the GPO press release didn't explain is that, as part of GPO's participation in the LOCKSS Alliance, GPO will assist the LOCKSS-USDOCS project (which I'm organizing) in preserving content harvested from fdsys.gov in a geographically distributed network of digital archives. GPO has put LOCKSS permission statements (for example here, and here and here) throughout the FDsys.gov site in order for LOCKSS-USDOCS to harvest GPO content. LOCKSS-USDOCS -- which is 18 libraries strong (including 4 regionals!) and growing -- replicates key aspects of the FDLP in the digital environment and furthers the concept of "digital deposit," an essential component of the digital FDLP.
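For the technically curious, here is a small sketch of what "looking for a permission statement" could mean in practice: before harvesting, check that the page text contains the statement. This is my own illustration, not the LOCKSS crawler's code. The exact wording stored in `PERMISSION_STATEMENT` is an assumption on my part and should be checked against the LOCKSS documentation, and the sample pages are made up.

```python
import re

# Assumption: the wording of the standard LOCKSS permission statement.
# Verify against the LOCKSS documentation before relying on it.
PERMISSION_STATEMENT = (
    "LOCKSS system has permission to collect, preserve, and serve "
    "this Archival Unit"
)


def has_lockss_permission(page_text: str) -> bool:
    """Return True if the page text contains the permission statement.

    Whitespace is collapsed and case is ignored so that line breaks in
    the page source do not hide the statement.
    """
    normalized = re.sub(r"\s+", " ", page_text).lower()
    return PERMISSION_STATEMENT.lower() in normalized


# Hypothetical page bodies for illustration.
with_permission = """
<p>LOCKSS system has permission to collect,
preserve, and serve this Archival Unit</p>
"""
without_permission = "<p>All rights reserved.</p>"

print(has_lockss_permission(with_permission))
print(has_lockss_permission(without_permission))
```

The point of the statement is that harvesting is opt-in: a publisher signals consent on its own site, and a preservation network only collects where that signal is present.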

We're actively looking for other libraries to participate in the project, especially regionals. Together we can provide an essential digital preservation piece to the FDLP. Please contact me (jrjacobs AT stanford DOT edu) with questions or interest.

--That is all.

MetaArchive publishes guide to distributed digital preservation

Please check out the new book published by the MetaArchive Cooperative called A Guide to Distributed Digital Preservation. It's both timely and handy.

[Full disclosure: the book is primarily about LOCKSS and specifically mentions the project that I'm working on, LOCKSS-USDOCS. FGI and I receive no compensation from the sales of the book.]

Announcement: publication of A Guide to Distributed Digital Preservation

Authored by members of the MetaArchive Cooperative, A Guide to Distributed Digital Preservation is the first of a series of volumes from the Educopia Institute describing successful collaborative strategies and articulating specific new models that may help cultural memory organizations work together for their mutual benefit.

This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways.

This guide is written with a broad audience in mind that includes librarians, archivists, scholars, curators, technologists, lawyers, and administrators. Readers may use this guide to gain both a philosophical and practical understanding of the emerging field of distributed digital preservation, including how to establish or join a network.

Readers may access A Guide to Distributed Digital Preservation as a freely downloadable PDF and/or as a print publication for purchase. Please visit http://www.metaarchive.org/GDDP to download or order the book.

******

The MetaArchive Cooperative provides low-cost, high-impact preservation services to help ensure the long-term accessibility of the digital assets of universities, libraries, museums, and other cultural memory organizations. In addition to preserving members' digital content in a distributed digital preservation network, the Cooperative also offers consulting and education services to institutions that seek training in digital preservation planning, policy creation, and implementation, including setting up and running Private LOCKSS Networks (http://www.lockss.org).

For more information, please contact Program Manager Katherine Skinner (katherine.skinner@metaarchive.org).

Lunchtime Listen: How are we ensuring the longevity of digital documents?

Please check out the spring 2009 plenary at the Coalition for Networked Information (CNI) by David Rosenthal, chief scientist of the LOCKSS program. He presents a "contrarian view" of digital preservation. The issues he raises are definitely important to think about for those of us working to preserve digital government information/documents for the long term.

How Are We Ensuring the Longevity of Digital Documents? from CNI Video Editor on Vimeo.

Distributed Globally, Collected Locally: LOCKSS for Digital Government Information

Since Daniel mentioned LOCKSS and digital deposit as recession insurance yesterday (which BTW is a GREAT oogly hook for open govt!!), I thought I'd mention a hot new article that Daniel and I wrote for the February 2009 issue of Against the Grain about the new U.S. Government Documents Private LOCKSS Network (citation below). The issue has not officially been released, but we got permission to post it to FGI as a preprint.

The article describes the LOCKSS model of digital preservation and why that model is beneficial to apply to the realm of digital government information. We describe Carl Malamud's herculean efforts toward better access to government information, then talk more specifically about the new USDOCS Private LOCKSS Network (USDocsPLN), which uses the documents harvested by Malamud. The paper concludes with a call to action.

Let us know what you think, and by all means, help us move forward with the USDocs network by participating. LOCKSS is great recession insurance and SO much more!

Citation: Distributed Globally, Collected Locally: LOCKSS for Digital Government Information. Daniel Cornwall and James R. Jacobs. Against the Grain, 21(1) February, 2009. p.42-44 (p.5-7 of the PDF)

The preservation of federal documents is too important to be left to the federal government alone; we have the makings of a viable system to preserve digital government publications. There are several ways you can help.

Join our private LOCKSS Network. Join the LOCKSS alliance, get a server for under $1,000, and contact us. The more servers in the USDocsPLN, the merrier.

Notify us of collections of electronic federal documents. LOCKSS staff can show you how easy it is to allow LOCKSS to ingest and preserve your materials.

Attack the root problem. Demand that your Members of Congress legislate and FUND a system that will ensure that GPO proactively deposits publications and data through the FDLP and other interested partners. While the USDocsPLN project is a good start and an excellent ad-hoc effort, it should be the government's responsibility to put information in the hands of taxpayers. We should not have to be prying it out of the government’s hands. A distributed digital FDLP benefits everyone.
