Digital deposit

Digital Deposit: Lack of storage space is no excuse

This past weekend I was at my local Costco and saw not one, but two brands of 1 terabyte (1000 GB) drives selling at around $300. I also saw a 500 GB (1/2 TB) drive for $130. All of the drives were USB friendly, meaning you could take one off the shelf, plug it into a USB port, and have all that storage available to you.

What can you store in a Terabyte? According to an FBI article on digital forensics, plenty:

"a terabyte is equivalent to about 250 million pages of text, which would stack 10 miles high if printed on both sides of the page."

Surely that's enough space for even smaller libraries to consider telling the Government Printing Office that they would like PDFs ("access derivatives") delivered to them based on their profiles.
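For anyone who wants to check the math, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10**12 bytes, the way drive makers label capacity) and uses only the FBI page figure and the shelf prices mentioned above; the variable names are mine.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumption: decimal units, 1 TB = 10**12 bytes.

TERABYTE_BYTES = 10**12
PAGES_PER_TERABYTE = 250_000_000  # figure from the FBI quote

# Size of one page of text implied by the quote.
bytes_per_page = TERABYTE_BYTES / PAGES_PER_TERABYTE

# Price per gigabyte for the two drives mentioned in the post.
price_per_gb_1tb = 300 / 1000    # roughly $300 for 1000 GB
price_per_gb_500gb = 130 / 500   # $130 for 500 GB

print(f"Bytes per page implied by the quote: {bytes_per_page:,.0f}")
print(f"1 TB drive:   ${price_per_gb_1tb:.2f} per GB")
print(f"500 GB drive: ${price_per_gb_500gb:.2f} per GB")
```

The quote works out to about 4,000 bytes per page, which is plain text. PDFs with images and formatting will usually be larger, so the real page count for a PDF collection would be lower, but it is still a lot of documents for around 30 cents a gigabyte.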

I admit, space isn't the only issue. But it's the objection I've heard most often and I honestly believe that technology has taken it away.

David Rosenthal says "Do it for Preservation!"

David Rosenthal is a member of Stanford's LOCKSS development team who maintains a blog about his professional work. It is well worth reading and deserves a place in everyone's list of RSS feeds.

In a June 10, 2007 posting on reasons to preserve e-journals, David explains that multiple, independently hosted copies of government publications are a good thing because they make the record TAMPER EVIDENT:

The goal of the FDLP was to provide citizens with ready access to their government's information. But, even though this wasn't the FDLP's primary purpose, it provided a remarkably effective preservation system. It created a large number of copies of the material to be preserved, the more important the material, the more copies. These copies were on low-cost, durable, write-once, tamper-evident media. They were stored in a large number of independently administered repositories, some in different jurisdictions. They are indexed in such a way that it is easy to find some of the copies, but hard to be sure that you have found them all.

Preserved in this way, the information was protected from most of the threats to which stored information is subject. The FDLP's massive degree of replication protected against media decay, fire, flood, earthquake, and so on. The independent administration of the repositories protected against human error, incompetence and many types of process failures. But, perhaps most important, the system made the record tamper evident.

Winston Smith in "1984" was "a clerk for the Ministry of Truth, where his job is to rewrite historical documents so that they match the current party line". George Orwell wasn't a prophet. Throughout history, governments of all stripes have found the need to employ Winston Smiths and the US government is no exception. Government documents are routinely recalled from the FDLP, and some are re-issued after alteration.

An illustration is Volume XXVI of Foreign Relations of the United States, the official history of the US State Department. It covers Indonesia, Malaysia, Singapore and the Philippines between 1964 and 1968. It was completed in 1997 and underwent a 4-year review process. Shortly after publication in 2001, the fact that it included official admissions of US complicity in the murder of at least 100,000 Indonesian "communists" by Suharto's forces became an embarrassment, and the CIA attempted to prevent distribution. This effort became public, and was thwarted when the incriminating material was leaked to the National Security Archive and others.

The important property of the FDLP is that in order to suppress or edit the record of government documents, the administration of the day has to write letters, or send US Marshals, to a large number of libraries around the country. It is hard to do this without attracting attention, as happened with Volume XXVI. Attracting attention to the fact that you are attempting to suppress or re-write history is self-defeating. This deters most attempts to do it, and raises the bar of desperation needed to try. It also ensures that, without really extraordinary precautions, even if an attempt succeeds it will not do so without trace. That is what tamper-evident means. It is almost impossible to make the record tamper-proof against the government in power, but the paper FDLP was a very good implementation of a tamper-evident record.
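The same tamper-evident property can be had with digital copies, as long as they are held by independent libraries. As a minimal sketch (my own illustration, not anything LOCKSS or GPO actually runs), each holder's copy can be reduced to a SHA-256 digest and the digests compared; a copy that disagrees with the majority stands out. The holder names and document text below are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def find_dissenting_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the holders whose copy differs from the most common digest.

    Assumption: the most widely held version is treated as the reference.
    A tie between versions would need human review, which this sketch
    does not attempt.
    """
    digests = {holder: digest(data) for holder, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(h for h, d in digests.items() if d != majority_digest)


# Hypothetical holders and contents, for illustration only.
original = b"Document text as originally published."
copies = {
    "library_a": original,
    "library_b": original,
    "library_c": original,
    "central_server": b"Document text as later revised.",
}

dissenters = find_dissenting_copies(copies)
print(dissenters)  # ['central_server']
```

Note that this only works because there are several independently held copies to compare. With a single server there is nothing to compare against, which is exactly the worry about a single point of publication.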

You'll notice that David refers to the depository program in the past tense. He does so because, like GPO itself, he sees the Future Digital System (FDSys) as an inevitable total replacement:

It should have become evident by now that I am using the past tense when describing the FDLP. The program is ending and being replaced by FDSys. This is in effect a single huge web server run by the GPO on which all government documents will be published. The argument is that through the Web citizens have much better and more immediate access to government information than through an FDLP library. That's true, but FDSys is also Winston Smith's dream machine, providing a point-and-click interface to instant history suppression and re-writing.

David thinks this is a bad thing, GPO assures us it is a good thing, but both assume this is where we are going.

But it doesn't have to be this way. We in the FDLP are definitely "Not Dead Yet!" We have a vital role to play in continuing to preserve the tangible materials entrusted to our care. Further, hundreds of new tangible titles are being shipped each month by GPO to the 1,200-plus federal depository libraries.

And while the depository community hasn't exactly leaped up and embraced its responsibility to preserve federal electronic publications, individual libraries like the University of North Texas and the New Mexico State Library have. Together with others who have held views on preservation similar to David's for years, these libraries will help build the depository system of the future.

Or we can sit back and let Winston Smith control our government information. If you are a government information specialist, it's up to you.

Iowa Publications Online - Thanks SLDTF!

As part of the ALA meeting:

State & Local Documents Task Force, GODORT
Annual Meeting: Washington D.C.
Saturday, June 22, 2007; 8:00-10:00am
Renaissance Hotel: Congressional A/B

There will be an open forum that I hope all ALA-attending FGI readers will visit:

"Preservation of born digital Iowa online state publications
(technological aspects). Barbara Corson, Program Director for Library Services for the Iowa State Library, will be speaking about the technical nuts and bolts of the Iowa Publications Online Project. If you want to know how to preserve born digital State Publications you will want to come to this session. "

Compliments to SLDTF for having a session on preserving born digital publications. We at FGI hope it is one of many to come out of ALA GODORT meetings. If you attend, please send a write-up to admin AT freegovinfo.info.

If you're like me and not able to attend ALA this summer, check out the Iowa Publications Online Project and its associated FAQ.

Social Psychology for Librarians

I've been reading a textbook called Social Psychology by Thomas Gilovich, et al and looking at its companion website. I wanted to share some ideas from the book that I think will be of real use to librarians and other government information professionals trying to persuade people to take action to ensure access, preservation and privacy with respect to government information. It may also be of use to people trying to raise awareness and usage of government information, library websites and libraries.

So often activists put out calls to action that either seemingly fall on deaf ears or make people aware of issues without moving them to act. Why is this? It could be because of the way people tend to change their attitudes. In chapter 7 of Social Psychology, we are told that people are open to persuasion on two levels -- a central route and a peripheral route. The central route of persuasion is what is most familiar to us -- "People attend carefully to the message, and they consider relevant evidence and underlying logic in detail." Speaking for myself here, this is the way I tend to try and convince others. I attempt to lay out the evidence to convince people of my point of view or to help them understand why I think something is under threat. I use statistics when I can and logical-sounding thought experiments when I don't have statistics.

If you look at campaigns to increase library use or use of library-purchased electronic resources, I think you see a similar pattern -- "You should use the library because we have x and y and you'll save time and money."

But it turns out that people only use the central route under certain conditions -- "when the message is relevant to them, when they have knowledge in the domain, and when the message evokes a sense of personal responsibility."

What happens when people don't feel like a message is relevant, when they don't have a lot of knowledge in a particular area and/or they feel no personal responsibility? They take the peripheral route of persuasion -- "people attend to superficial aspects of the message. They use this route when they have little motivation or time or ability to attend to its deeper meaning. In this route, people are persuaded by source characteristics (such as attractiveness and credibility of the communicator) and message characteristics (such as how many arguments there are and whether the conclusions are explicit)"

Looking at these two methods of persuasion in detail, I see immediate problems in the efforts of librarians in general and documents librarians in particular to get people to care and be good stewards of our resources. I'd like to outline these problems specifically for those helping to stimulate the building of local digital collections and invite librarians in other disciplines to see how these different routes might explain disconnects with their audiences.

I believe that I and others in the "digital deposit" movement have been obsessing over crafting ever better "central route" messages without realizing that much of our core audience (other documents librarians and other government information users) is in fact at the peripheral level through no fault of their own. Let's look at the "central route" factors again:

  1. Relevance to audience;
  2. Audience has knowledge in the domain;
  3. Audience has sense of personal responsibility.

Relevance -- This factor could go either way. Docs librarians may see a message about digital deposit as relevant to them because it is about government information, or as not relevant because the word "digital" makes it an IT concern and not theirs.

Knowledge -- While documents librarians have tremendous knowledge of government information products and fine knowledge of how to use Internet-based products, their general IT skills and knowledge of local/remote repository options (LOCKSS, DSpace, OAIS, etc.) are low. We at FGI have heard from people who are concerned about the problem but have no idea what to do and aren't sure where to look for answers.

Personal Responsibility -- My personal sense is that this area is the greatest challenge to any "central route" approach of persuading depository librarians to build the geographically distributed depository system of the future. Although the Government Printing Office (GPO) has zero track record in preserving government information over the long haul and in fact had no onsite collection at all until very recently, it now proposes to be the sole preserver of federal government information through its Future Digital System. Since this public commitment seemingly absolves libraries of their traditional preservation responsibilities, a majority of our documents colleagues say "GPO's got it covered, why do I need a local collection? Their problem, not mine." And so any message based on library responsibility to preserve materials regardless of format gets tuned out.

Obviously I wouldn't be blogging about this if I thought the correct course of action in light of the above was to throw in the towel, go home and kick back with some Alaskan Amber and a good salmon dish. So, what do we do if we are librarians interested in getting our colleagues to build locally housed but Internet-shared digital document collections, or if we're trying to educate the larger public about the availability of government information, the specialists willing and able to help them (librarians), and the need to protect both?

As I see it, the answer is to use the peripheral route to convince people that government information is relevant to them and that they have responsibility for its continued availability. We also need to provide clear direction as to HOW people can use government information AND keep it available for the future. Once we've done that and people have relevance, knowledge and responsibility, we can go back to the "central route" arguments to solidify our gains.

But how to use the peripheral route? Let's look at its characteristics again: "people are persuaded by source characteristics (such as attractiveness and credibility of the communicator) and message characteristics (such as how many arguments there are and whether the conclusions are explicit)" So perhaps we can hire Antonio Banderas and Catherine Zeta-Jones to be spokespersons for government documents. :-)

Or perhaps we should focus on credibility of the communicator and message characteristics. Perhaps we could lobby ALA or other library organizations to come out in favor of local digital collections, or at least provide an information clearinghouse on the subject. If we as "digital deposit" advocates can get our message out through existing organizations, their credibility might help the cause. In the case of depository libraries advancing their case, they might try to get a prominent citizen or some other respected person to publicly talk about the value of federal depository libraries.

In terms of message characteristics, researchers have found that short messages combined with instructions have helped increase a desired action. For example, in 1967 Leventhal, Watts, & Pagano found that people who watched a film about smoking dangers AND were given smoking cessation tips smoked only a third as much as people who were just given tips or who just saw the film.

Part of the short messages should be stories, sort of like the ones we've been trying to collect under our Depository Success Stories, stories about libraries being collected by ALA, or even just blogging about how we answered a question on average tariff levels. Lobbyists have been trying to get us to be storytellers for years. Social psychologists have understood the persuasive power of stories for so long, they even have a name for it -- the identifiable victim effect.

The other part of our message should be about what librarians and other people can actually DO. Here at FGI we've tried to answer part of that question at least implicitly by having pages about remixing government information, blogs of government documents librarians, and resources for capturing digital resources and producing video clips promoting resources.

Any ideas about how we put this all together? Get GODORT to hire attractive people to put together a YouTube series called This Old Depository where we give step by step instructions on building your very own globally accessible local digital collection? Let's all think about it together and start a new season of persuasion.

Early CLOCKSS Lessons

Reprinted with permission from the LOCKSS Alliance mailing list:

-------------------
Dear Colleagues,

The CLOCKSS (Controlled LOCKSS) Board would like to take this opportunity to apprise you of our progress, to share early lessons, and to encourage you to participate in the process of building this shared resource.

The CLOCKSS participants (major academic publishers, research libraries, and the Stanford University team) are building a community-governed, stable, digital archive for published scholarly content. CLOCKSS access is unbundled from fees: after a “trigger” event (when a publisher is no longer able to provide electronic access to some or all of its archived material), content will be freely available to all. Many libraries have moved away from building and preserving collections, and there is increasing interest in community stewardship and preservation of, and guaranteed long-term access to, scholarly publications.

Since its inception early in 2006, the CLOCKSS members made significant strides towards the effective management of archived materials, and learned some important lessons. We are also extremely proud to have been awarded the ALA ALCTS 2007 Outstanding Collaboration Citation, which will be formally presented at ALA’s annual meeting in Washington in June.

To find out more about our early lessons and progress, go to www.clockss.org and click on the link “CLOCKSS Lessons.”

As always, we welcome comments and suggestions. Please let us hear from you.

Sincerely,

Vicky Reich
vreich@stanford.edu
--------------------

I took Vicky's advice and checked out some of the CLOCKSS lessons. While I think you should read the entire five-page document, here are some good quotes that I think are worthwhile for documents librarians. Just think of "federal government" whenever you see the word "publisher":

The most important, and first, lesson learned by CLOCKSS participants was that commercial, university press, and society publishers; and librarians can collaborate effectively and thrive by working as equals to build a community-governed archive. The CLOCKSS Board meets formally twice each month by phone and twice a year in person. The Board establishes policies and implements procedures for wide range of social, business, content, and technical issues.

---------

The archived content is a valuable asset, into which scholars, librarians, and publishers have made considerable long-term investments; it must be protected from a wide variety of possible disruptions whether deliberate or accidental. The CLOCKSS archive network is made up of widely distributed host libraries spanning geographic, political and legal boundaries, and this global network, under the stewardship of those who’ve invested so heavily in it, will protect these important assets for future generations of scholars.

------------

In February 2007, the CLOCKSS team first successfully demonstrated the process that would follow a trigger event (retrieving preserved presentation content from the network of CLOCKSS boxes, transferring it to a publishing platform, and making it available to readers).

-------------

Over the long term, the CLOCKSS Board intends to raise a capital fund to pay for most (if not all) of the archive’s ongoing expenses. Digital preservation requires continuous processes; when active preservation ceases, materials are lost. By building a capital fund and becoming self-sustaining, CLOCKSS will ensure that the preservation processes continue over time, regardless of the availability of outside sources of revenue (a circumstance with which libraries are well-familiar – witness the recent rescission of Library of Congress NDIIPP funding to help finance other American government priorities).

----------------

No one agency can or should preserve government information all on its own. There is another way.

Librarian of Congress testifies on the 21st century library

James H. Billington, the Librarian of Congress, testified before the House Subcommittee on the Legislative Branch on March 20, 2007. Read Billington's full testimony here.

Billington pointed out that digital information is particularly fragile but, as the number of "digital transactions" that the LOC handles each year shows, it is also extremely useful and of interest to students, historians, researchers and the general public. Billington said, "No single institution can collect, save and provide access to digital content in the future. Almost all of the Library's digital initiatives involve learning to work in new ways, in a networked environment, where we are working with others to amass critical content and deliver new and improved services." Check out "LC21: A Digital Strategy for the Library of Congress" to see the LOC's analysis of the library's digital future.

We're not saying that every library has to manage 295 terabytes of digital content, but ALL libraries should be thinking about, planning for and working toward being digital repositories for their communities. That includes digital deposit, harvesting and other avenues for building digital collections.

It took two centuries for the Library of Congress to acquire today's analog collection—32 million printed volumes, 12.5 million photographs, 59.5 million manuscripts and other materials – a total of more than 134 million physical items. By contrast, with the explosion of digital information, it now takes only about 15 minutes for the world to produce an equivalent amount of information. Researchers at Cal-Berkeley produced estimates of the amount of information produced and circulated on the Internet in 2003 – it was equivalent to 37,000 times the content of one Library of Congress. Most of this information exists only in digital form: so-called born-digital items, many of which are already irretrievably lost.

There is a widely-held but false assumption that digital materials accessible today on one's PC or Blackberry will necessarily be available in the future. That is not the case. The average life of a Web site has been estimated to be 44 to 75 days (bold added), and information not actively preserved today could literally be gone tomorrow. Other essential digital information—most notably e-journals and data bases—are merely licensed for use in the short term– the information does not belong to the licensee. By contrast, traditional print books and journals collected by the Library for more than two centuries are, and will remain, in the possession of the Library and accessible to researchers. But it is current information that is often most needed by Congress, and current, up-to-date information is increasingly available only in digital form.

Comment Submission to GPO and Depository Library Council

On November 26, 2006, the volunteers at Free Government Information passed along all comments received on our DLC Digital Distribution page to the people named below. For a PDF copy of our responses and your comments, please see the file attached to this page.

Dear Mr. James, Ms. Russell, Mr. Davis, Mr. Wash and members of the Depository Library Council:

As you are well aware, the Depository Library Council held a session on Digital Distribution in Washington DC on Wednesday, October 25, 2006 as part of the Fall 2006 Depository Library Council meeting.

In hopes of broadening the discussion on digital distribution beyond the confines of in-person Council meetings, I and the other volunteers at Free Government Information posted responses to the Digital Distribution discussion questions presented at Council on the FGI website (http://freegovinfo.info) and invited others to add their own comments.

We are attaching to this email a copy of our comments and the responses from other librarians. We do not pretend to represent the whole depository community, just a few voices from people unable to travel to Council.

We appreciate the fact that you are addressing this issue and reaching out to the depository community for their comments and ideas. Because this issue is so important and because so many depository librarians cannot attend Council meetings, we suggest that you reach out to the community even more and take input from a broader section of the community before major decisions are made. Some of the actions you might take include:

  • Use your OPAL meeting room to hold a set of “virtual town meetings” on the topic of digital distribution.
  • Survey either the entire FDLP, or the subset of libraries indicating willingness to take digital items, on issues related to digital distribution.
  • Send GPO/DLC reps to State library association meetings to give presentations on FDSys and take suggestions on Digital Distribution.

Thank you for taking time to receive more input on an issue that will shape citizen access to federal government information for years to come.

Selective-deposit and the technical requirements of a digital-deposit FDLP

There was one issue that evidently came up in the discussions of digital deposit at the Fall 2006 Depository Library Council meeting that we at FGI think deserves highlighting and clarification from Council or GPO or both.

This issue relates to "general assumption 5":

Libraries receiving FDLP digital publications would be responsible for providing sufficient infrastructure, including bandwidth and storage, to provide timely and effective public access.

In the notes about the proceedings that we have read (here and here), there were questions about "streaming video" and other bandwidth-intensive infrastructure needs.

Our concern is that a library might assume from this discussion that if it wants digital deposit, it will be required to support things like streaming video.

We don't believe that is the intent of the GPO or Council even in these early discussions, but it would be useful to have GPO or Council or both clarify that selectivity in a digital-FDLP is still a valued concept and that there is no intention to develop a one-size-fits-all technical infrastructure requirement.

To elaborate on this just a bit and state what is probably obvious to most of us, what we are thinking about here is analogous to what FDLP has always had in the paper world. In the paper world, not everyone had the physical infrastructure or resources to deal with the serial set or Y4's from every little committee. We might think of those as the streaming media of the paper world. But the availability of those large sets in the depository program did not mean that every library had to select them; it did not mean that every depository library had to have the infrastructure to deal with them. But availability of those materials in a selective-depository program did mean that some libraries could select them.

And so it should be, we believe, in a digital-depository FDLP. Selective depositories would still be able to select what information content and types they would want. That would mean that a small library and a large library with vastly different technical infrastructures could both participate in a digital-deposit FDLP.

That still leaves the issue of regional depositories and how the community will define them in the digital age. But that is a different question that should be dealt with separately. We shouldn't confuse the issue of technical requirements for most FDLP libraries with the technical requirements for a future regional-depository model.

We can foresee, for example, one selective depository library hosting databases and multimedia files, another having a large collection of PDF and HTML files on a topic or subject area gathered together from several agencies along with commercial information into a digital subject collection, and another library having a few PDF files and CD-ROMs and DVDs available to users on a library workstation or library local area network. There should be room in a digital-deposit FDLP for all these scenarios and more.

One of the advantages we will gain in a digital FDLP is the ability for each library to select what it wants with great precision. Selection by media-type, subject, agency and pre-coordinated item-numbers will just be the beginning. Selecting based on content as determined by keywords or authors, on technical requirements for delivery, on popularity, on availability (or lack of availability) of the content in other formats, and more should all be possible.
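To give a feel for what that kind of precise selection could look like, here is a small sketch. It is purely illustrative: the `Publication` and `SelectionProfile` classes, their fields, and the sample catalog are all hypothetical and do not reflect any actual GPO or FDSys data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Publication:
    """A hypothetical catalog record for one digital publication."""
    title: str
    agency: str
    media_type: str          # e.g. "pdf", "html", "video"
    size_mb: float
    keywords: set = field(default_factory=set)


@dataclass
class SelectionProfile:
    """A hypothetical library profile; None means 'no restriction'."""
    media_types: set
    max_size_mb: float
    agencies: Optional[set] = None
    keywords: Optional[set] = None

    def wants(self, pub: Publication) -> bool:
        """Return True if the publication matches every criterion set."""
        if pub.media_type not in self.media_types:
            return False
        if pub.size_mb > self.max_size_mb:
            return False
        if self.agencies is not None and pub.agency not in self.agencies:
            return False
        if self.keywords is not None and not (self.keywords & pub.keywords):
            return False
        return True


# Made-up catalog entries, for illustration only.
catalog = [
    Publication("Annual report", "Agency A", "pdf", 4.0, {"budget"}),
    Publication("Hearing video", "Agency B", "video", 900.0, {"hearing"}),
    Publication("Data tables", "Agency A", "html", 1.5, {"tariffs", "trade"}),
]

# A small library skips bandwidth-heavy media; a large one takes everything.
small_library = SelectionProfile(media_types={"pdf", "html"}, max_size_mb=50.0)
large_library = SelectionProfile(
    media_types={"pdf", "html", "video"}, max_size_mb=5000.0
)

small_selection = [p.title for p in catalog if small_library.wants(p)]
large_selection = [p.title for p in catalog if large_library.wants(p)]
print(small_selection)
print(large_selection)
```

The point of the sketch is that the technical requirement follows from the profile a library chooses, not the other way around: the small library here never has to handle the video file because it never selects it.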

A digital FDLP can have more flexibility than we ever had in the paper world and GPO and Council can and should make that point now by affirming that the technical infrastructure requirements will be flexible.

We encourage everyone to participate in this important discussion that will determine the future FDLP. We'd like to remind you to check out the notes and audio of Fall DLC and by all means please give us your thoughts on digital deposit.
