digital collections

UVA puts founding fathers' papers online! (temporarily :-| )

[UPDATE: I spoke too soon. Seems that these are "early access" documents that "will be removed from this database, to be replaced by the fully edited version in the appropriate digital edition in the Rotunda American Founding Era collection."]

Nice work U of Virginia! You can access the papers here. I hope this makes it into the FDLP digitization registry.

More than 200 years after they were written, some 5,000 previously unpublished documents of the founders of the United States — including Thomas Jefferson, John Adams and James Madison — are at long last available to the public at no cost.

The Documents Compass group of the Virginia Foundation for the Humanities at the University of Virginia has spent much of the last year proofreading and transcribing thousands of pages of letters and other papers.

The documents are now available online for free at the University of Virginia Press’ digital imprint called Rotunda...

...The online project is a federal pilot study that aims to expand public access to the papers of America’s founders. It is funded by a $250,000 grant from the National Historical Publications and Records Commission, which is a division of the National Archives.

[Thanks Resource Shelf!]

UNL's Government Comics Digital Collection

The University of Nebraska-Lincoln Library has recently put together a unique collection of government information. Free and available to all, UNL's Government Comics Collection is a digital library containing 174 scanned comic books from various government entities. In the government realm, comic books have had a long and rich history as a delivery medium for government information. UNL has managed to amass a pretty impressive collection.

(found via MetaFilter)

Cornell library removes all restrictions on use of public domain reproductions

In a "dramatic change of practice," Cornell University Library has decided it will no longer require its users to seek permission to publish public domain items duplicated from its collections. I congratulate Cornell and hope that other libraries will follow this precedent.

"The threat of legal action, however," noted Anne R. Kenney, Carl A. Kroch University Librarian, "does little to stop bad actors while at the same time limits the good uses that can be made of digital surrogates. We decided it was more important to encourage the use of the public domain materials in our holdings than to impose roadblocks." The immediate impetus for the new policy is Cornell's donation of more than 70,000 digitized public domain books to the Internet Archive (details at www.archive.org/details/cornell).

"Imposing legally binding restrictions on these digital files would have been very difficult and in a way contrary to our broad support of open access principles," said Oya Y. Rieger, Associate University Librarian for Information Technologies. "It seemed better just to acknowledge their public domain status and make them freely usable for any purpose. And since it doesn't make sense to have different rules for material that is reproduced at the request of patrons, we have removed permission obligations from public domain works."

[HT BoingBoing!]

Economist Interview with Brewster Kahle of Internet Archive

The Economist has an online article "The Internet's Librarian" that is also in the March 5th, 2009 print edition.

...the founder of the Internet Archive explains what has driven him for more than a decade. “We are trying to build Alexandria 2.0,” says Mr Kahle with a wide-eyed, boyish grin. Sure, and plenty of people are trying to abolish hunger, too.

It would be easy to dismiss Mr Kahle as an idealistic fruitcake, but for one thing: he has an impressive record when it comes to setting lofty goals and then lining up the people and technology needed to get the job done. “Brewster is a visionary who looks at things differently,” says Carole Moore, chief librarian at the University of Toronto. “He is able to imagine doing things that everyone else thinks are impossible. But then he does them.”

This is probably my favorite quote:

“Come back when you have a warrant,” reads the floor mat underneath his office recliner. It was a gift from the Electronic Frontier Foundation (an activist group on whose board Mr Kahle sits) after Mr Kahle refused to hand over information about one of the Internet Archive’s users to the Federal Bureau of Investigation in 2007.

I only wish more interviews with Brewster would discuss the plethora of government documents that are in Internet Archive. It's a valuable resource and it keeps growing!

FDLP: Services and Collections

Against the Grain has kindly permitted me to post this preprint of my new article.

In the age of digital information, libraries and librarians are struggling to define their proper roles. In a time of financial uncertainty and economic crisis, many libraries are facing decisions that will have long-term implications and consequences. At a time like this it is particularly important that we have a clear vision of a sustainable role for libraries.

And, don't forget Distributed Globally, Collected Locally: LOCKSS for Digital Government Information by Daniel Cornwall and James R. Jacobs. Against the Grain, 21(1) February, 2009.

Distributed Globally, Collected Locally: LOCKSS for Digital Government Information

Since Daniel wrote yesterday about LOCKSS and digital deposit as recession insurance (which BTW is a GREAT oogly hook for open govt!!) I thought I'd mention a hot new article that Daniel and I wrote for the February 2009 issue of Against the Grain about the new U.S. Government Documents Private LOCKSS Network (citation below). The issue has not officially been released, but we got permission to post to FGI as a preprint.

The article describes the LOCKSS model of digital preservation and why that model is beneficial to apply to the realm of digital government information. We describe Carl Malamud's herculean efforts toward better access to government information, then talk more specifically about the new USDOCS Private LOCKSS Network (USDocsPLN) using those documents harvested by Malamud. The paper concludes with a call to action.

Let us know what you think, and by all means help us move forward with the USDocs network by participating. LOCKSS is great recession insurance and SO much more!

Citation: Distributed Globally, Collected Locally: LOCKSS for Digital Government Information. Daniel Cornwall and James R. Jacobs. Against the Grain, 21(1) February, 2009. p.42-44 (p.5-7 of the PDF)

The preservation of federal documents is too important to be left to the federal government alone; we have the makings of a viable system to preserve digital government publications. There are several ways you can help.

Join our private LOCKSS Network. Join the LOCKSS alliance, get a server for under $1,000, and contact us. The more servers in the USDocsPLN, the merrier.

Notify us of collections of electronic federal documents. LOCKSS staff can show you how easy it is to allow LOCKSS to ingest and preserve your materials.

Attack the root problem. Demand that your Members of Congress legislate and FUND a system that will ensure that GPO proactively deposits publications and data through the FDLP and other interested partners. While the USDocsPLN project is a good start and an excellent ad-hoc effort, it should be the government's responsibility to put information in the hands of taxpayers. We should not have to be prying it out of the government’s hands. A distributed digital FDLP benefits everyone.

Obama’s Inaugural Speech: visualized, video-searchable

President Obama's inaugural speech has generated some interesting examples of how technology can be applied to government information when the information is freely available for use and re-use and not locked into government databases or proprietary formats. It is a small piece of text with a lot of public interest and high visibility and, therefore, ripe for these kinds of demonstrations and experiments. Of course, to make use of the information, we have to actually have a copy of it. Imagine what would happen if all government information was actually distributed in open formats to libraries so that we could build collections that were index-able, search-able, visually browsable, and analyzable in interesting ways. Imagine freeing government information from its .gov silos and integrating it with non-government information in digital collections created for particular virtual communities of interest. Imagine the future of digital collections that are as easily re-usable as this small bit of text.

Check out these examples!

  • Inaugural Words: 1789 to the Present, New York Times. "A look at the language of presidential inaugural addresses. The most-used words in each address appear in [an] interactive chart..., sized by number of uses. Words highlighted in yellow were used significantly more in this inaugural address than average."
  • Visual of the Inaugural Address, ProPublica. [Compare this to the NYT version. Stop words matter!]
  • Search Inside Obama’s Inaugural Speech. Delve Networks. "We invite you to experience President Obama’s inaugural speech using our search inside technology. To do this, type what you’re looking for into the player searchbar above. A heatmap will show you where information related to your topic appears in the speech. You can move your mouse over the heatmap to see the matches. Click to jump to that place in the speech."
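To see why stop words matter when comparing the two word visualizations above, here is a minimal sketch in Python. It is not how the New York Times or ProPublica built their charts; the stop-word list, the function name `word_counts`, and the sample sentence are all made up for illustration.

```python
from collections import Counter
import re

# Hypothetical, tiny stop-word list for illustration only;
# real tools use much longer lists.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "we", "is", "our"}

def word_counts(text, drop_stop_words=True):
    """Count lowercase words in text, optionally skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return Counter(words)

# Made-up sample text, not a quotation from any speech.
sample = "The people and the nation share the work of the nation."

# With stop words kept, a filler word tops the list.
print(word_counts(sample, drop_stop_words=False).most_common(1))
# With stop words dropped, a content word rises to the top.
print(word_counts(sample, drop_stop_words=True).most_common(1))
```

The same text gives a very different "most-used word" depending on that one choice, which is why two charts of the same speech can look so unlike each other.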

Interview with Internet Archive Founder

FLYP online magazine published an interview with Internet Archive's founder, Brewster Kahle, entitled "Know It All". There is a text version of the article, but the interactive multi-media version is much more fun! Plus, it contains a nice video showing Brewster explaining the mission of Internet Archive.

Brewster Kahle wants to give you digital access to every book, film, video, song, TV show and periodical ever published. If he succeeds, the world will be a different place.

Popularity brings site down

File this under "lessons learned." The European Union's new Europeana digital library, which was launched on November 20, had to be taken offline because the heavy demand by users -- 10 million hits an hour -- overwhelmed the servers.

The home page of "Europeana" today says, "Popularity brings the site down....We are doing our best to reopen Europeana.eu in a more robust version."

A story in the Christian Science Monitor (Everybody loves the digital library – maybe too much, by Marjorie Kehe, 12.09.08) says the site "had to be shut down within hours when powerful user demand swamped its system" and describes the digital library this way:

The online collection of Europe’s cultural heritage was launched on November 21. Europeana will allow users anywhere to access books kept in European libraries as well as films, paintings, photographs, sound recordings, maps, manuscripts, newspapers, and documents.

The event reminds me of the problem the House had recently (Scaling house.gov).

OSTI: collecting AND connecting scientific govt information

Did you know that the Office of Scientific and Technical Information (OSTI) has a blog? The OSTI blog turned 1 year old last month but has only been in our In other news... section for a short time.

I'm really impressed with the work that OSTI is doing to build digital collections of scientific and technical information as well as to push the boundaries of access by building databases, federated search tools, being an OAI node, distributing bibliographic records and generally finding unique and innovative ways to make scientific and technical information available on the Web (I just love the idea of an adopt-a-doc program!!).

In particular, a blog post entitled Beyond Collecting: Connecting from a few weeks back (yes my feedreader is bursting at the seams :-) ) caught my eye. They've basically gone out and built a digital infrastructure along the lines of what we at FGI have been advocating for lo these many years. That is, they've realized that they can't possibly collect it all. Instead of building one big central repository, they're relying on many agencies and actors to host content and standards-based metadata of interest to them. OSTI can then use increasingly robust digital tools to aggregate and provide search mechanisms for vast amounts of information -- to "connect users with the highest quality science information without collecting or hosting it."

THAT'S what I envision for the Federal Depository Library Program: a collaborative network of libraries (a technical and social P2P network!) hosting content of interest to their local communities, creating and maintaining standardized metadata, connecting up with each other to create powerful search tools across the network. This is the many-hands-make-light-work digital model that we in the documents community should be espousing.

--that is all.


OSTI has embraced a new paradigm for sharing scientific and technical information (STI). Historically, OSTI has fulfilled its mission of providing STI to scientists, researchers, and the public by hosting, or collecting, documents and/or metadata. OSTI's new paradigm is to make content searchable that is often hosted by others; today, OSTI connects those seeking the content with the organizations that host it.

Beginning in the late 1940's, with OSTI's production of the Nuclear Science Abstracts, which was to go on for nearly 30 years, OSTI entered into the business of collecting information. Beginning in the 1990’s, OSTI began creating web applications to make the collected content openly accessible and conveniently searchable. ETDE Web, DOE Information Bridge, the Energy Citations Database, and DOE R&D Accomplishments are some of the successful applications.

In the last several years, OSTI’s approach to disseminating STI has evolved. Recent applications such as the Eprint Network, Science.gov, DOE Science Accelerator, and WorldWideScience.org connect users with the highest quality science information without collecting or hosting it.

How does OSTI move beyond collecting to connecting and what does connecting mean? OSTI's new applications search content that is housed in document repositories owned by a number of government agencies and government-sanctioned organizations. OSTI applications search a number of these repositories on the fly and they aggregate the content from the sources they search and present the most relevant of the search results to the user. This simultaneous and real-time search of multiple repositories is called federated search. OSTI's federated search applications serve as portals to specific subjects. In being subject-specific, they connect users to the highest quality STI in their fields of interest.

Why is OSTI embracing the connection model? Quite simply, OSTI can far better achieve its mission by making great quantities of content openly accessible and conveniently searchable, but it is impossible to collect and keep current such quantities of content from multiple content sources. “Connecting” to content is doable, while “collecting” is not. (My emphasis added!)

We believe that by connecting users to content, we provide a more comprehensive and authoritative search. In doing so, we accelerate the advancement of science.
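The federated search described in the OSTI post above (query several repositories at once, merge the hits, show the most relevant) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the repository names, the titles, the scoring rule, and the functions `search_repository` and `federated_search` are all hypothetical, and nothing here reflects how OSTI's applications are actually built.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "repositories"; a real portal would query
# remote services hosted by other organizations.
REPOSITORIES = {
    "repo_a": ["solar energy report", "wind energy survey"],
    "repo_b": ["solar cell efficiency study", "ocean current measurements"],
}

def search_repository(name, query):
    """Return (score, source, title) tuples for titles matching any query term."""
    terms = query.lower().split()
    hits = []
    for title in REPOSITORIES[name]:
        # Assumed scoring rule: number of query terms found in the title.
        score = sum(term in title.split() for term in terms)
        if score:
            hits.append((score, name, title))
    return hits

def federated_search(query, limit=3):
    """Query every repository at the same time, merge hits, return the best ones."""
    with ThreadPoolExecutor() as pool:
        per_repo = pool.map(lambda name: search_repository(name, query), REPOSITORIES)
        merged = [hit for hits in per_repo for hit in hits]
    # Highest score first; ties are broken by source and title for stable output.
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return merged[:limit]

for score, source, title in federated_search("solar energy"):
    print(score, source, title)
```

Note that the portal never stores the documents themselves. It only holds the merged result list for as long as it takes to answer the query, which is the "connecting, not collecting" idea in miniature.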

Funding Collections and Services in the Public Interest

Do you ever worry about funding for your library? Have you ever thought about how to get a grant to help your library? Do you wonder about how you might attract grant funding to a library in the age of Google and the Web?

If you answered "yes" to any of those questions, I recommend the article Digital Infrastructure and Public Interest by Vince Stehle, in Grantmakers in the Arts Reader, Fall 2008.

(I posted a link to this article a few days ago but, after John referred to it in his 66 Days to Government Information Liberation post, I wanted to follow up a bit and mention why I think the Stehle article is important for libraries. This also gives me an opportunity to contribute some more to the excellent discussion that John is facilitating about Government Information Liberation.)

Stehle is a program director at the Surdna Foundation, which makes grants in the areas of environment, community revitalization, effective citizenry, the arts, and the nonprofit sector, and he was writing for Grantmakers in the Arts Reader. In addressing his audience of grantmakers, foundations, and people who support non-profits he says that there is an opportunity and even "an imperative" for foundations to support non-commercial work and help build "a public interest infrastructure" that will "promote the free exchange of knowledge over the Internet."

In specifically emphasizing the need for non-commercial support he says that we cannot rely on the private sector to operate in the broad public interest except as that interest translates into profit:

"While there are billions of dollars in Silicon Valley venture firms seeking to invest in the next Google, Facebook, or YouTube, there is no equivalent capital pool available for investment in the expansion of social enterprises operating in the public interest."

We often make that point here at FGI and extend it to those in government who see their information content as an "asset" and a source of needed dollars and not as a public good that should be in the public domain, freely and openly available for use and reuse. As Stehle says:

"So the real challenge is for grantmakers to figure out how to effectively identify, vet, and support promising new media and information services that put the public interest before commercial profits." [emphasis added]

I believe we in libraries should listen to Stehle's message and think about what it means for grant support for libraries. After all, most (all?) libraries are non-profits, and so many of our best libraries (and certainly our FDLP libraries) explicitly support the public interest, and libraries need funding to do their work.

To put this in a library context, I think we need to think about what libraries have to offer that other institutions and grant seekers do not. As I mentioned in an earlier post, libraries -- because of their values of free, equitable, open public access to information -- are better positioned than anyone else to seek and get funding for those very kinds of activities that Stehle describes.

But, how do we differentiate libraries from others? What are our unique roles? Many libraries are struggling to define their roles and purposes in society. John picks up on this and says that Stehle is one of those who "argue from the perspective, the library/web morphing together into some kind of global resource is a done deal." (I disagree with John on this; I don't see where Stehle says this or anything like it.)

John seems to be saying (correct me if I am wrong) that the center of libraries' responsibilities has shifted because there are new distribution mechanisms and because we have new abilities to make better use of information. He says that it (the role of libraries?) "is something no longer centered on possession and/or control...."

I think this is a grave mistake. While I agree strongly with John that libraries can and should use technology to "knit together the medium of governance (politics, policy, law, and programs) with how our communities use the civic message to inform their daily lives," I also believe that possession and control of information is an essential, primary role for libraries. If we do not possess copies of information and control where it is and control its very existence (keep it from disappearing or being altered or lost), we cannot do the exciting mashups that we want to do.

While libraries can and should use technology to "knit" and "weave" information from a lot of different sources (see: collections, services, and "mini-librarians"), I don't think that this is a unique role for libraries -- nor should it be. What libraries can do that is unique, though, is select, acquire, organize, and preserve information and ensure that our services for that information make it possible for others to do their own "knitting and weaving."

In short, libraries can make the case that one of their roles in society is to maintain digital collections that others can use and reuse and mix and mashup. We can make the case that society will lose information if it relies only on information-producers to preserve information for the long term and we can argue that society will lose free, open access if we rely on those who see their "content" as an "asset." We can make the case that libraries are non-profit, public-interest organizations that will guarantee long term preservation and free access to information. We can argue that if the information is not preserved, there will be nothing to share and knit and mash-up. We can argue that libraries facilitate information use and reuse.

But, don't take my word for it. Re-read the excellent article Managing Digital Assets in Higher Education: An Overview of Strategic Issues by Donald J. Waters from 2005 (or my brief summary and comment of it). Or read the paper that Stehle refers to, Sustainable Public Media Infrastructure which describes non-profit organizations that are creating permanent, sustainable public knowledge and communications infrastructure that is designed for public benefit. Then reflect on the primary, central importance of permanent digital collections in libraries.

Digital Government Summits

This morning, I checked my friends' Twitter updates, as I often do. I was intrigued by the discovery that my friend and colleague Michael Sauers would be attending (and Twittering about) the Nebraska Digital Government Summit today. The description of the event makes me think this summit might be of interest to government documents librarians:

As citizens increasingly use technology in the workplace and in their personal lives, they expect government information and services to be readily accessible through technology. The Nebraska Digital Government Summit will provide an opportunity to learn how new and emerging technologies can be used to expand access to services, reduce costs, increase efficiency, and improve public safety.

A quick look at the site reveals that there are similar events in most states. Have any FGI readers ever attended one of these summits? What did you think?

Explaining "Born Digital" Gov Docs to Patrons & Professors

I had to explain to a student patron and their Professor today what is meant by "born digital" and how digital government documents are wonderful resources for a paper if we do not have the print version or when the print version doesn't exist (or is horribly out of date). Have any of you had to explain this a lot?

It all started when the student patron told me she could only have three web sources for her Nursing research paper after I had shown her the wonderful world of digital documents online. She had found an eleven-year-old version of a government print source in our catalog but I cringed...born digital documents online via NIH or the U.S. Dept. of Health had more up-to-date medical information on her topic! I told her to use both the print and online sources. She would be able to see if there were any noticeable differences between the 1997 print version and the 2007/2008 online information on her topic.

I contacted the Professor and explained this too. All is well and she will allow for the use of online government information. She was just hoping to avoid the use of too many general (i.e. crappy) websites. I understand that but I wanted to make sure that the student would not be punished for using several good government online documents and websites for her paper.

I didn't get into the nitty-gritty of digital authentication of government documents, but with some Professors who require legislative research, I tell them about the digitally authenticated documents that currently exist from GPO.

I have a feeling we government document librarians are going to have to explain this concept of "born digital" gov docs and digital authentication more often...especially now that more and more gov docs are being born digitally.

What do you want to know about Archive-It?

I'd like to survey you, our loyal FGI readers. I'm co-presenting with Molly Bragg at next week's Depository Library Council conference about digital collections using Archive-It (see title and abstract below). I've got an outline but I'd really like to know what questions YOU have about Archive-It and digital collections. What do YOU want to know about Archive-It? So, please please please leave a comment here so that my presentation will be even more amazing :-)

Title of Presentation:

Gone Today, Here Tomorrow: Archiving and Preserving Born Digital Government Documents

Abstract:

Stanford University Library has been a federal depository library since 1895. In 2007, the library began collecting born digital documents using Archive-It, the web archiving service from Internet Archive (www.archive-it.org). In this presentation James Jacobs will discuss his group's objectives and procedures for selecting and archiving digital content and share examples of the unique content preserved. Molly Bragg will present an overview of web archiving projects and tools used and developed by Internet Archive. These tools are used by libraries around the world to preserve government documents and other born digital content.

Federal Agencies Digitization Guidelines Initiative

I've been reading and digesting the recently released Federal Agencies Digitization Guidelines Initiative website and the sustainable formats page, so I can discuss them (if there is time) during my presentation at next week's Depository Library Conference.

A dozen federal agencies launched an initiative to establish a common set of guidelines for digitizing historical materials. Two working groups have been established: the Still Image Working Group (books, photographs, maps, etc.) and the Audio-Visual Working Group. They have two draft documents currently up for review and comment: TIFF Image Metadata and Digital Imaging Framework. Comments are due on November 15.

I'm also loving their glossary of terms, which "has been generated to serve the participating agencies as a standardized vocabulary for their deliberations and guidelines" and it is "a work in progress" so suggestions are welcome.
