web archives

Web-at-Risk: Preserving Government and Political Information

Valerie Glenn, of the University of Alabama Libraries (formerly of the University of North Texas), has an article in the current issue of First Monday titled "Preserving Government and Political Information: The Web-at-Risk Project" that talks about ... wait for it ... Web harvesting!

It's based on her talk at the 2007 WebWise Conference on Libraries and Museums in the Digital World. In fact, the entire issue of First Monday 12(7) is dedicated to selected papers from WebWise. Valerie's article covers the what and why of Web harvesting, describes some sample collections, tools, and services, and touches on some of the overarching issues involved. There's more information on the Web-at-Risk wiki.

Besides Valerie's article, there are podcasts of all the sessions from WebWise07, where you'll hear the likes of Liz Bishoff, Günter Waibel, Steve Puglia, Deanna B. Marcum, and others.

And if you haven't heard of First Monday, you owe it to yourself to follow that link and check out all their past issues. Or look at Best Mondays, their most-read (or at least most-accessed) articles.

Millions and Millions of Government and Military Web Pages Archived by NARA and The IA

Last year we posted a note on ResourceShelf about the “2004 Presidential Term Web Harvest,” which contains more than 75 million .gov and .mil web pages, about 6.5 terabytes of data. It's a project of NARA and the Internet Archive. The archived sites can be browsed or searched by keyword.

Now available is the 109th Congress Web Harvest.
What does it contain?
+ More than four million pages (42 GB) crawled and archived between 11/11/06 and 12/11/06
+ Browse by Member Name
+ Browse by Committee Name
+ Browse by Leadership
+ Browse by House or Senate Organization

Go to: http://www.webharvest.gov/collections/

The harvest produced a public reference copy of the web sites for the purpose of continual availability to the public, and also produced a record copy to be retained in the holdings of NARA…Web sites included in the harvest were identified from information provided by the Web Systems Branch of the House Information Resources staff and by Senate webmasters in the Offices of the Secretary of the Senate and the Sergeant at Arms.

There's a bit more on ResourceShelf, including a comment by Librarian of Congress James Billington about the average lifespan of a website.
