EPA Pilot Project Tagging Project


Why in the Internet Archive? Why not the pages themselves?

As someone who has worked with EPA since pre-web days, I understand the desire to make their content more easily found. But it's not clear to me why you'd do that via copies stashed at the Internet Archive, rather than the document URL (on the EPA site) itself.

After all, EPA has some 750,000 documents, many of which are of transient value at best.  Many of the documents become superseded by new regulations, new interpretations of the rules, new scientific data -- and while that obsolete information probably has SOME historical value, it is not the first thing someone ought to find when they are looking for information on, for instance, the Clean Air Act.

I'm a big believer in social tagging, and look forward to seeing what this experiment accomplishes -- but I'm also a big believer in the idea that the URL IS the document, and should be treated as -- well, perhaps not sacred, but at least with great respect for what it represents, which is the authoritative location of a given document. 

In the case of the EPA, where documents may contain legal interpretations and rulings that directly impact how businesses and individuals behave, this is especially true -- pointing to the "wrong" document can have significant negative consequences for those trying to comply with the regulations.

 

IA is a testbed for tagging

Hi Scott,

I understand and sympathize with your concern for authenticity and up-to-date information. These are concerns we at FGI share.

For the purposes of this test, we needed URLs we could guarantee would not be deactivated during our pilot project. That's why we saved the 32 documents to the Internet Archive.

If, as a result of this project, social tagging is accepted as a way of improving findability for government documents, I would expect the copy of the document tagged to be either the agency's own or some other suitably authenticated copy (GPO, LOCKSS-Distributed FDLP copy, etc.).

The agency URL might not be the best place to go because of the volatile nature of information on the web. Some research has suggested that the average web document has an active life of between 77 days and 4 years. Even the upper limit isn't very long for people wishing to document government information for longer than a presidential term. See this article for some reasonably up-to-date information on the topic of web volatility:

The Australian Library Journal (2005)
Still lost in cyberspace? Preservation challenges of Australian internet resources
Wendy Smith
http://www.alia.org.au/publishing/alj/54.3/full.text/smith.html

If other people know of more recent studies, please add them to comments.

Finally, let me assure you that FGI has no long-term plan to post large volumes of government documents to the Internet Archive. That would be too much effort for any one organization.

------------------------------------

"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.

Can't afford to rely on agencies alone

Social tagging is but one way to collect and give access to govt information. Carl Malamud at public.resource.org has been putting large amounts of govt information out on the open Web as a way to diffuse that information. That includes putting video and text at the Internet Archive, Smithsonian images on Flickr, video on YouTube, and case law up on his own servers (read press release (PDF)).

I'm really heartened by Carl's work to get government information out to the public. Federal agency CIOs, GPO, and all government information producers would do well to follow these 8 Open Government Data Principles. Agencies, libraries and non-profit organizations will need to work together in order to assure easy access to and long-term preservation of government information. As Daniel points out, the public can't afford to rely on govt agencies alone in this endeavor.

2008 Study on Web Stability

I just found a brand new article about the persistence of web documents:

Casserly, M., & Bird, J. (2008, January). Web Citation Availability: A Follow-up Study. Library Resources & Technical Services, 52(1), 42-53. Retrieved February 19, 2008, from Professional Development Collection database.

If you have a subscription to EBSCOhost, you should be able to access the full text of the article at:

http://search.ebscohost.com/login.aspx?direct=true&db=tfh&AN=29379006&site=ehost-live

Overall, it looks like the persistence of URLs is not improving:

As in the original study, the researchers cross-tabulated the results with URL characteristics and reviewed and analyzed journal instructions to authors on citing content on the Web. Findings included a decrease of 17.4 percent in persistence, and 8.2 percent in availability on the Web. When availability in the Internet Archives was factored in, the overall availability of Web content in the sample dropped from 89.2 percent to 80.6 percent.

All the more reason to include libraries as custodians of digital content. They take a long-term view of access and preservation that, for some good reasons, is not shared by commercial vendors and government agencies.

