Month of April, 2007

The OpenHouse Project Op-Eds at The Hill

The first article in a weekly series, exclusively in The Hill, exploring the recommendations of the Sunlight Foundation's Open House Project, which advocates online transparency in Congress, is now available.

Bluey is director of the Center for Media & Public Policy at The Heritage Foundation and maintains a blog at RobertBluey.com. He authored the "Citizen Journalism Access" chapter for the Open House Project. The full report is scheduled for release on May 8.

The OpenHouse Project (a project of the Sunlight Foundation) is discussing ways to "open up the House." It will suggest changes "where the internet and Congressional procedures come together" to "identify areas where Congress can open up and allow all of us to have more information and access." It is a temporary working group designed to make recommendations to Congress on how to begin making the House of Representatives more open and facilitate communications.

Forthcoming Op-Eds will address: Legislation Database, Preserving Congressional Information, Congressional Committees, CRS Reports, Member Web-Use Restrictions, House Clerk's Office, and Coordinating Web Standards.

Google + state government documents

"Google and four U.S. states have partnered to improve the amount of data Google indexes from their Web sites and makes available to users of its search engine."

"The search giant helps four states get online data indexed and helps create custom search engines for government Web sites."

Here are some different perspectives on the work Google is doing with state governments:

Another remix of speeches

Although this is not, strictly speaking, government information, it is about the presidential race and is an interesting complement to US Presidential Speeches Tag Cloud (more on that here: remix: US Presidential Speeches Tag Cloud).

This is another good demonstration of how interesting things can be done with information that is easily re-usable.

As a contrast, one cannot remix documents like SUPERINTENDENT OF DOCUMENTS POLICY STATEMENT 301, because it is an image of text, not text.
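To make the point concrete, here is a minimal sketch of the kind of counting a tag cloud depends on. It is not the code behind either remix; the function name, the tiny stop-word list, and the sample sentence are all invented for illustration. The only real requirement is the one argued above: the input has to be text, not a picture of text.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "we", "our"}

def tag_cloud_counts(text, top_n=5):
    """Return the top_n most frequent words in text, skipping stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Invented sample text, not taken from any real speech.
sample = "Freedom and security. Freedom of inquiry. Security of the nation."
print(tag_cloud_counts(sample, top_n=2))
```

A scanned page image would have to go through OCR before a function like this could do anything with it, which is exactly the extra barrier that plain text avoids.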

Why Privacy and Confidentiality Are Important

Don Wood at Library 2.0 has a good, short post on Why Privacy and Confidentiality Are Important.

He notes that "For libraries to flourish as centers for uninhibited access to information, librarians must stand behind their users' right to privacy and freedom of inquiry."

If libraries do not have copies of digital government information, users will be faced with retrieving that information from government-controlled web servers where there is no guarantee of privacy (Will GPO guarantee user privacy? Can it?). If libraries want to flourish as centers of uninhibited access to information, they need to have copies of digital government information.

[Editor's note 2/18/09: The link to Don Wood's blog appears to be dead. However, I found a copy in the Internet Archive. The link above has been changed to go to the archived copy of the page.]

New Remix: Federal Register Searches by RSS

Thanks a bundle to Steven M. Cohen, who Twittered about this new item we've added to our remixes page:

Justia Regulation Tracker - This free service takes Federal Register data and provides the ability to create RSS feeds of search results. The search gives you more options than the GPO Advanced Federal Register Search because the Justia search gives you agency dropdown choices and the regulation abstracts appear on the results pages. Justia is led by former FindLaw CEO and co-founder Tim Stanley. They make their money from advanced web services to lawyers, but provide free basic legal info to the public.

More information on Justia and this new service can be found at http://blog.librarylaw.com/librarylaw/2007/04/feeding_the_rea.html.

This is a perfect example of a service that couldn't be started if GPO implemented a two-tiered model of information access: free but restricted access at Depository Libraries, and fee access for vendors wishing to reuse government information.

But how will GPO be able to sell government information if people who obtain this public domain information republish it for free with better searching and alert tools than GPO? We don't think they can without restricting the no-fee information model in some way. So we at FGI think they shouldn't try.

Finally, if this serendipity by Twitter intrigued you, drop by and friend me at http://www.twitter.com/dcornwall

Major Podcast Directory Update

Now representing 27 states, our Government Podcasts directory has been updated with all the state and local government podcasts we could find. We also had help from librarian Amanda Stone.

Please look over the revised directory and send us anything we missed, especially state and local gov't podcasts.

Remember our criteria:

1. It must have audio or video produced either by a state agency, local government or an elected official.

2. It must be hosted on a government server (.orgs/.edu that clearly id themselves as a gov't body ok).

3. The agency must put the CAST in podCAST by having an obvious way to subscribe to the podcast feed (RSS, iTunes, etc). Posting static audio files and expecting people to manually download files one by one won't cut it.

Thanks for checking this list for us!
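If you want a quick way to test a candidate against the third criterion, here is a rough sketch, assuming the agency publishes an RSS 2.0 feed. It only checks that the feed carries at least one audio or video enclosure, which is what a podcast client needs in order to subscribe. The function name and the sample feed (including the example.gov address) are made up for illustration, and it is not the process we used to build the directory.

```python
import xml.etree.ElementTree as ET

def has_subscribable_media(rss_xml):
    """Return True if the RSS document has an audio or video <enclosure>."""
    root = ET.fromstring(rss_xml)
    # Walk every <enclosure> element anywhere in the feed.
    for enclosure in root.iter("enclosure"):
        media_type = enclosure.get("type", "")
        if media_type.startswith(("audio/", "video/")):
            return True
    return False

# Invented sample feed; the URL is a placeholder, not a real podcast.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Agency Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://www.example.gov/ep1.mp3"
                 length="123456" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""

print(has_subscribable_media(SAMPLE_FEED))
```

A page that merely links to MP3 files has no feed to hand to a function like this, which is the difference between audio on the web and a podcast.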

Two new LOCKSS news items

If you're trying to understand how LOCKSS works and why anyone would want their own copies of government data when GPO/Google/[Your Third Party Here] will keep it safe and free forever, check out these two recent news items from LOCKSS:

  • LOCKSS Team featured by Library of Congress (04/24/07) Pioneers of Digital Preservation on the Library of Congress' web site features an overview of the LOCKSS program.
  • Presentation at CNI (04/17/07) Vicky Reich and David Rosenthal talked at the CNI meeting in Phoenix, AZ. Vicky gave an overview of the status of the CLOCKSS program, and David talked on Can We Afford To Preserve Large Databases?.

The LC page not only demonstrates that LOCKSS can be a trusted and TESTED partner in digital preservation, but also offers an excellent plain-English explanation of how the system works.

David Rosenthal's CNI powerpoint touches on the non-technology reasons why information solely in the hands of the government is at risk, especially his slide 18:

Example: Insider Attack

● Political interference (Hansen 2007):
– 2006 Earth Science budget retroactively reduced 20%
– "One way to avoid bad news: stop the measurements!"
– Suppose the data itself turned out to be "inconvenient" ...

● Remove it (e.g. EPA pollution database)
● Alter it?

● Independent replicas essential
– Independently administered in different jurisdictions
– Mutually audited so they're tamper evident

The rest of the presentation is a good, though slightly technical, primer on performance requirements for digital preservation and the need for further research. It also has some scary things to say about RAID.

More evidence that no ONE system, not even a Future Digital one, is enough to safeguard America's government information. No system is safe from its parent - particularly when that parent is so reluctant to fully fund information access and preservation.
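For the technically curious, here is a toy illustration of the "mutually audited" idea from the slide quoted above. To be clear, this is NOT the LOCKSS polling protocol, which is considerably more sophisticated; it is a simplified sketch that assumes each holder can hand over its copy, hashes every copy with SHA-256, and flags any copy that disagrees with the majority. The holder names and sample data are invented.

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one replica's content."""
    return hashlib.sha256(content).hexdigest()

def audit_replicas(replicas):
    """Compare replicas by digest.

    replicas: dict mapping a holder's name to that holder's copy (bytes).
    Returns (majority_digest, dissenters), where dissenters is a sorted
    list of holders whose copy differs from the majority.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: digest(data) for name, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority there is no basis for calling any copy good.
    if votes <= len(replicas) / 2:
        raise ValueError("no majority; cannot tell which copy is intact")
    dissenters = sorted(n for n, d in digests.items() if d != majority_digest)
    return majority_digest, dissenters

# Invented example: one holder's copy has quietly lost a measurement.
replicas = {
    "library_a": b"2006 measurements: 1, 2, 3",
    "library_b": b"2006 measurements: 1, 2, 3",
    "agency": b"2006 measurements: 1, 2",
}
_, flagged = audit_replicas(replicas)
print(flagged)
```

The sketch only works because the copies are held by different parties. With a single copy there is nothing to compare against, so an alteration would go unnoticed.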

Google data goes missing

Google glitch loses user data, By Dan Goodin, The Register 26th April 2007.

Google users found that "settings and data they've amassed over months have suddenly gone missing from their personalized homepage."

Although this story doesn't directly pertain to government information it has a lesson for us.

While the loss of individual preferences is hard on the individual, it is not the same as losing public information that we want accessible forever.

We hope that Google will be able to restore these user preferences, but the lesson here should not escape us:

Over the years, the many free services offered by Google and its competitors have become indispensable to many of us, but they also bring to mind the old adage that we get what we pay for. And Google's personalized homepage isn't the only such service to show signs of untrustworthiness....

As libraries (and government agencies) increasingly rely on Google to provide essential access to government information (e.g., Google to index government deep web? and Google begins to offer full-text scanned government documents and Agencies are working with Google to boost rankings and increase traffic and Cabbage Statistics, a microcosm of our selection decisions?), we should all be asking ourselves if this is an adequate infrastructure for permanent access. Every time we think "I don't have to [index, catalog, fill-in-the-blank-service] because Google will..." we should ask ourselves what we'll do if Google doesn't, or fails, or changes its services.

If we do more (e.g., Cabbage statistics and google bombs), we will at least be able to use our own systems and Google and its competitors and its successors.

An even more important question is, do we know what Google does and how it does it? What do they index and how do they rank? What gets ranked high and what gets ranked low? How deeply do they index a given web site? Are these the same decisions that we would make? Have they changed what they do since last week? Are they the right decisions for our users? Does there need to be an alternative that we control and can explain to our users and thus better help them?

Technical Requirement for Digital Deposit

A recent thread on the govdoc-l mailing list is about digital deposit. See Digital Deposit by Janet Fischer, 26 Apr 2007 and digital deposit by James [R.] Jacobs, 27 Apr 2007.

Thanks to Janet for bringing this up and to James for the helpful links to technical information on digital deposit.

I'd just like to add two thoughts:

1. I believe that it is best to think about digital deposit in much the same way we think about paper deposit: every library will be different.

We shouldn't be looking for a one-size-fits-all solution in the digital world any more than we expect any two depositories to be identical. The technical requirements that any given library comes up with will depend on the level of service and collection profile that the library chooses.

We should be thinking of services and collections first and technology second. We should not be trying to shoehorn our service and collection decisions into an abstract technological solution. Nor should we assume that every digital depository will have to meet the same requirements that an OCLC, CDL, NARA, or FDSys will meet.

I can, for example, easily imagine a small depository with a slow or intermittent Internet connection and little or no online services selecting a few essential titles and putting them on a stand-alone public PC so that users can easily use those titles when in the library even if the network is down or slow or the originating site unreachable. And, at another extreme, I can see a large library, which already has some digital collections online and accessible over the web, adding government information to its collection and integrating government and non-government sources together so that its users do not have to go to two different interfaces or sites to find the information they need.

You can probably easily expand these simple examples to your own situation and see where digital deposit will fit into your existing collections and services -- or collections and services you are planning.

2. I think it is equally important to emphasize to library management and to your technical support people that there are different technologies for implementing any given collection and service plan. Again, I believe that we should not be looking for a one-size-fits-all technological solution -- even for similar collection and service plans. For example, one library might choose to use LOCKSS to implement online collections, another might use its institutional repository software (e.g., DSpace, EPrints, Greenstone, etc.), another might use content management software, and another might integrate documents into its existing webspace by uploading them to the same server that hosts its existing html documents. The point is that there are different technical ways to implement the same collection and service plans.

I hope this helps and others will contribute to this thread. If you have specific suggestions or solutions that you anticipate using, please share your stories!