privatization

Good news and bad news about UK GIS data

Today, some mixed good news/bad news about the availability of free public data in the UK. As we've noted here before (e.g., Privatized Data Woes in Britain and News from abroad: UK open statutes & RFID in Canadian coins and The Semantic Web + Government Information = Serendipitous Reuse) the British government sells limited-use licences to its GIS data on a cost recovery basis. Now, as part of a proposed national geoportal, the UK would "create a single point of entry on the web to data held by public bodies such as local councils, Ordnance Survey (OS), the British Geological Survey and the Environment Agency." But, as the story says, "A new system will make geospatial information available without charge - yet we'll still have to pay."

First, some very good news. Civil servants revealed last week that the British government has begun work on a system to make all the geospatial data it holds on the natural environment available for free inspection and re-use. Now the bad news. In this context, "free" means we will still have to pay to download much key data, especially if it is to be published or otherwise used commercially.

Privatized Data Woes in Britain

While FGI normally focuses on US government information policy issues, there is a conflict going on in the UK that mirrors some of the recent stories about public data being used by private companies in a privileged way, forcing the taxpayer to pay twice for their data.

An April 17, 2008 Guardian article titled A costly 2008 Domesday Book details how not one, but two British agencies contracted with commercial companies to post government-compiled data. The result:

After seven years of legal wrangling, an official, complete and constantly updated list of addresses in England and Wales is about to become available for commercial use. The National Land and Property Gazetteer (NLPG), compiled from data supplied by local councils, is being promoted as the best list of property addresses since the Domesday Book.

Free data it is not. Although prices have yet to be finalised, the commercial firm hosting the service said this week it will cost between £15,000 and £20,000 a year. Profits will be shared among local authorities to help them keep data up to date.

The gazetteer is not the only address database on the market. The state-owned Ordnance Survey also offers addresses as part of its MasterMap digital geographical database of Britain.

Most of the article is about campaigns to free the data. In analyzing the roadblocks, they talk about issues that will be familiar to US readers:

"We would like to give it away free," says Nicholson. However, he says, local authorities are not going to give their work away when they have to pay for the use of postcodes from the Royal Mail's Postcode Address File. Neither can Ordnance Survey, which is required by the Treasury to show a return on its activities, and regards MasterMap as a key part of its revenue-generating portfolio.

We wish the Free Our Data campaign well.

This is probably a good time to mention that what FGI objects to isn't the selling of data per se, but the selling of data that has already been compiled at taxpayer expense. If a private company wanted to raise its own venture capital and compile its own address list completely independent of government sources, we'd be all for it charging whatever the market could bear. But a private entity should not be allowed to be the sole, fee-based dispenser of information that has been compiled by government agencies using money confiscated through taxation. THAT's what we're against.

A comment on government contracts and harvesting

Over the past week, there have been some good conversations about government contracts to digitize government information and the National Archives' decision not to conduct a web harvest or snapshot at the end of the current Administration. There is good news and bad news.

The good news
The good news is that NARA's decision was not nearly as bad as it appeared to be when it was first announced in a memo on March 27, 2008, which was circulated only to Federal records officers (see: The National Archives Is Quietly Destroying Millions of Documents). In a thoughtful post on its web site (National Archives and Records Administration Web Harvest Background Information, April 15, 2008, NARA; pdf version available), NARA outlines in detail the reasons why it would not conduct an end of administration web snapshot or harvest of Executive Branch websites nor require agencies to do so. The reasons, I think, are sound and in keeping with NARA's commitment to preserving information of historical value.

In addition, the NARA memo of April 15 makes explicit the fact that its decision and memo of March 27 do not apply to Presidential records or to records of the Congress. It says that "NARA will continue to conduct a web harvest of Congressional web sites" and that NARA "will also receive a snapshot of the White House website" noting that "Unlike Federal agencies governed by the Federal Records Act, the White House is governed by the Presidential Records Act, under which all Presidential records are treated as permanent and transferred to NARA for preservation at a Presidential Library."

The NARA "Background Information" document is also, I think, worth reading for its clear description of the shortcomings of web harvests in general. I think it is very useful for us to be reminded of these shortcomings to the extent that we believe we can rely on them as an adequate form of preservation.

In more good news, the NARA/TGN contract is not as bad as it could have been. I mentioned this in my earlier post here (The NARA/TGN contract as a bad precedent) and similar comments have been made in the useful and interesting thread over at ArchivesNext (NARA latest digitization agreement: One archivist's perspective). Merrilee Proffitt, of RLG, says in a comment there that the NARA model for contracts with third parties "actually comes out looking pretty good" when compared to the criteria described in the RLG paper Good Terms - Improving Commercial-Noncommercial Partnerships for Mass Digitization (by Peter B. Kaufman and Jeff Ubois, D-Lib Magazine, November/December 2007, Volume 13 Number 11/12).

The bad news
The bad news, as James pointed out this morning, is that the GAO contract for digitizing is very bad indeed (GAO *did* sell exclusive access to legislative history to Thomson West). Quoting Carl Malamud, James notes that GAO gets access to the digitized data but does not get a copy of its own; the rest of the government doesn't even get access to the data. The public is left with the option of going to GAO headquarters and paying 20 cents per page to copy paper! As Carl says, "This is one of those deals where the public domain got sold off."

This morning there was more bad news. Kate at ArchivesNext reports that the Citizens for Responsibility and Ethics in Washington (CREW) has a new report Record Chaos: The Deplorable State of Electronic Record Keeping in the Federal Government, that concludes "that the federal government is severely mismanaging its electronic records." CREW also says that a House Committee proposal to amend federal record keeping laws "is anemic and fails to make the substantial changes necessary to bring the federal government into the 21st century."

And even the good news is tempered by the fact that we have less than we could and are a long way from even an adequate system of permanent preservation of digital information or a long-term solution to digitizing non-digital information. We will have to hope that the White House will deliver a snapshot of the White House web site and that the snapshot will be accurate and complete. The behavior of the White House with regard to electronic records and email does not make us optimistic. The NARA/TGN deal is better than the GAO/Thomson deal, but still leaves much to be desired and, as pointed out even by defenders of the deal, it is unlikely that we will ever have free, open, networked access to the digital information that TGN digitizes. That means the real effect of the deal is to privatize the information.

Comment
For me, the biggest disappointment in these latest developments is that librarians and archivists seem to be too willing to accept "good enough" and not willing enough to argue harder for "better." There are lots of people who have good reason to argue for less access, more fees, less privacy, and more control of information, but librarians and archivists should not be among them. I believe that we should not spend time making the case for the private sector; it is fully capable of making its own case. We should spend our time fighting for free, full, open, public access, usability of information, and long term preservation.

The primary mission of private sector companies is to make money, not to serve the public. They may serve the public as a by-product of making money, but no for-profit company will go to its owners and say "we are going to do the best thing for public access" without the qualification "that will make us money." Unfortunately "making money" often conflicts with public access. Politicians (and some bureaucrats) will argue for greater control of government information; some will argue for secrecy of government information on the one hand and privacy-invading policies on the other. Most government agencies do not have information access or long term preservation of their information as a primary mission and the exceptions are notable (e.g. LOC, NARA).

In contrast, the primary mission of many libraries and archives is to provide free public open access with long term preservation and usability. While others may have some of those pieces as secondary goals, few if any have them all. For many libraries and archives these goals are not just their primary mission but their defining characteristic.

While digitization and digital preservation are neither easy nor inexpensive, that doesn't mean that we have to pay any and all costs for them. The digital era should be making it possible to provide better access without giving up free use and reuse, without giving up open access, without turning over control to those whose primary mission is something other than free, open, public access and long term preservation. But increasingly we see a combination of politics and economics leaving us with contracts that trump copyright and fair use, with "access" being negotiated at almost any cost (including loss of control), with DRM technologies that prohibit easy (or any) reuse, and with privacy protections being deprecated or even ignored. Even in the case of the NARA/TGN contract that is legally "better" than the GAO/Thomson contract, we are left with the effect of two-tiers of access and network access being essentially privatized and fee-based.

I believe that librarians and archivists should be pushing the boundaries and insisting on more and better, not accepting some benefits by negotiating away the big benefits we could be getting in the digital age. This is particularly important for government information that is in the public domain. If we can't make this work for public information that is not copyrighted, how will we be able to do so for information that is?

I'm not arguing for a perfect, ideal world that is impractical to achieve. I am suggesting that we should fight for everything we can get. We should celebrate when we make inroads with a contract (like NARA/TGN) that is better than the others (like GAO/Thomson) but we should do so by committing to doing better next time. We should not accept this as "good enough" -- because it is not and we can do better next time. In fact, every time we accept a less-than-perfect deal as "good enough," we make it a little harder to make a better deal next time. We lower the bar if we accept "good enough" and stop trying to achieve better. We should not take the time to convince ourselves or the public that this is as good as we can get; we should take that time to admit to the limitations and trade offs and to commit to doing better next time.

There is lots written these days about "the future of libraries" and "the role of libraries in the digital age" and many people openly wonder if there is a place for libraries at all. I think there are several places where libraries have a unique role to play in society and the areas of digitization and digital access and preservation are important ones.

We need to make the case for the public; for free, open, public access; for long-term preservation and usability; for public accountability in the control of information; for reader privacy. Librarians and archivists have a unique role in doing that. In doing so, we will face an uphill battle and trade offs, but we should never lose sight of our unique role in society. We should never cheapen our professions by making the case for less (there are plenty of people to do that). We should always make the case for more. We will not always succeed and we will have to make trade offs. But we should always do so in the context of staking out a territory that is different from the private sector and those who are willing to get less. We should stand up for rights that others are not willing to fight for. We must fight for it when there are so many forces aligned against free, open access.

I'd like to see us emulate Carl Malamud and CREW and Brewster Kahle more and do less of making excuses for TGN and Thomson.

GAO *did* sell exclusive access to legislative history to Thomson West

A few weeks ago, Daniel had a great post, "GAO/Thomson-West Contract Raises Questions" in which he expanded on a Boing Boing post "Did the US gov't sell exclusive access to its legislative history to Thomson West?" and analyzed the Thomson-West contract with the GAO for digitizing 20,597 legislative histories of most public laws from 1915-1995. Today, Carl Malamud got an answer to his FOIA request to the GAO seeking access to the digitized images of those legislative histories. I'll let Carl tell it in his own words:

Well, the answer is now a definitive yes, that data has been sold down the river and is out to sea.
Public.Resource.Org sent in a FOIA request to GAO on this topic seeking access to the scanned data. Today's letter answering our FOIA request spells out the bad news. Turns out the GAO doesn't even get the data, they simply are given an account on Thomson's service. The rest of the government doesn't get access to this data, and the public is invited to stop by the GAO headquarters and pay 20 cents per page to copy paper.

This is one of those deals where the public domain got sold off ... GAO gets a bit of convenience by having their stuff scanned for them, but they gave up way more than they got in the deal, and the public (including government workers and public interest groups who need to consult this data) lost big-time.

Carl has put up his paper trail explaining the story. Here's the link to the Scribd group with the full paper trail on this issue, and here's the link to last week's response from the GAO.

This perfectly exemplifies the problems we see with government agencies entering into contracts with private companies to digitize public domain materials (see for example "NARA/TGN contract as a bad precedent"). We have no problem with government agencies contracting with private companies to digitize government information. The problem as we see it is that so many agencies seem ignorant of the fact that privatizing access to said digitized public domain information actually limits access in the long run.

The NARA/TGN contract as a bad precedent

A comment (Digitization Contract expands access to public records) posted here last week to a posting (Yet another digitization contract limits free access to public records) about the NARA/TGN contract to digitize certain materials at NARA, said that the contract "does not limit access to public records" and that "This is a definite win for the public."

I want to take the opportunity to address the arguments made in that comment and enumerate some of the problems that I see with the contract and ones like it. In brief: (as James pointed out) while contracts like this one are attractive in the short run to some people because they do provide some access that we do not now have, in the long run they are bad ideas because our short-term, limited gains result in long-term net losses to free public access to public information. Even people who relish the short-term gains should be concerned about the long-term net losses.

The good things about the Contract
Let me begin by noting that there are many things about this contract that are good and that reflect, I think, the fact that government officials have learned from past mistakes. Examples of the good things in the contract are: the inclusion of specific technical specifications, the right of NARA to interrupt processing when necessary to provide reference service and public access to the materials, the "non-exclusive" nature of the contract, the fact that TGN must provide free online access to the Digitized Materials in all NARA locations, the fact that NARA does not transfer permanent control or ownership of the materials to TGN, and the five year limitation on TGN's sole use of (some of) the digital copies.

The bad things about the Contract
But there are, I believe, several things wrong with the contract -- things that result in a net loss to the public rather than a net gain.

  1. The "enhancements" provided by the contract are fee-based and therefore explicitly and implicitly limit use and impose two-tier access.
  2. The contract promotes access over control. For the public to have "access" to public information content without the ability to use and reuse it "enhances" with one hand while it diminishes with the other. Enhancing access at the expense of control is a net loss for the public.
  3. The so-called expansion of access obscures the limitations on free public access to public information that deals such as the NARA/TGN deal impose. For example,
    • NARA gives TGN "the rights to and the exclusive and unlimited right to use the Digitized Materials and all metadata created for the electronic databases for five years."
    • There is nothing in the contract that requires the information that TGN dispenses during the five years to be usable or reusable by the public and we must assume from the language of the contract that it certainly does not intend to grant such rights for use of public information to citizens.
    • The agreement gives TGN veto over disclosure of information about the agreement itself (section 4.4 of the Agreement).
    • The agreement creates a category of "confidential information" that is exempt from disclosure (Section 4.2). This includes "designs or styles, trade secrets, inventions," and even "know-how." This is an example of the government not only condoning "closed access" principles over "open access" principles, it is contractually requiring NARA to do so.
    • NARA is giving TGN the right to use NARA trademarks, which will obscure the difference between TGN and NARA itself thus blurring for the public the free-public access of government information with private-company-fee-access. The contract even requires NARA to link from its own Catalog (ARC) to the TGN site, thus effectively turning NARA into an advertiser and promoter of TGN. It is not clear to me that this requirement of NARA to link to TGN will end after five years.
  4. It is not true, as the comment claims, that "The digitized copies of these records become freely accessible at all NARA reading rooms." Rather, the contract explicitly places limits on use of the digitized images for 5 years -- even in the reading rooms. These limitations include: "production for a fee of digital images" and, the permission to provide DVDs or CD-ROMs "for sale to the public." Even those distributions by NARA must include "license restrictions" that "will limit their use to prohibit resale, distribution or republication." (Section 1.4a [emphasis added])
  5. The contract does not, as the comment claims, make "the digitized copies of these records freely available to everyone after five years at no cost to the taxpayer." Indeed the wording of the contract explicitly gives NARA the right after five years "to sell" the digital content. In addition, the contract does not remove restrictions on materials digitized from microform after 5 years. (See Section 1.4b)
  6. The argument that any "enhancement" is good -- even if it imposes restrictions and two-tier access -- is often used by the private sector as a rationalization for privatization of government information. The battles over privatization of public information have a long history and, with the shift to digital information, we face new battles. I believe that the push for privatization -- particularly because of the costs involved in digitization -- means that we should be more cautious, not less cautious or cavalier, about promoting, facilitating, or encouraging contractual arrangements such as the NARA/TGN deal that grant special rights to the private sector or blur the difference between the private and public sectors.
  7. Contracts such as this one set a precedent for creating two-tier or fee-only access to public information. When we allow the government to make excuses for failing to provide free public access by claiming that we have no choice and that this is better than nothing, we lower the bar for the next contract -- and the next.

It is a bigger problem than this one contract
We at FGI have no argument against the private sector repackaging and adding value to public information -- as long as the information itself is freely available to everyone to use and re-use. When everyone has access to the raw content, then we will all be able to repackage and add value to public information, we will all have free access and the ability to "enhance access."

But when any contractual agreement or system (private-sector or governmental) locks the raw information away from citizens or charges a fee for that information, then such systems and contracts, by definition, wrest control of the information from the public and consolidate that control in a government agency or private sector company.

This problem of control exists not just with contracts such as the NARA/TGN contract. It also exists for information such as the Congressional Record and the Federal Register (which are "free" one-page-at-a-time, but cost thousands of dollars a year for a subscription; see http://bookstore.gpo.gov/collections/eproducts.jsp). It exists for Congressional Research Reports, which the government does not make available to the public except for those that leak out of government control or that private vendors provide for a fee (see http://opencrs.com/ and Inexplicable anomaly By Leslie Harris and Matt Stoller).

I am sure that some will argue that it is still possible (because of the non-exclusive nature of the contract) for the government or someone else to re-digitize these materials and make them freely available in the future. But that argument is the opposite of the argument for negotiating this contract in the first place. If we have to have a contract like this now, if this is the best we can do, if the government cannot afford to digitize these materials today, why should we assume that this will change in the future if those materials are already digitized? The practical result of contracts like this is that they will make it harder, not easier for these materials to ever become freely available to the public.

In summary, this is a big problem, not just a problem of this one contract. We are grasping short-term, good-enough expediency at the expense of long-term free public access. As citizens and librarians, we should not lower our standards for free public access to public information by accepting less than full, free, public access.

GAO/Thomson-West Contract Raises Questions

Thanks to an alert from a dedicated but shy reader, our attention has been focused on a story on Boing Boing titled Did the US gov't sell exclusive access to its legislative history to Thomson West? This story has links to documents relating to this deal requested by the redoubtable Carl Malamud. I took the time to read/skim through the contract documents and found this interesting section:

Taken from "Attachment A, Statement of Work" from the contract between Thomson West and GAO, posted at http://www.scribd.com/doc/2299358/Contract-Between-Thomson-West-and-GAO

Background: Since its inception in 1921, the US Government Accountability Office has compiled 20,597 legislative histories of most public laws from 1915-1995. These histories, spanning the 64th-104th Congresses, are currently being used onsite in the GAO headquarters Law Library in paper or microfiche format by GAO staff. On rare occasions other federal government employees are allowed onsite access to the paper or microfiche copies of these histories. Because of its historical and research value the legislative history collection shall be digitized to preserve the integrity of the files and improve the searchability of this valuable information resource.

Two years ago, GAO began a pilot project to convert a small number of GAO legislative histories from paper and microfiche formats to digital format. Since then 243 histories have been digitized using in-house resources and will be made accessible to GAO staff only through a web-based database on the GAO Intranet. The 243 histories consisting of 1,214,438 pages were randomly selected and include some of the largest histories in the collection. These histories shall also be re-scanned as part of this digitization contract.

This sounds like a major goldmine of information that really hasn't been shared with other parts of the government, let alone the public. It also sounds like GAO tried to do some of this work on its own but found it unviable. So left to itself, the information contained in the paper files wouldn't be available to anybody. So I'm not surprised it went looking for a partner. But I am surprised and concerned that they went with a commercial partner when the GAO office is within driving distance of a number of major universities and when public-spirited organizations like the Internet Archive and Public Resource might have been happy to come up with a solution to provide this taxpayer-funded information at zero cost to the taxpayers and either zero or minimal costs to GAO. Conceivably there might have been some way for the Government Printing Office to incorporate this into GPO Access, although that certainly would have been at some cost to GAO unless Congress was willing to make an appropriation for this purpose. But any Congress that claims to be committed to strong public access should be willing.

Were alternatives to in-house digitization or wholesale privatization pursued? If not, why not?

Long time readers of FGI know that most government information is considered public domain and also subject to Freedom of Information Act requests. So what's to stop Carl, Internet Archive, or some other public minded group from exposing this rich trove of legislative histories to the public which were taxpayer funded to begin with? According to the GAO, plenty:

Taken from "Attachment A, Statement of Work" from the contract between Thomson West and GAO, posted at http://www.scribd.com/doc/2299358/Contract-Between-Thomson-West-and-GAO

FOIA Requirements: While GAO is not subject to the Freedom of Information Act (FOIA), GAO has regulations (4 CFR Part 81) that follow the spirit of FOIA. The paper or microfiche copies of the legislative histories (and possibly the PDF copies of the "GAO Materials" section) would be available for public inspection and copying. However, under GAO's public disclosure regulations, GAO charges a per page copy fee. Accordingly, any extensive copying would be expensive and the quality of the copies, for many of the histories would be poor.

I assume this was put into the contract to assure Thomson-West their investment would be secure from public-access zealots who have the idea that the American people should only be charged once instead of twice for government information. But the paragraph raises two important questions that I hope someone in Congress will ask GAO:

1) On what rational basis would you charge a per-page fee on the 1,214,438 pages that have already been digitized? Running a backup tape isn't the same as hand copying files. GAO should be directed to immediately release that database at zero cost unless they can carefully and believably document actual copying expenses including staff time. But a per-page copy fee for PDF files isn't credible.

2) When GAO says "quality of the copies, for many of the histories would be poor", are they saying that the quality of copies would be poor just for the public or for Thomson-West as well? The first reading suggests a deliberate effort to sabotage no-fee public access, while the second reading suggests that Thomson-West customers will be paying a high price for lousy duplication. Neither option seems particularly fair.

I think I speak for all of us at FGI when I say that while digitization for greater access is a laudable goal, wholesale privatization without a careful, public examination of other, more citizen-friendly, alternatives is not acceptable. If you agree, please ask your Members of Congress to direct GAO to take a second look at this contract and facilitate no-fee access to this valuable set of legal materials.

400 Years of NARA War Records Now Online at Commercial Site

Ancestry.com is making available "more than 90 million U.S. war records from the first English settlement at Jamestown in 1607 through the Vietnam War's end in 1975. The collection includes the names and gravestone details of 3.5 million deceased U.S. soldiers, including 2,000 who died in Iraq." Users can pay $155.40 a year for unlimited access.

The records came from the National Archives and Records Administration (NARA) and include "37 million images, draft registration cards from both world wars, military yearbooks, prisoner-of-war records from four wars, unit rosters from the Marine Corps from 1893 through 1958, and Civil War pension records, among others." Ancestry.com spent $3 million to digitize the military records.

Budget constraints and a long list of unfinished priorities have limited federal efforts to make roughly 9 billion public documents available online, said National Archives spokeswoman Susan Cooper.

"In a perfect world, we would do all this ourselves and it would be up there for free," she said. "While we continue to work to make our materials accessible as widely as possible, we can't do everything."
