The Government Printing Office announced in a press release today a success story in the use of the Application Programming Interface (API) for the Federal Register. It is certainly interesting and illustrative of how an API can be used to deliver information to a particular community of interest, but you may also find it unexpectedly charming: a researcher used the FR API to create a tracking system for polar bear protection documents.
GPO AND OFR SHOWCASE OPEN GOVERNMENT SUCCESS STORY
WASHINGTON-The U.S. Government Printing Office (GPO) and the National Archives' Office of the Federal Register (OFR) report a success story from the Application Programming Interface (API) for FederalRegister.gov. GPO and OFR introduced the API in August 2011, enabling information technology developers to create new applications for regulatory information published in the Federal Register. A researcher utilized the API to create a tracking system for polar bear protection documents. The API tool automatically grabs Federal Register items that mention polar bears from 1994 to present, displays the items in a formatted list with browsing capabilities, and links back to the full text on FederalRegister.gov.
Link to Polar Bear Feed: http://polarbearfeed.etiennebenson.com/
"This is another example of how GPO and OFR continue to find ways in achieving the goal of making Government information more transparent and giving users the ability to adapt Federal Register data to their own needs," said Public Printer Bill Boarman.
"We are thrilled to see the use of the API source material to develop a live feed on the subject of polar bears. This is precisely how we hoped this information would be used when we made it available to the public. We couldn't be more gratified," said Director of the Federal Register Ray Mosley.
The print and online versions of the Federal Register are the official daily publication for rules, proposed rules, and notices of Federal agencies and organizations, as well as executive orders and other Presidential documents.
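The feed described in the release works by querying the FederalRegister.gov API for documents matching a search term. As a minimal sketch of that pattern, the snippet below builds such a query URL; the `conditions[term]` search parameter reflects my reading of the public API documentation, so check federalregister.gov/developers before relying on it.

```python
import urllib.parse

# Base endpoint for the FederalRegister.gov documents API (v1).
API_BASE = "https://www.federalregister.gov/api/v1/documents.json"

def build_search_url(term, per_page=20, order="newest"):
    """Build a full-text search URL for Federal Register documents."""
    params = {
        "conditions[term]": term,  # full-text search term, e.g. "polar bears"
        "per_page": per_page,      # results per page
        "order": order,            # newest first, like the polar bear feed
    }
    return API_BASE + "?" + urllib.parse.urlencode(params)

# The polar bear feed's query would look something like this:
url = build_search_url("polar bears")
print(url)
```

Fetching that URL returns JSON records with titles, dates, and links back to the full text on FederalRegister.gov, which is all a formatted, browsable feed needs.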
UK: National Archives Releases Public API & Government Licensing Policy Extended, Making More Public Sector Information Available
Submitted by garyprice on Tue, 2011-09-06 15:45.
From Computer Weekly:
The National Archives [UK] has made details of 11m records available through an application interface it published today as part of an ongoing programme to get more official records online.
The API allows anyone to search for and retrieve the metadata that describes records in the archive in XML format. The data can then be used without restriction or charge. But the archive, which is simultaneously an executive agency of the Department of Justice and a government department in its own right, continues to charge £3.50 per document to retrieve actual records online.
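The appeal of metadata delivered as XML is that it can be parsed and repurposed by any program. The sketch below illustrates the idea; the element names are hypothetical stand-ins, not the National Archives' actual schema.

```python
import xml.etree.ElementTree as ET

# A hand-made XML record in the general shape an archival metadata API
# might return. The tags here are illustrative, not the real schema.
sample = """<record>
  <id>C12345</id>
  <title>Board of Trade correspondence</title>
  <coveringDates>1911-1914</coveringDates>
</record>"""

# Parse the record and pull the fields into a plain dictionary,
# ready to be stored, indexed, or remixed without restriction.
root = ET.fromstring(sample)
metadata = {child.tag: child.text for child in root}
print(metadata["title"])
```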
More Info on INFOdocket or Direct from Computer Weekly
Also from the National Archives (UK)
Data.gov To House New APIs, By Dawn Lim, TechInsider (06/21/10).
A series of new application programming interfaces - tools that facilitate interaction between datasets and other software programs - will make it easier for developers to play and interact with the content on Data.gov, the online repository of federal information and a cornerstone of the open government initiative.
But those are just the preliminary steps to establishing a self-running ecosystem that will convert raw government data into valuable content and interesting applications, a White House technology expert said last week at a government IT forum.
The New York Times announced today the release of version 3 of its "Congress API."
- Introducing Version 3 of the Congress API, By DEREK WILLIS, New York Times Open Blog (February 23, 2010).
The Times gets raw data directly from the U.S. House and Senate Web sites and Thomas, the Library of Congress public web site with legislative information. It parses and stores the data on its own servers and provides an API (Applications Programming Interface) to the data so that programmers can query the data, get results, and easily provide the data to users in interesting and unique ways.
This is an excellent example of treating government information as "data" rather than as "documents." Rather than having a PDF file that lists all members of Congress (a document-centric way to deal with information), a database of all members of Congress with an API front-end to the database (which treats information as data) allows developers to build software that allows users to get a list for a state or district. When combined with other information such as voting records, bill-sponsorship, party affiliation, and so forth, users can get the information they need assembled in response to a specific information request. To the user the end result looks like a "document" but the document is built dynamically from the data.
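The data-centric approach described above can be sketched in a few lines. The member records below are made up for illustration, but the shape mirrors what a Congress-style API returns: structured records that a program can filter and assemble into an ad-hoc "document" for any state on demand.

```python
# Toy records in the shape a Congress-style API might return.
# Names and parties here are invented for illustration.
members = [
    {"name": "Member A", "state": "NY", "party": "D"},
    {"name": "Member B", "state": "NY", "party": "R"},
    {"name": "Member C", "state": "OR", "party": "D"},
]

def members_for_state(records, state):
    """Filter structured records the way an API query parameter would."""
    return [m for m in records if m["state"] == state]

# Build a state-specific "document" dynamically from the data.
for m in members_for_state(members, "NY"):
    print(f'{m["name"]} ({m["party"]}-{m["state"]})')
```

A PDF roster can only ever answer the one question its author anticipated; structured records plus a small query like this can answer any of them.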
Developers at the NY Times and elsewhere are using this to create interesting web sites and applications. See, for example, Your Government - The Oregonian, and Congress Speaks, and the Times' own Represent, which combines Federal and State information to allow users to find elected representatives in New York City.
The New York Times has a nifty interface that programmers can use to access information about Congress, the Congress API. Recently, they have added improvements including bill cosponsorships, a new members response, and member voting record comparisons. Read about it here:
- Congress Returns, As Does an Improved Congress API, By Derek Willis, New York Times Open Blog, (September 2, 2009).
Also see: NY Times Announces the Congress API.
If you are going to the ALA Annual 2009 Conference in Chicago next week, please come to the "ALA Unconference" where I will be leading a broad discussion on Friday, July 10th from 11:10-12:00 on the library's role in current & emerging trends of civic engagement, transparency, preservation and access to Government information. The supporting materials and presentation will be linked in the Unconference wiki.
Also, please come to the LITA BIGWIG Social Software Showcase to discuss and learn about Government Information Mashups! I will be presenting on this topic and would love to have you help out and/or join in on the conversation! The presentation will be posted on their website but the face to face portion of the BIGWIG Showcase presentations will take place Monday, July 13th from 10:30am - 12:30pm in the McCormick Convention Center West, Room W-184.
Mr. Tapella's response has some information that should be very encouraging and heartening to the depository library community. It also leaves some issues troublingly unaddressed.
Bulk Data Access to Legislative Information
First, it is wonderful to know that GPO is working with the Library of Congress, Congressional Research Service, the Law Library of Congress, and the Senate and House on the issue of access to bulk legislative data!
That news is important and significant. It is also very encouraging because it marks a new direction for dissemination of government information. Taken to its logical conclusion, this would mean that we will have a new route to obtaining government information. No longer will we be limited to information presented as web pages through government-built interfaces. No longer will we have to hope that web scraping will find all the information we want to gather or preserve. Raw information -- once locked in the dark web of government databases -- will be, potentially, available for libraries and others to download and repurpose.
Unfortunately, we can't look for this right away. Congress has only asked for a report, not action. The report itself is due "within 120 days of the release of Legislative Information System 2.0." Presumably that is a reference to a new version of the LIS that is currently only available within the legislative branch. I have not seen an announcement of a date for the release of a new version of the LIS, so it is not clear even when we can expect the report.
Nevertheless, it is certainly good to hear directly from Mr. Tapella that the task force working on this report will develop "a position on access to bulk data" and even intends to "work on making bulk data accessible."
It is somewhat ironic that this long, drawn-out process itself demonstrates the need for bulk data access. Although there have been calls for bulk data access for years, it literally took a legislative directive to get GPO and LOC and CRS to take the tentative steps they are taking now: to "develop a position" and "work on" the problem. Such passivity and long delays are, perhaps, inherent in a large, bureaucratic system, but they are crippling when it comes to keeping up with technological changes. This demonstrates why it is essential for the government to provide easy, free, reliable access to the raw information of government: doing so will enable others -- who can more quickly adopt new technologies -- to provide better access to that information faster than the government can.
What about Non-Legislative Data?
It is also unfortunate that the task force is only looking at bulk delivery of legislative information. Will it take another legislative directive to get GPO to "develop a position" on bulk access to other data? See Bulk Data Downloads: A Breakthrough in Government Transparency (by Tim O'Reilly, O'Reilly Radar, Mar 4, 2009) for a short list of other data for which we need bulk access.
Will GPO Support Collections in FDLP Libraries or Just Backups?
Mr. Tapella's statement does not indicate that GPO has yet grasped the difference between 'backups' and digital deposit. GPO's focus is apparently still on making sure that its own collection is functional rather than facilitating digital collections in FDLP libraries. The "geographically dispersed content repository" described by Mr. Tapella is only "our backup" designed to ensure GPO's "continuity of operations" if GPO's own data repository becomes inoperable. This is a good and necessary feature but it is only a backup for GPO and has nothing to do with digital deposit.
Although Mr. Tapella points out that FDsys supports "repositories that can accept data much like libraries today accept tangible publications distributed from GPO," it seems clear that this generic design is intended to provide "backups" and would require "enhancements" to include bulk data access. This is a GPO-centric way of thinking. This is still a long way from GPO having a "position" on digital deposit and even further from "working on" making it possible.
Until GPO understands that it needs to support digital deposit so that FDLP libraries can build their own digital collections with their own functionality, FDLP libraries will not be partners in preservation and access; they will be, at best, little more than a backup for GPO.
APIs are not Digital Deposit
Mr. Tapella repeats the advantages of APIs, but fails to address the need for digital deposit. Providing APIs is not the same thing as providing digital deposit. As we have said in our original comment APIs are not magic. Each is a design for access and the product of choices made by the designer. Each has its own constraints built in. But don't take our word for it; read what developers say about the constraints of using existing government APIs:
- Extracting Government Spending Data via Talend and Ruby into CouchDB, by Rohit Amarnath, Full360 (04/11/2009).
- Improve databases, By Joshua Tauberer, The Hill (06/12/07).
We love APIs! We think they are great! We want more! We are so very glad that GPO will support them at last! But, please, Mr. Tapella, understand that APIs and a web site are only two of the three parts of a complete access system. Bulk data access is essential and we'd like to hear that GPO is planning for it now.
OAIS is not Digital Deposit
We are so very happy that FDsys is based on OAIS. It is something we have long advocated. But, again, Mr. Tapella, please understand that telling us about your preservation system and your intentions to preserve information does not reassure us that everything will be preserved and freely available to everyone forever. As we pointed out in our original comments, regardless of your intentions and the quality of your system, GPO may not always have the funding, resources, or mandate to provide free, permanent, public access to all government information and we therefore cannot rely on it alone to do so. And no single digital archive or repository can ever be as secure and safe as multiple archives. We need digital deposit to guarantee preservation and free access.
The GPO-centric approach to preservation and access is like a medieval town that stores all of its grain in one barn. When lightning strikes, the whole town goes hungry. In this day and age of $200 terabyte hard drives, peer-to-peer networks, and successful preservation systems like LOCKSS, it concerns us greatly that you still don't understand the need to have many collaborators working together to ensure long-term, free, public access.
There are a couple of sentences in Mr. Tapella's reply that make me optimistic that GPO is on a path to change and does understand this need for collaborators. He says:
We need help from you and others in the community to help define future enhancements to access and data distribution. We see APIs as one of the methods to provide advanced access tools, and realize that this is just one part of the ultimate solution.
To me, this says two important things: First, "data distribution" is on the GPO agenda, at least nominally; second, APIs are just one part of a bigger, ultimate, solution. This gives me hope for more. I hope I'm not reading too much into this.
- Bulk data and Legislative Information 2.0.
- Congress’ legislative information systems: THOMAS and the LIS, by Jeffrey C. Griffith, Government Information Quarterly 18.1 (2001): 43-60.
- Congressional Research Service Products: Taxpayers Should Have Easy Access, Project on Government Oversight, February 10, 2003.
- Comparison of Legislative Resources on GPO Access and Selected Government and Non-Government Web Sites
- Remixes: Creative uses of free government information
- OpenHouse Project Op-Ed on Databases
By serendipity, this has turned out to be API week at FGI. We just keep posting stories about APIs. So, here is one more. It's not new, but still a good one:
- Library Application Program Interfaces, By Roy Tennant, TechEssence, July 17th, 2008.
Application Program Interfaces (APIs) are structured methods for one software application to communicate with another. APIs allow programs to interoperate and share data and services in a standard way. Here is a list of library-related APIs that library developers may find useful.
The People’s Data, by Christopher Werth, NEWSWEEK, From the magazine issue dated Mar 9, 2009.
"Government should make data openly available and then let outside talent reimagine how it can be used online."
See also: Realizing Transparency Through Federal Government APIs, by Andres Ferrate, ProgrammableWeb, March 4th, 2009.
And speaking of APIs, I just noticed a post on govdoc-l that the St Louis Fed is now providing APIs for FRED (Federal Reserve Economic Data) and ALFRED (ArchivaL Federal Reserve Economic Data). Here's more information on their API. I hope our programming friends will check out FRED and ALFRED as there's a TON of data there, some going back to 1927!
From the St Louis Fed programmer:
"The FRED API accommodates any programming language that can parse XML and communicate with our servers using HTTP. The FRED API is based on the REST web service architecture. REST leverages familiar web technologies. Like a website, the FRED API uses HTTP to receive requests and send responses. Also like a website, the FRED API uses URLs to specify requests. This web service differs from a normal website by sending XML instead of HTML. HTML is a visual medium that's not always strictly formatted and flexible enough for arbitrary data structures. XML allows custom tags and relationships among tags."
This is *exactly* how govt agencies should be building their Web/data services. Thanks St Louis Fed!!
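The request/response pattern the Fed describes can be sketched quickly. The endpoint and parameter names below reflect my reading of the St. Louis Fed's API documentation (api.stlouisfed.org), and the XML snippet is a hand-made sample in the documented shape, not live data; verify both against the current docs before use.

```python
import urllib.parse
import xml.etree.ElementTree as ET

def fred_observations_url(series_id, api_key):
    """Build a FRED series/observations request URL, REST-style."""
    params = {"series_id": series_id, "api_key": api_key}
    return ("https://api.stlouisfed.org/fred/series/observations?"
            + urllib.parse.urlencode(params))

# A request for annual real GNP (one of the series going back decades).
url = fred_observations_url("GNPCA", "YOUR_API_KEY")

# A hand-made response sample in the documented XML shape:
# one <observation> element per date/value pair.
sample_response = """<observations>
  <observation date="1929-01-01" value="1120.076"/>
  <observation date="1930-01-01" value="1025.091"/>
</observations>"""

# Because it is XML rather than HTML, the data parses cleanly.
for obs in ET.fromstring(sample_response):
    print(obs.get("date"), obs.get("value"))
```

This is the virtue the Fed's programmer points to: the same familiar HTTP-and-URL mechanics as a website, but a strictly formatted payload that any XML-capable language can consume.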