Digital deposit
2009 Fall DLC Meeting: "Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP"
Submitted by blakeley on Tue, 2010-03-16 08:04.
At the Fall 2009 Depository Library Council (DLC) meeting in Arlington, VA, James A. Jacobs and I (Rebecca Blakeley) introduced attendees to the concept of "digital deposit," mapping out the pieces of the FDLP cloud and what digital deposit could do for the future of the FDLP. Our slides and notes are available for you to view and download online.
MetaArchive publishes guide to distributed digital preservation
Submitted by jrjacobs on Wed, 2010-02-24 15:38.
Please check out the new book published by the MetaArchive Cooperative called A Guide to Distributed Digital Preservation. It's both timely and handy.
[Full disclosure: the book is primarily about LOCKSS and specifically mentions the project that I'm working on, LOCKSS-USDOCS. FGI and I receive no compensation from the sales of the book.]
Announcement: publication of A Guide to Distributed Digital Preservation
Authored by members of the MetaArchive Cooperative, A Guide to Distributed Digital Preservation is the first of a series of volumes from the Educopia Institute describing successful collaborative strategies and articulating specific new models that may help cultural memory organizations work together for their mutual benefit.
This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways.
This guide is written with a broad audience in mind that includes librarians, archivists, scholars, curators, technologists, lawyers, and administrators. Readers may use this guide to gain both a philosophical and practical understanding of the emerging field of distributed digital preservation, including how to establish or join a network.
Readers may access A Guide to Distributed Digital Preservation as a freely downloadable pdf and/or as a print publication for purchase. Please visit http://www.metaarchive.org/GDDP to download or order the book.
******
The MetaArchive Cooperative provides low-cost, high-impact preservation services to help ensure the long-term accessibility of the digital assets of universities, libraries, museums, and other cultural memory organizations. In addition to preserving members' digital content in a distributed digital preservation network, the Cooperative also offers consulting and education services to institutions that seek training in digital preservation planning, policy creation, and implementation, including setting up and running Private LOCKSS Networks (http://www.lockss.org).
For more information, please contact Program Manager Katherine Skinner (katherine.skinner@metaarchive.org).
Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP
Submitted by jajacobs on Tue, 2009-10-27 08:00.
At the Fall Depository Library Council Meeting in Arlington, VA, Rebecca Blakeley gave a presentation that she and I wrote on "Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP." Although a PDF version of the presentation is available on the FDLP web site, it has only the slides, not the text of the presentation.
The complete, original PowerPoint file, including the "speaker notes" with the complete text of the presentation, is available on slideshare:
- Demystifying Digital Deposit: What It Is and What It Could Do for the Future of the FDLP, by Rebecca Blakeley and Jim Jacobs, Fall DLC Meeting - Arlington, VA, October 20, 2009.
Digital Deposit
Submitted by jrjacobs on Sun, 2009-10-18 07:17.
This page is to collect information on digital deposit.
Kentucky Shows How to Publish and Deposit Government Documents
Submitted by jajacobs on Fri, 2009-09-04 14:41.
The State of Kentucky has developed a best-practices manual for publishing -- and depositing -- government documents digitally.
- Kentucky State Government Publications Handbook, Kentucky Department for Libraries & Archives. Frankfort, KY (June 2008) Edition 1.0
The Kentucky Department for Libraries and Archives (KDLA) has been the official repository for Kentucky state agency publications since 1958
...Kentucky state agencies are required to send their publications to KDLA. The Public Records Division (PRD) and State Library Services (SLS) at KDLA work together not only to provide access to the valuable information contained in state agency publications, but also to preserve the publications for future generations.
The handbook says that "Electronic publications should be forwarded in Adobe Portable Document Format (PDF)." We would love to see all U.S. government publications in PDF format deposited in FDLP libraries. It would be a great first step toward digital deposit of all government information.
Comment on article: Depository Library Program in 2023
Submitted by jajacobs on Mon, 2009-08-31 12:19.
A recent article reports on a survey of ARL library directors and their vision of their libraries' roles in the depository library program:
- The Federal Depository Library Program in 2023: One Perspective on the Transition to the Future. Peter Hernon and Laura Saunders. College and Research Libraries, July 2009, Vol. 70, No. 4. [In the interest of academic discussion and openness, we've posted a copy of the article on FGI.]
The survey asked directors to choose among several future scenarios for the FDLP and their role in providing government information. The authors are explicit about their intentions, saying that "the study neither directly addresses whether the depository program itself will exist fifteen years hence nor offers a vision of what future will emerge after 2023." They also note that the survey explicitly focused on the question "how many libraries want to remain in the depository library program and what role do they intend to play?" This focus predetermines the outcome of the survey somewhat. It doesn't tell us what the FDLP should be or how libraries could have a role in ensuring long-term, free access to government information. Instead we get a lot of information about what directors worry most about: money and resources.
The authors point out that no other study has systematically surveyed library directors for their perspective on the FDLP. This is particularly interesting given the rumors, gossip, and scuttlebutt going around about how many university librarians want to get rid of their depository collections, don't trust their depository librarians, and see depository status as costing more than it is worth.
The study reinforces some of those stereotypes and provides some evidence that some ARL library directors do indeed think that way. Sample quotes: "Several directors look forward to a time when they can 'dump the print.'" "Although some directors believe they have 'forward-thinking' documents librarians, others feel the opposite. As the director of a regional depository explains, 'the more that directors know about the program and a library's responsibilities, the less likely documents librarians can bluff about the legal obligations and seek to maintain the status quo.'" "The burden of participation in the program, including that of cost, is a recurring theme." "The directors I talk to all want to get rid of the [depository] collection and drop out of the program as soon as possible."
Not surprisingly, the directors who think that way are apparently part of the minority (13% that chose "scenario 1") who believe that libraries should withdraw from the depository program or that the program will simply wither away.
What the survey documents for the first time, however, is how much value ARL directors put in government information and digital collections. Many of the directors see government information as essential to their academic communities and have serious concerns about how to ensure its availability. Fully half the respondents envision (scenarios 3 and 4) some sort of digital collections as part of their responsibility -- either in partnership with GPO or separate from GPO if GPO does not provide adequate leadership.
While this survey is very interesting and provides much food for thought, it is far from the final word on the future of the FDLP, GPO, or government information. It leaves many questions unanswered and raises other questions. For example:
- The survey's use of the term "digital depository" is confusing at best and misleading at worst. One of the "scenarios" presented to directors in the survey describes "digital deposit" as the library providing "a digital feed of government information resources to its Web site, thereby becoming a portal for access to e-government information resources. The library receives, but does not create, digital content." We wonder how directors interpreted this. Did they think that "receiving" digital content meant getting copies of digital files that they would keep in a digital collection? Or did they think that "providing a feed" and "becoming a portal" was a passive job of pointing to content at GPO or elsewhere? The article does not make this clear, and we would have to guess that directors may not have provided responses that we can interpret consistently. (And we would have to ask the authors, whose work we respect, why they chose the outdated word, indeed the outdated concept, "portal." Does anyone really believe that users want or will use "portals" anymore?)
- Another term that is used in a confusing way in the article (at least I was confused by it) is the term "dark archive." We normally associate this with digital archives such as Portico (which archives digital copies of journal articles but is "dark" because no one can see the articles unless a particular kind of event -- such as a publisher going out of business -- allows the archive to make articles available). In this article, the authors use "dark archive" in that sense but they also use it to refer to print collections that have copies of last resort. Was this confusing to the surveyed directors? Did different interpretations skew their answers?
- Some of this confusion is evidently apparent to the authors. When they analyzed the directors' comments, they discovered that there was some "imprecision" by directors in choosing a scenario. Some were unable to place their institution fully in one of the provided scenarios. There were many reasons for this, but it makes it harder for us to interpret and understand the results.
- The survey did not specifically present a scenario of real digital deposit in which GPO sends (i.e., deposits) authentic digital files to depository libraries. As noted above, the survey focused on two different but related questions: who wants to remain in the FDLP and what role do they intend to play. Combining those two questions may have further muddied the responses and left out options (e.g., true digital deposit).
- One theme mentioned several times in the article is the need for a shared digital archive of digitized materials similar to the JSTOR model. To me, this seems to be an indication that the directors value digital information, see a need for a trusted repository in addition to GPO, and would support shared responsibilities for such an archive. This should spark some good discussions at the next DLC meeting.
- The survey seems to perpetuate and even reinforce misleading concepts about the permanent availability of digital government information. Although the authors acknowledge that "government entities often do not retain all resources permanently on their homepages, and content can be difficult to find and can be subject to removal, redacting, or alteration", they also passively quote directors who say they will rely on search engines and other libraries and government web sites to provide government information for them. There are certainly some libraries (even among ARL libraries) that will not have large digital collections of government information, but the survey does an injustice by passing along these comments without follow up questions to those directors about who will ensure access.
- Another questionable idea that came out of the survey was about staffing. Several directors said "they would cease to employ separate, dedicated government documents librarians. They assume the specialized knowledge will be passed to reference librarians." Shouldn't ARL directors be thinking about the need for new skills to manage digital deposit and digital preservation and digital access to locally held files? Shouldn't they be concerned about the special skills that will be needed to locate government information and provide reference service for it if they do not have a collection that they control?
In summary, the article provides much to discuss and good opportunities for further research. It also provides some clear evidence that the rumors that ARL directors want to dump their depository collections and drop their depository status are well founded, but that these directors are in the minority. Most ARL directors highly value government information and are looking for smart, efficient ways to ensure long-term access to digital collections.
"Chat with GPO" Session on Authentication
Submitted by blakeley on Thu, 2009-05-14 09:23.
Today I attended the "Chat with GPO" OPAL session, which focused on authentication and authentication for FDLP partners. Ted Priebe, GPO's Director of Library Planning & Development (LPD), and Lisa Russell, the Manager of LPD's Content Management unit, presented material and answered questions.
Basically, LSCM wants to partner with Federal Depository Libraries and find ways to authenticate content hosted by the FDL partners. The digital signatures of authentication will indicate partnership with the FDL institution and the contact information for that institution. This is great news, especially for those FDLs also interested in hosting digital content in partnership with GPO.
The authentication session is archived on the GPO OPAL site.
Response to Public Printer
Submitted by jajacobs on Thu, 2009-04-16 20:13.
We at FGI would like to thank Robert C. Tapella, the Public Printer of the United States, for his response to our comments on his letter to President Obama regarding open government.
Mr. Tapella's response has some information that should be very encouraging and heartening to the depository library community. It also leaves some issues troublingly unaddressed.
Bulk Data Access to Legislative Information
First, it is wonderful to know that GPO is working with the Library of Congress, Congressional Research Service, the Law Library of Congress, and the Senate and House on the issue of access to bulk legislative data!
That news is important and significant. It is also very encouraging because it marks a new direction for dissemination of government information. Taken to its logical conclusion, this would mean that we will have a new route to obtaining government information. No longer will we be limited to information presented as web pages through government-built interfaces. No longer will we have to hope that web scraping will find all the information we want to gather or preserve. Raw information -- once locked in the dark web of government databases -- will be, potentially, available for libraries and others to download and repurpose.
Unfortunately, we can't look for this right away. Congress has only asked for a report, not action. The report itself is due "within 120 days of the release of Legislative Information System 2.0." Presumably that is a reference to a new version of the LIS that is currently only available within the legislative branch. I have not seen an announcement of a date for the release of a new version of the LIS, so it is not clear even when we can expect the report.
Nevertheless, it is certainly good to hear directly from Mr. Tapella that the task force working on this report will develop "a position on access to bulk data" and even intends to "work on making bulk data accessible."
It is somewhat ironic that this long, drawn-out process itself demonstrates the need for bulk data access. Although there have been calls for bulk data access for years, it literally took a legislative directive to get GPO and LOC and CRS to take the tentative steps they are taking now: to "develop a position" and "work on" the problem. Such passivity and long delays are, perhaps, inherent in a large, bureaucratic system, but they are crippling when it comes to keeping up with technological changes. This demonstrates why it is essential for the government to provide easy, free, reliable access to the raw information of government: doing so will enable others -- who can more quickly adopt new technologies -- to provide better access to that information faster than the government can.
What about Non-Legislative Data?
It is also unfortunate that the task force is only looking at bulk delivery of legislative information. Will it take another legislative directive to get GPO to "develop a position" on bulk access to other data? See Bulk Data Downloads: A Breakthrough in Government Transparency (by Tim O'Reilly, O'Reilly Radar, Mar 4, 2009) for a short list of other data for which we need bulk access.
Will GPO Support Collections in FDLP Libraries or Just Backups?
Mr. Tapella's statement does not indicate that GPO has yet grasped the difference between 'backups' and digital deposit. GPO's focus is apparently still on making sure that its own collection is functional rather than facilitating digital collections in FDLP libraries. The "geographically dispersed content repository" described by Mr. Tapella is only "our backup" designed to ensure GPO's "continuity of operations" if GPO's own data repository becomes inoperable. This is a good and necessary feature but it is only a backup for GPO and has nothing to do with digital deposit.
Although Mr. Tapella points out that FDsys supports "repositories that can accept data much like libraries today accept tangible publications distributed from GPO," it seems clear that this generic design is intended as providing "backups" and would require "enhancements" to include bulk data access. This is a GPO-centric way of thinking. This is still a long way from GPO having a "position" on digital deposit and even further from "working on" making it possible.
Until GPO understands that it needs to support digital deposit so that FDLP libraries can build their own digital collections with their own functionality, FDLP libraries will not be partners in preservation and access; they will be, at best, little more than a backup for GPO.
APIs are not Digital Deposit
Mr. Tapella repeats the advantages of APIs, but fails to address the need for digital deposit. Providing APIs is not the same thing as providing digital deposit. As we said in our original comment, APIs are not magic. Each is a design for access and the product of choices made by the designer. Each has its own constraints built in. But don't take our word for it; read what developers say about the constraints of using existing government APIs:
- Extracting Government Spending Data via Talend and Ruby into CouchDB, by Rohit Amarnath, Full360 (04/11/2009).
- Improve databases, by Joshua Tauberer, The Hill (06/12/07).
We love APIs! We think they are great! We want more! We are so very glad that GPO will support them at last! But, please, Mr. Tapella, understand that APIs and a web site are only two of the three parts of a complete access system. Bulk data access is essential and we'd like to hear that GPO is planning for it now.
OAIS is not Digital Deposit
We are so very happy that FDsys is based on OAIS. It is something we have long advocated. But, again, Mr. Tapella, please understand that telling us about your preservation system and your intentions to preserve information does not reassure us that everything will be preserved and freely available to everyone forever. As we pointed out in our original comments, regardless of your intentions and the quality of your system, GPO may not always have the funding, resources, or mandate to provide free, permanent, public access to all government information and we therefore cannot rely on it alone to do so. And no single digital archive or repository can ever be as secure and safe as multiple archives. We need digital deposit to guarantee preservation and free access.
The GPO-centric approach to preservation and access is like a medieval town that stores all of its grain in one barn. When lightning strikes, the whole town goes hungry. In this day and age of $200 terabyte hard drives, peer-to-peer networks, and successful preservation systems like LOCKSS, it concerns us greatly that you still don't understand the need to have many collaborators working together to ensure long-term, free, public access.
Good News?
There are a couple of sentences in Mr. Tapella's reply that make me optimistic that GPO is on a path to change and does understand this need for collaborators. He says:
We need help from you and others in the community to help define future enhancements to access and data distribution. We see APIs as a one of the methods to provide advanced access tools, and realize that this is just one part of the ultimate solution.
To me, this says two important things: First, "data distribution" is on the GPO agenda, at least nominally; second, APIs are just one part of a bigger, ultimate, solution. This gives me hope for more. I hope I'm not reading too much into this.
See also:
- Bulk data and Legislative Information 2.0.
- Congress’ legislative information systems: THOMAS and the LIS by Jeffrey C. Griffith, Government Information Quarterly 18.1 (2001): 43-60.
- Congressional Research Service Products: Taxpayers Should Have Easy Access, Project on Government Oversight, February 10, 2003.
- Comparison of Legislative Resources on GPO Access and Selected Government and Non-Government Web Sites
- Remixes: Creative uses of free government information
- OpenHouse Project Op-Ed on Databases
Army Journal removal highlights need for digital deposit
Submitted by dcornwall on Tue, 2009-03-31 19:13.
According to Secrecy News, the Army has pulled the unclassified Military Intelligence Professional Bulletin from the open web:
The former MIPB website states that “The MIPB is now being hosted on the Intelligence Knowledge Network (IKN). (AKO account required).” AKO (Army Knowledge Online) accounts can only be obtained by military and contractor personnel.
The MIPB, which is unclassified, has long been available on the world wide web and has even been sold commercially. Back issues from 1995 to 2005 are available online from the FAS website, though no longer from the Army.
In addition to being sold commercially, this journal was also distributed through the Federal Depository Library Program until 2006, according to its entry in GPO's Catalog of Government Publications at http://catalog.gpo.gov. After 2006, it went online only and access was through a PURL.
As of today, that PURL directed folks to the takedown page. Libraries that depended on the "official repository" of the Army for post-2005 issues were out of luck. If these digital copies had been instead deposited to depository libraries, access might have gone on unhindered. Unless the Army had asked GPO to have depositories destroy their electronic archives of MIPB. But even then, the fact that multiple digital copies of MIPB existed would have triggered GPO's public process laid out in ID 72: Withdrawal of Federal Information Products from GPO’s Information Dissemination (ID) Programs. With that public process and the fact that prior issues were widely available, I think that the MIPB archive would have been safe. Instead, the Army as "The Official Repository" has made the online archives go away until FAS gets its FOIA request responded to.
Or maybe it will come sooner. The fact that MIPB had a PURL indicates that GPO may have been archiving it. But can they now post their copy of the archive? Do they need to consult the Army first? What if the Army says no?
Has anyone contacted GPO Help on this issue yet? What kind of response have you gotten? Be sure to be kind to GPO, as the decision on document withdrawals rests with the agency. In this case, the Army. Don't blame Ric Davis if the Army nixes an FDLP restoration of the 2005-2009 MIPB archive.
It's cases like these, where decisions are made with the flip of a switch and without a public process, that make us wary of the Official Single Repository of Federal Publications, no matter who the federal agency is. Sunlight and good decision making require digital deposit outside the federal government.
Public Printer's Letter to President Obama Regarding Open Government
Submitted by jrjacobs on Mon, 2009-03-23 10:42.
The Public Printer recently released GPO's letter to the President regarding open government (PDF) (Robert C. Tapella, Public Printer, March 9, 2009). Since it specifically mentions FreeGovInfo, we feel the need to comment and contextualize a bit.
On the one hand, it's great that GPO is reaching out publicly to offer infrastructural help with the government transparency initiative. We're happy to assist in any way we can. We hope FDLP libraries will join GPO in such efforts.
On the other hand, FGI has always argued for a geographically dispersed system of local, official digital repositories, so we cannot support GPO’s goal 1 to make FDsys the official repository for Federal Government publications -- unless it includes a network of distributed repositories modeled after the Federal Depository Library Program (FDLP). What we can support is FDsys as the official distribution channel for federal government publications.
It's not a trivial distinction. "Repository" means that GPO assumes sole responsibility for preservation, a role not specified in legislation. "Distribution channel" means GPO continues its solid century and a half record of distributing information to other institutions which will continue their solid century and a half record of preserving government information for future use while making sure it remains freely available over the internet. Since digital deposit is currently #2 on The Sunlight Foundation's Our Open Government List (OOGL) of top ideas for the President's open government initiative, we can only assume that the public -- or at least those that are most interested in government transparency -- agrees that a geographically dispersed system is a key ingredient in government transparency.
We also believe it is important in discussions of transparency to plan for preservation of and long-term access to information. If, in concentrating on short-term access and on information-as-service, we fail to consider long-term access and instantiation of information for long-term preservation, we will inevitably lose information -- and that would be bad for transparency.
Incomplete Access
We commend and support GPO for building APIs into FDsys. It is heartening and encouraging to see that GPO is publicly and officially proclaiming that "access" means more than providing a web site. But APIs and a web site are only two of the three parts of a complete access system. GPO has yet to acknowledge or even mention the third part of access: the provision of unfiltered bulk data access to government information.
A GPO web site can provide a human-friendly interface for the public and APIs can provide a computer-program-friendly way of querying, fetching, and using information. But, even taken together, these two access points provide only the government-approved, government-designed, government-hosted view of government information.
The problem with these government-only views of government information is that they are limited. No single provider (government or non-government) can provide unlimited access points or views or interfaces.
APIs are not magic. Each is a design for access and the product of choices made by the designer. Each has its own constraints built in. For example, an API might be tied to a particular agency or department, which would limit cross-agency utility. Or an API might be generalized to work across agencies or departments and thus lose rich access to agency-specific information content or structure.
One way to overcome these limitations is for the government to provide bulk data access. This means allowing the public to download raw content in bulk. Where web sites provide one "page" at a time and APIs can provide one or many "facts" at a time, bulk data access provides the raw information so that users can build their own collections, interfaces, and APIs.
This could improve access in ways that GPO could never hope to do all by itself. Imagine, for example, an agricultural library building a digital collection that contains agricultural reports, data, and audiovisual content from the Department of Agriculture, the EPA, the SBA, and NOAA combined with reports, maps, and GIS data from state and local government agencies and other content from its own institutional repository or university press. Then imagine that specialized digital collection having its own state-specific, agriculture-specific API and web site and bulk data access. Then imagine that these repositories are part of the rapidly expanding cloud and you get a sense of a rich government information ecology.
Such scenarios are possible, but only if GPO and other government agencies make raw content easily, freely available in bulk for use and re-use and re-purposing. Providing only government web sites and government APIs without bulk data downloads and the ability for others to build collections for specific or general purposes will provide only a tiny fraction of open usability and transparency that we could have. There is nothing standing in the way of this happening today except the will of government agencies to make it happen.
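To make the distinction concrete, here is a minimal sketch of what a library could do once it holds raw records locally. Everything in it is an assumption made for illustration: the record fields (`id`, `agency`, `title`, `subjects`), the sample records, and the function names are hypothetical and do not describe any real GPO or FDsys data format. The point is only that whoever holds the bulk data gets to decide how it is indexed and queried.

```python
from collections import defaultdict

# Hypothetical records as they might arrive in a bulk download.
# The field names and values are illustrative assumptions only.
BULK_RECORDS = [
    {"id": "usda-001", "agency": "USDA", "title": "Soil Survey Report",
     "subjects": ["agriculture", "soil"]},
    {"id": "epa-014", "agency": "EPA", "title": "Pesticide Runoff Study",
     "subjects": ["agriculture", "water"]},
    {"id": "noaa-230", "agency": "NOAA", "title": "Seasonal Drought Outlook",
     "subjects": ["climate", "agriculture"]},
    {"id": "sba-077", "agency": "SBA", "title": "Small Farm Loan Guide",
     "subjects": ["finance"]},
]


def build_subject_index(records):
    """Map each subject term to the ids of the records that carry it."""
    index = defaultdict(list)
    for record in records:
        for subject in record["subjects"]:
            index[subject].append(record["id"])
    return dict(index)


def search_by_subject(records, index, subject):
    """Return the titles of all locally held records for one subject."""
    by_id = {record["id"]: record for record in records}
    return [by_id[record_id]["title"] for record_id in index.get(subject, [])]


# A cross-agency, subject-specific view that no single agency API offers.
subject_index = build_subject_index(BULK_RECORDS)
agriculture_titles = search_by_subject(BULK_RECORDS, subject_index, "agriculture")
print(agriculture_titles)
```

The index here cuts across agencies because the library built it, not because any agency designed it that way. A web page or a per-agency API could answer the same question only if its designer had anticipated it.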
Incomplete Preservation
The Public Printer's letter glosses over the problems of long-term access and preservation.
Let's be as clear as we can: we cannot and should not rely solely on GPO for long-term preservation and free access. The shift to digital does not change the methodology for long-term preservation and access. On the contrary, the tenuousness of digital information means that a distributed methodology is even more vital.
We cannot rely solely on GPO because the GPO Electronic Information Access Enhancement Act of 1993 does not even mention permanent access, nor does it guarantee that access will always be free. Indeed, the law specifically allows GPO to charge for access and even for use of its "directory" of information. The law also covers only "appropriate publications distributed by the Superintendent of Documents" -- effectively excluding huge bodies of born-digital information from the scope of what GPO is allowed to handle. Regardless of GPO's intentions, there is no existing legislative mandate for GPO to provide free, permanent, public access to government information and we therefore cannot rely on it alone to do so.
We should not rely solely on GPO because no single digital archive or repository can ever be as secure and safe as multiple archives, libraries, and repositories. Even if GPO had a legislative mandate to provide permanent preservation and access (which it does not), and even if anyone could guarantee that GPO would always get adequate funding so that it never had to withdraw anything or charge for access for anything (which no one can), it would still be impossible to guarantee that GPO would never lose any information. The nature of digital information is that it can easily be corrupted, altered, lost, or destroyed. It can become unreadable or unusable without constant attention. Relying on any single entity is simply not as safe as relying on multiple organizations. It is more than a truism that Lots of Copies Keep Stuff Safe -- safer than backups and "mirror sites." But this is about more than redundant copies. It is also about relying on different organizations because they have different funding sources, different constituencies, different technologies, and different collections. No single digital collection can ever be as safe as multiple, reliable digital collections.
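A small sketch may help show why independent copies are safer than one copy plus a backup. The code below is a much-simplified majority vote over checksums. It is not the LOCKSS protocol, and the holder names and file contents are hypothetical. It assumes each library holds its own copy of the same file, and that damage to any one copy can be detected and repaired only because the other copies exist to compare against.

```python
import hashlib
from collections import Counter


def sha256_hex(data):
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies):
    """Compare independently held copies of one file.

    `copies` maps a holder name to the bytes that holder has.
    Returns (consensus_digest, holders_whose_copy_disagrees).
    Raises ValueError when no digest is held by a strict majority,
    because then there is no trustworthy version to repair from.
    """
    digests = {holder: sha256_hex(data) for holder, data in copies.items()}
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority agreement among copies")
    damaged = sorted(h for h, d in digests.items() if d != consensus)
    return consensus, damaged


def repair_copies(copies):
    """Return a new mapping in which damaged copies are replaced by a good one."""
    consensus, damaged = audit_copies(copies)
    good = next(data for data in copies.values() if sha256_hex(data) == consensus)
    return {holder: (good if holder in damaged else data)
            for holder, data in copies.items()}


# Hypothetical example: three libraries hold the same report, one copy has rotted.
held = {
    "library_a": b"Annual report, 2008 edition",
    "library_b": b"Annual report, 2008 edition",
    "library_c": b"Annual report, 2OO8 edition",  # silently corrupted
}
consensus_digest, damaged_holders = audit_copies(held)
repaired = repair_copies(held)
print(damaged_holders)
```

With a single repository there is nothing to vote against: a corrupted or altered file simply becomes the record. With only two copies, a disagreement can be detected but not resolved, which is why the sketch refuses to pick a winner without a strict majority.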
The good news
The good news is that there are existing organizations that can start working on this right away. Nothing prevents GPO and the existing FDLP libraries from implementing a digital depository system in which GPO enables FDLP libraries to download bulk data and build local digital collections.
There are existing technologies to facilitate this. The U.S. Government Documents Private LOCKSS Network is preserving "harvested" government information. Peer-to-peer (P2P) networks (like Napster and BitTorrent) have become increasingly popular because more and more people and some businesses have begun to realize that "distributed files" equals faster access and better preservation. (A geographically dispersed system of local, official digital repositories would be, for all intents and purposes, a P2P network.) Open source software for building digital repositories is widely available and increasingly easy to use.
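To make the "lots of copies" idea concrete, here is a minimal sketch of how several libraries holding copies of the same file could notice that one copy has been damaged. It is an illustration only, not the actual LOCKSS polling protocol: the library names, the sample content, and the `audit_replicas` helper are all hypothetical, and a simple majority vote over SHA-256 digests stands in for whatever a real network would use.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare copies held by different institutions.

    Returns the digest that most copies agree on, plus the names of the
    holders whose copy differs from it. (If two digests tie for first
    place, the choice between them is arbitrary in this sketch.)
    """
    digests = {name: sha256_hex(content) for name, content in replicas.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


# Hypothetical holdings: three libraries, one with a silently altered copy.
replicas = {
    "library_a": b"sample government document, version 1",
    "library_b": b"sample government document, version 1",
    "library_c": b"sample government document, version l",  # corrupted byte
}

majority, damaged = audit_replicas(replicas)
print("copies needing repair:", damaged)
```

With only one copy there is nothing to compare against, which is the point of the argument above: a single repository cannot detect its own silent corruption this way, while several independent ones can.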
Summary
APIs are good. They are a necessary part of adequate government information access. But digital distribution is also essential because only digital distribution will enable FDLP libraries and others to build new APIs, to de-ghettoize government information by better integrating it with non-government information, and to ensure long-term, free, public access and usability of government information.
The Intersection of Education, Technology, and Open Content
Submitted by jajacobs on Thu, 2009-02-26 19:29.
In a couple of recent posts, Lev Gonick, who is the CIO at Case Western Reserve University, has noted that we have "an educational economy that makes information abundant confronting an educational delivery system built for a time in which information was scarce."
- How Technology Will Reshape Academe After the Economic Crisis by Lev Gonick, Chronicle of Higher Education blog, "The Wired Campus" (February 24, 2009).
- A Small Proposal at the Intersection of Education, Technology, and Open Content, by Lev Gonick, Chronicle of Higher Education blog, "The Wired Campus" (February 26, 2009).
His description of the educational economy and the educational delivery system struck me as analogous to the situation we face with government information. We live in an environment where government information is abundant and gains value by being distributed and reusable. In this environment it is incredibly inexpensive to distribute information, yet governments too often treat it as if it were scarce and expensive to deliver.
It is ironic, for example, that GPO refuses to deposit ninety percent or more of government information in FDLP libraries because it is digital (SOD 301, Superintendent Of Documents Policy Statement, "Dissemination/Distribution Policy for the Federal Depository Library Program" Effective Date: June 1, 2006) and then wonders why libraries find it hard to justify being a depository library.
Imagine a system closer to what Gonick describes. Imagine a system that recognizes that digital information is different from paper and ink information: both more valuable (because it is more easily used and re-used) and less expensive to distribute. Imagine an approach that is a more modern, more appropriate response to digital information than what we have now. Go further and imagine what the depository system would look like if it adopted the vision that Carl Malamud proposes.
Carl says all government information should be available in three ways (all for free):
- as bulk data for downloading and repurposing;
- through an API for querying, retrieving, embedding in other web sites;
- as better official web sites aimed at end users.
(Carl outlines these in his interview with Timothy M. O'Brien, February 24, 2009, and in his Rebooting the Federal Register document.)
Lev Gonick expands on what truly open information could mean to communities. He contrasts "the largely proprietary learning economy that exists now" with the new environment of "more and more open educational resources." He sees these open resources as creating new opportunities that were not available when we could only rely on proprietary, closed, scarce information resources.
Gonick's specific ideas actually sound a lot like the kind of collaborative, civic-centered services that John Shuler has long advocated and is describing here. Specifically, Gonick describes "a university-led 'connected cities' project" in which "we could invite different communities within our cities (children, schools, professionals, unions, educators, artists, elected officials, and so forth) to communicate with others in this new connected Web." He continues:
They might share oral histories and multimedia presentations about their communities with one another. Or they might participate in formal educational and research exchanges. Scientists could discuss research on sustainability, for instance, in ways that connect to high-school students seeking to learn about ecology and the economics of recycling. We can and we should leverage our universities’ ability to create powerful networks of technology and learners to create binding partnerships that matter.
The oceans that once separated us are now made smaller by the technology that we have helped invent and deploy. Deepening the linkages within and between our communities and across our cities is a 21st century challenge worthy of great universities.
But this is not just about technology enabling sharing. It is also about having something to share. In order to do this, of course, we will need to guarantee free access to robust, preservable, re-usable collections of information. We could do that by hoping that GPO will always get the funding to do it for us and that it will do it right and meet all the needs of all communities equally well forever. We could hope that the government (GPO, OMB, Congress, etc.) will never privatize information or withdraw information, or alter information. Or, we could take on the task ourselves as depository libraries in the FDLP by demanding digital deposit. Then we could begin building digital collections for different communities-of-interest, world-wide. Libraries could then not only do interesting things with the information that they manage for their communities, but they could also facilitate others re-using the information.
Obama’s Inaugural Speech: visualized, video-searchable
Submitted by jajacobs on Fri, 2009-01-23 10:24.
President Obama's inaugural speech has generated some interesting examples of how technology can be applied to government information when the information is freely available for use and re-use and not locked into government databases or proprietary formats. It is a small piece of text with a lot of public interest and high visibility and, therefore, ripe for these kinds of demonstrations and experiments. Of course, to make use of the information, we have to actually have a copy of it. Imagine what would happen if all government information was actually distributed in open formats to libraries so that we could build collections that were index-able, search-able, visually browsable, and analyzable in interesting ways. Imagine freeing government information from its .gov silos and integrating it with non-government information in digital collections created for particular virtual communities of interest. Imagine the future of digital collections that are as easily re-usable as this small bit of text.
Check out these examples!
- Inaugural Words: 1789 to the Present, New York Times. "A look at the language of presidential inaugural addresses. The most-used words in each address appear in [an] interactive chart..., sized by number of uses. Words highlighted in yellow were used significantly more in this inaugural address than average."
- Visual of the Inaugural Address, ProPublica. [Compare this to the NYT version. Stop words matter!]
- Search Inside Obama’s Inaugural Speech. Delve Networks. "We invite you to experience President Obama’s inaugural speech using our search inside technology. To do this, type what you’re looking for into the player searchbar above. A heatmap will show you where information related to your topic appears in the speech. You can move your mouse over the heatmap to see the matches. Click to jump to that place in the speech."
Affirmative Disclosure of Government Information
Submitted by jajacobs on Mon, 2008-12-08 13:51.
John Wonderlich, a Program Director of the Sunlight Foundation and a great friend of libraries, has posted some useful suggestions over at The Sunlight Foundation Blog:
I really like John's concept of "affirmative disclosure." I think we could go even further by explicitly addressing the problems of long-term preservation caused by the shift to e-government.
I am starting from the assumption that society needs a reliable way to preserve an accurate, complete historical record. Unfortunately, the systems we have in place today make it difficult, and in some cases impossible, to guarantee that we will preserve a record that is either complete or accurate.
Consider, for example, the recent case where researchers at the University of Illinois discovered that the White House removed original documents from its web site, altered them, and replaced them with backdated modifications that appear to be originals but are not.
Also consider the project of the Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive and the U.S. Government Printing Office to try to capture web pages of the current administration by performing a "comprehensive crawl of the .gov domain."
These examples illustrate the problem of preserving the historical record.
The first shows how the historical record can easily be lost and altered (intentionally or unintentionally -- it doesn't matter which) by lack of accurate metadata (dates, versioning). The second shows the sad state of current preservation: the best record we will have of the government web will be a single, incomplete snapshot of the end of an eight-year administration. (Harvesting is imperfect and incomplete: links can break, embedded content can be lost, databases can prohibit or inhibit crawls of their content, and crawls can only save a snapshot of dynamic sites.)
In essence, the government has made a major change in information policy by changing the technology of information dissemination and has done so without really examining the implications of the change or even acknowledging that a policy has changed.
What was the policy change? In the old policy, the role of government was to collect and assemble and edit and create information and then instantiate it in publications and distribute those instantiations to the public. At that point the role of preservation was in the hands of libraries (mostly FDLP libraries) and archives. But, in the new policy, the government does not actively distribute, but "posts" information on web sites where it is subject to alteration and removal without ever being instantiated anywhere. It is up to the public, consumer groups, individuals, libraries, and special projects to identify when information is posted or changed and then attempt to preserve that information. While that may succeed sometimes, the approach has two fatal flaws. First, it is ad-hoc and therefore will almost certainly be incomplete at best. Second, it puts the responsibility of instantiation in the wrong hands: not those who create the information (the government) but those who "discover" the information. The government essentially is renouncing its responsibility to actively, affirmatively create a preservable instance of the information it creates.
While some agencies (e.g. GPO, EIA) are saying that it is now their role to preserve information, other agencies (e.g., NARA) are actually narrowing their role in long-term preservation (notice that NARA is not participating in the ".gov crawl" and says explicitly that "most web records do not warrant permanent retention").
So, let's explicitly expand the idea of "affirmative disclosure" to include "active deposit." By that I mean that the government should be required to actively inform and distribute to the public notifications (metadata) and documents (data) every time a "document" is created or modified or superseded. "Deposit" could be accomplished with technology (e.g., RSS, APIs, OAI and OAI-PMH, etc.) and should be required to include dates and version information.
This is the right way to do this because it recognizes the appropriate roles for the different participants in the life cycle of information: government agencies create information products that are preservable and libraries and others preserve those products outside the .gov domain.
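As a rough illustration of what "active deposit" could look like in practice, the sketch below shows two pieces: a notification record that carries the dates and version information argued for above, and an OAI-PMH `ListRecords` request that a library could issue to pick up everything changed since a given date. The `verb`, `metadataPrefix`, and `from` parameters are part of the OAI-PMH protocol; everything else (the `DepositNotice` fields, the identifier, the `example.gov` endpoint) is hypothetical and not drawn from any existing agency system.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class DepositNotice:
    """Hypothetical notification an agency would publish for each change."""
    identifier: str                   # stable ID for the document
    version: int                      # increases with every modification
    modified: str                     # ISO 8601 timestamp of this version
    sha256: str                       # checksum of the deposited file
    supersedes: Optional[int] = None  # earlier version this one replaces


def list_records_url(base_url: str, since: str,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request for records changed since a date."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": since,
    })
    return f"{base_url}?{query}"


# Illustrative values only; the endpoint and identifier are made up.
notice = DepositNotice(
    identifier="example:press-release-0001",
    version=2,
    modified="2008-12-08T13:51:00Z",
    sha256="0" * 64,  # placeholder digest
    supersedes=1,
)
notice_json = json.dumps(asdict(notice), indent=2)
url = list_records_url("https://example.gov/oai", since="2009-01-20")

print(notice_json)
print(url)
```

The design point is that the version number and the `supersedes` field make a silent replacement visible: a library that already holds version 1 learns that version 2 exists and can keep both.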
Digital Deposit: Lack of storage space is no excuse
Submitted by dcornwall on Mon, 2007-10-15 12:34.
This past weekend I was at my local Costco and saw not one, but two brands of 1 Terabyte (1000 GB) drives selling at around $300. I also saw a 500 GB (1/2 TB) drive for $130. All of the drives were USB friendly, meaning you could take one off the shelf, plug it into a USB port, and have all that storage available to you.
What can you store in a Terabyte? According to an FBI article on digital forensics, plenty:
"a terabyte is equivalent to about 250 million pages of text, which would stack 10 miles high if printed on both sides of the page."
Surely that's enough space for even smaller libraries considering telling the Government Printing Office that they would like PDFs ("access derivatives") delivered to them based on their profiles.
I admit, space isn't the only issue. But it's the objection I've heard most often and I honestly believe that technology has taken it away.
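For anyone who wants to check the numbers, the prices and the FBI figure quoted above work out as follows. This is simple back-of-the-envelope arithmetic, assuming the decimal definition of a terabyte used in the post (1000 GB, or 10^12 bytes); the variable names are just for illustration.

```python
# Prices observed in the post (October 2007, US dollars).
price_1tb_drive = 300      # "around $300" for a 1 TB drive
price_500gb_drive = 130    # $130 for a 500 GB drive

GB_PER_TB = 1000           # decimal terabyte, as in "1 Terabyte (1000 GB)"
BYTES_PER_TB = 10**12

# Cost per gigabyte for each drive.
cost_per_gb_1tb = price_1tb_drive / GB_PER_TB
cost_per_gb_500gb = price_500gb_drive / 500

# The FBI figure: about 250 million pages of text per terabyte.
pages_per_tb = 250_000_000
bytes_per_page = BYTES_PER_TB // pages_per_tb

print(f"1 TB drive:   ${cost_per_gb_1tb:.2f} per GB")
print(f"500 GB drive: ${cost_per_gb_500gb:.2f} per GB")
print(f"Implied size of one page of text: {bytes_per_page} bytes")
```

The implied page size applies to plain text. PDFs are usually much larger per page, so a terabyte holds fewer PDF pages than the quoted figure suggests, though still a very large number.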
Cuts in LC budget threaten NDIIPP
Submitted by jrjacobs on Wed, 2007-09-12 23:45.
And this is *exactly* why we need a distributed system of digital deposit, collection, preservation and access.
"Cuts Impact Digital Work At Library Of Congress", National Journal's Technology Daily, Sep 11, 2007 PM edition, by Aliya Sternstein
Budget cuts this year and a paltry funding outlook for fiscal 2008 are frustrating digitization efforts at the Library of Congress, according to Library employees. Meanwhile, Democrats and Republicans disagree on how much money to supply the program in the future.
The National Digital Information Infrastructure and Preservation Program, established by Congress in 2000, devises means of finding, saving and providing long-term access to cultural resources that exist only in electronic format. But $47 million -- half of the program's funding -- was rescinded in fiscal 2007 to support other critical library programs.

