61 Days to Government Information Liberation
Just finished reading an article from the upcoming Sunday edition of the New York Times -- "If You Liked This, You're Sure to Love That" -- which talks about the public contest Netflix is running to improve the accuracy of the search engines that recommend movies to their users. Here is a section from the story that describes the problem and the prize --
"THE “NAPOLEON DYNAMITE” problem is driving Len Bertoni crazy. Bertoni is a 51-year-old “semiretired” computer scientist who lives an hour outside Pittsburgh. In the spring of 2007, his sister-in-law e-mailed him an intriguing bit of news: Netflix, the Web-based DVD-rental company, was holding a contest to try to improve Cinematch, its “recommendation engine.” The prize: $1 million. Cinematch is the bit of software embedded in the Netflix Web site that analyzes each customer’s movie-viewing habits and recommends other movies that the customer might enjoy. (Did you like the legal thriller “The Firm”? Well, maybe you’d like “Michael Clayton.” Or perhaps “A Few Good Men.”) The Netflix Prize goes to anyone who can make Cinematch’s predictions 10 percent more accurate."
Deeper in the story is this tidbit --
"IT USED TO BE THAT if you wanted to buy a book, rent a movie or shop for some music, you had to rely on flesh-and-blood judgment — yours, or that of someone you trusted. You’d go to your local store and look for new stuff, or you might just wander the aisles in what librarians call a stack search, to see if anything jumped out at you. You might check out newspaper reviews or consult your friends; if you were lucky, your local video store employed one of those young cinéastes who could size you up in a glance and suggest something suitable."
And then this --
"Cinematch has, in fact, become a video-store roboclerk: its suggestions now drive a surprising 60 percent of Netflix’s rentals. It also often steers a customer’s attention away from big-grossing hits toward smaller, independent movies. Traditional video stores depend on hits; just-out-of-the-theaters blockbusters account for 80 percent of what they rent. At Netflix, by contrast, 70 percent of what it sends out is from the backlist — older movies or small, independent ones. A good recommendation system, in other words, does not merely help people find new stuff. As Netflix has discovered, it also spurs them to consume more stuff."
The implications for government information library service seem, to me, profound. Automated trust? Where could we go with this when it comes to that sense of trust that informs the best part of librarianship and the community of users that rely on our institutions? I know some libraries are using aspects of this kind of recommendation automation ... but I love the notion of using social software tools to help people find more stuff in which they might be interested. I know there is a huge gap between selecting movies and TV shows based on likes and dislikes and what we do as government information librarians when we explain complex policy/legal connections, large or small. But just as Jim points out the inherent necessity of collaboration embedded in librarian practice and theory (and we will continue to agree to disagree about the centrality of possession in that mix) -- I can only dream of a government information search algorithm that picks and chooses its way among the complex of relationships embedded in government information.
If you like this regulation on natural gas, then you might want to consider this one.
I know this happens at a very "structural" level in the Federal Register and Code of Federal Regulations, which link the regulations through citations and their foundational public laws. And I know that, in a very real sense, subject headings and other authority records do this in a kind of 19th-century linear way. But it seems to me that I spend most of my flesh and blood library time explaining these connections to our users rather than seeing any evidence that they grasp these library connections intuitively.
How can we make our library intuition more transparent? In some ways we are flesh and blood algorithms -- which gets to another part of the story that delights me, when the contest participants sharpen their mathematical tools --
"As the teams have grown better at predicting human preferences, the more incomprehensible their computer programs have become, even to their creators. Each team has lined up a gantlet of scores of algorithms, each one analyzing a slightly different correlation between movies and users. The upshot is that while the teams are producing ever-more-accurate recommendations, they cannot precisely explain how they’re doing this. Chris Volinsky admits that his team’s program has become a black box, its internal logic unknowable."
Which has always been the challenge of teaching student librarians about the art of reference work -- there is this black box quality to how we know what we know and where to search for relevant information.
See you on Day 60.

Ideas for Recommendations
Hi John,
Your point about how we can steer people to other items of interest is well taken. One thing that probably could be done quickly by a vendor or a skilled library hacker would be simply to display titles that share more than half of the subject headings of the item you are viewing. It would be sort of crude, but I think partly effective, and something that can be done in the short term.
There is at least one recommendation engine in the non-profit world and that's from LibraryThing. Each book's record contains some recommendations for other books. See http://www.librarything.com/work/193319 for an example.
Sometimes though, it seems like a flesh and blood person would be best. For example, we sometimes get requests for Executive Orders. Many of these requests concern land withdrawals or other land matters. And they almost always consist of a textual, legal description of the property without a map. A recommendation engine might miss that a patron really wants to see the area under consideration. A librarian can check other resources for maps that might make the boundaries clearer.
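For what it's worth, here is a rough sketch of how that subject-heading match might look in Python. It is only an illustration under my own assumptions: the catalog is imagined as a plain dictionary of titles and heading sets, the function name and the sample titles and headings are made up, and a real catalog system would surely store and query its records differently.

```python
# Hypothetical catalog: title -> set of subject headings (all made up for illustration).
SAMPLE_CATALOG = {
    "Natural Gas Pipeline Safety Rule": {"Natural gas", "Pipelines", "Safety regulations"},
    "Gas Pipeline Inspection Standards": {"Natural gas", "Pipelines", "Inspection"},
    "Offshore Drilling Leases": {"Petroleum", "Leases", "Natural gas"},
    "Pipeline Safety Annual Report": {
        "Natural gas", "Pipelines", "Safety regulations", "Statistics",
    },
}


def recommend_by_subject_overlap(viewed_title, catalog, threshold=0.5):
    """Return other titles that share more than `threshold` of the
    viewed item's subject headings, best matches first."""
    viewed_headings = catalog[viewed_title]
    if not viewed_headings:
        return []  # nothing to match against

    scored = []
    for title, headings in catalog.items():
        if title == viewed_title:
            continue
        # Fraction of the viewed item's headings that this title also carries.
        share = len(viewed_headings & headings) / len(viewed_headings)
        if share > threshold:
            scored.append((share, title))

    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


if __name__ == "__main__":
    print(recommend_by_subject_overlap("Natural Gas Pipeline Safety Rule", SAMPLE_CATALOG))
```

With the made-up records above, the annual report shares all three headings and the inspection standards share two of three, so both are suggested, while the drilling leases item (one shared heading) is not.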
Keep up the good work! I'm impressed with how you keep finding fresh topics.
------------------------------------
"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.
More Gov Docs Needed in LibThing
Speaking of LibraryThing...I need to spread the word more about the gov docs group I started at LibraryThing. We need to get more gov docs in LibraryThing and I'm trying to find covers for them to upload into their records. I just want to get them in the public eye more, but it is a lot of effort. I'll have to make a post about it here sometime soon! Here is my account. I really need to upload more "new and interesting" docs!
Centrality of Possession - Why it Matters
Hi John,
While I don't think I'll convince you, I wanted to comment on this part of your post:
The reason centrality of possession is important is that she who possesses makes the rules. If government information products are only available from government-controlled servers, then access can be revoked at any time for any reason without notice. Fee walls can be introduced using the regulatory process.
If the storage of government information products is contracted out to the private sector, then licensing agreements can supersede the public domain.
If government information products continue to be placed in libraries as one place among many, then they will be in the hands of people with a structural commitment to open access. We're not hoarders and have little incentive to hoard. We also cannot be embarrassed by government reports that paint the current administration (whoever they are) in a bad light. So we'll continue to make things available.
------------------------------------
"And besides all that, what we need is a decentralized, distributed system of depositing electronic files to local libraries willing to host them." -- Daniel Cornwall, tipping his hat to Cato the Elder for the original quote.
Centrality of Possession -- a response
Daniel: I offer my thoughts on your post here. I do not think we will have much problem filling the remaining 59 days with aspects of this conversation. Let's call out to others to join in! We are a beautiful duet; but an orchestral effort is necessary!