Via Twitter, I heard about the Biodiversity Heritage Library's relatively new OpenURL Resolver, announced on their blog about a month ago. More specifically, I heard about Matt Yoder's new Ruby library, rubyBHL, which uses the BHL OpenURL Resolver to provide metadata about items in their holdings and does some additional screenscraping to return things like links to the OCRed version of the text.
In typical fashion, I've ported Matt's library to Python, and have released my code. pybhl is available from my site, PyPI, and Github. Use should be fairly straightforward, as seen below:
>>> import pybhl
>>> import pprint
>>> b = pybhl.BHLOpenURLRequest(genre='book',
...     aulast='smith', aufirst='john', date='1900',
...     spage='5', volume='4')
>>> r = b.get_response()
>>> len(r.data['citations'])
3
>>> pprint.pprint(r.data['citations'][1])
{u'ATitle': u'',
u'Authors': [u'Smith, John Donnell,'],
u'Date': u'1895',
u'EPage': u'',
u'Edition': u'',
u'Genre': u'Journal',
u'Isbn': u'',
u'Issn': u'',
u'ItemUrl': u'http://www.biodiversitylibrary.org/item/15284',
u'Language': u'Latin',
u'Lccn': u'',
u'Oclc': u'10330096',
u'Pages': u'',
u'PublicationFrequency': u'',
u'PublisherName': u'H.N. Patterson,',
u'PublisherPlace': u'Oquawkae [Ill.] :',
u'SPage': u'Page 5',
u'STitle': u'',
u'Subjects': [u'Central America', u'Guatemala', u'Plants', u''],
u'Title': u'Enumeratio plantarum Guatemalensium imprimis a H. de Tuerckheim collectarum /quas edidit John Donnell Smith.',
u'TitleUrl': u'http://www.biodiversitylibrary.org/bibliography/827',
u'Url': u'http://www.biodiversitylibrary.org/page/707932',
u'Volume': u'4'}
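For anyone curious what a request like the one above boils down to, here's a minimal sketch of building an OpenURL-style query string by hand. This is not how pybhl is implemented internally; the endpoint URL and the `build_openurl` helper are my own assumptions for illustration, and the parameter names are simply the ones used in the session above.

```python
from urllib.parse import urlencode

# Assumed resolver endpoint; check BHL's documentation for the current URL.
BHL_OPENURL_ENDPOINT = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(endpoint=BHL_OPENURL_ENDPOINT, **params):
    """Return a resolver URL, skipping parameters that were left empty."""
    # Sort the keys so the same input always yields the same URL.
    query = {key: value for key, value in sorted(params.items()) if value}
    return endpoint + "?" + urlencode(query)


# Same citation fields as the pybhl session above.
url = build_openurl(genre="book", aulast="smith", aufirst="john",
                    date="1900", spage="5", volume="4")
print(url)
```

Sorting the keys is purely a convenience for caching and testing; a resolver should not care about parameter order.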
Let me know if you find it useful - I'd appreciate any feedback!
As of last Thursday, I have been inducted into the pantheon of published Python programmers (aye, abuse of alliteration is always acceptable). My article, "Using the OCLC WorldCat APIs," appears in the latest issue (June 2009) of Python Magazine. I'd like to thank my editor, Brandon Craig Rhodes, for helping me along in the process, not least by catching bugs that I'd overlooked. The article includes a brief history lesson about OCLC, WorldCat, and the WorldCat Affiliate APIs; a detailed introduction to worldcat, my Python module for interacting with OCLC's APIs; and a brief introduction to SIMILE Exhibit, which helps generate the holdings mashup referenced earlier on my blog. Subscribers to Python Magazine have access to a copy of the code containing a functional OCLC Web Services key ("wskey") to explore the application.
It's good to see other people using your code. Thanks to the OCLC Devnet Blog, I found out that Etienne Posthumus used worldcat for a demo application he built during the WorldCat Mashathon in Amsterdam last week. Even more interesting is that Etienne's application was deployed on Google App Engine. Courtesy of OCLC's Alice Sneary, there is a brief video of Etienne presenting his application to the other Mashathon attendees:
Crossposted to NYPL Labs. Sorry for any duplication!
Hey, do you use Drupal on a site with several thousand nodes? Do you also use the Apache Solr Integration module? If you're like me, you've probably needed to reindex your site but couldn't be bothered to wait for those pesky cron runs to finish – in fact, that's what led me to file a feature request on the module to begin with.
Well, fret no more, because thanks to me and Greg Kallenberg, my illustrious fellow Applications Developer at NYPL DGTL, you can finally use Drupal's Batch API to reindex your site. The module is available as an attachment from that same issue node on drupal.org. Nota bene: this is a really rough module, with code swiped pretty shamelessly from the Example Use of the Batch API page on drupal.org. It works, though, and it works well enough as we tear stuff down and build it back up over and over again.
I've been busy the last few weeks, so I didn't even really announce this to begin with! I've been playing around with some of the cultural heritage APIs that are available, some of which I learned about while I was at Museums and the Web 2009. While I was away I released code for a Python module for interacting with the Brooklyn Museum Collections API. After chatting with Virginia Gow from DigitalNZ, I also got motivated to write a Python module to interact with the DigitalNZ API. The code for both is fairly unpolished, but I'm always ready for feedback! Both modules are available as Mercurial repositories linked from my Bitbucket account. There's also a small cluster of us working on a museum API wiki to begin sorting out some of these issues. Comparatively speaking, the library and archives world has it somewhat easy...
The always groundbreaking Brooklyn Museum has now released an API to allow the public to interact with their collections data. I can't even tell you how happy I am about this from an open data perspective. Also, this is the direction that makes the whole "detailed curation by passionate amateurs" thing possible.
There are only three simple methods for accessing the data. Ideally, it would be nice to see them put their collections metadata up as linked data, but now I'm daring to dream a little. Hey, wait a minute! I think that's the perfect way to start playing around with the API. Doing some digging through the documentation, I'm seeing that all the objects and creators seem to have URIs. Take a crack at it - the registration form is ready for you.
It's official - I've moved the codebase for worldcat, my Python module for working with the OCLC WorldCat APIs, to be hosted on Bitbucket, which uses the Mercurial distributed version control system. You can find the new codebase at http://bitbucket.org/anarchivist/worldcat/.
In my previous post, I included a screenshot of a prototype, but glossed over what it actually does. Given an OCLC record number and a ZIP code, it plots the locations of the nearest holdings of that item on a Google Map. Pulled off in Python (as all good mashups should be), along with SIMILE Exhibit, it uses the following modules:
If you want to try it out, head on over here. The current version of the code will soon be available as part of the examples directory in the distribution for worldcat, which can be found in my Subversion repository.
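To give a rough idea of the glue involved, here's a minimal sketch of turning a list of holdings into the JSON structure that Exhibit loads. It is not the prototype's actual code: the input record shape, the `holdings_to_exhibit` helper, the "Library" type, and the `latlng` property name are all assumptions for illustration, and the sample data is made up rather than fetched from WorldCat.

```python
import json


def holdings_to_exhibit(holdings):
    """Convert holding records into the {"items": [...]} shape Exhibit loads."""
    items = []
    for holding in holdings:
        items.append({
            "type": "Library",          # assumed item type
            "label": holding["name"],   # Exhibit uses "label" as the item name
            # A map view can read a "lat,lng" string from a property like this.
            "latlng": "%s,%s" % (holding["lat"], holding["lng"]),
        })
    return {"items": items}


# Made-up sample data standing in for a real holdings lookup.
sample = [
    {"name": "Example Public Library", "lat": 40.7532, "lng": -73.9822},
    {"name": "Example University Library", "lat": 40.7295, "lng": -73.9965},
]
exhibit_data = holdings_to_exhibit(sample)
print(json.dumps(exhibit_data, indent=2))
```

In a real version, the sample list would be replaced by the holdings returned for an OCLC number and ZIP code, with each library geocoded before being handed to Exhibit.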
The moderated discussion hosted and sponsored by Nylink went pretty well. Also, I don't need the records to have fun with the data; I just need robust APIs. (In fact, as I said today, I'd prefer not to have to deal with the MARC records directly.) Robust APIs would help turn prototypes like this one, which I hacked together in a few hours, into real, usable services.
Mashing Up WorldCat Holdings Data With Google Maps Using Python and Exhibit
Man, if this isn't a "you got your peanut butter in my chocolate" thing, I don't know what is! As I wrote over on the NYPL Labs blog, we've been up to our necks in Drupal at MPOW, and I've found that one of the great advantages of using it is rapid prototyping without having to write a whole lot of code. Again, that's how I feel about Python, too, but you knew that already.
Once you've got a prototype built, how do you start piping stuff into it? In Drupal 6, a lot of the contrib modules to do this need work - most notably, I'm thinking about node_import, which as of yet still has no (official) CCK support for Drupal 6 and CCK 2. In addition, you could be stuck with having to write PHP code for the heavy lifting, but where's the joy in that?
Well, it so happens that the glue becomes the solvent in this slow, slow dance. Using Python becomes a breeze because of the batteries-included model it subscribes to. I've been playing around with the Services module and its XMLRPC server a bit, and given that xmlrpclib was added in Python 2.2, there's pretty much no excuse not to use it. Say what you will about RESTful interfaces, but of the available options, Services' XMLRPC server is the most robust, with the possible exception of AMFPHP.
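Here's a minimal sketch of what talking to an XML-RPC server from Python looks like, written against Python 3, where xmlrpclib lives on as `xmlrpc.client`. The endpoint URL, the node field names, and the assumption that the server exposes a `node.save` method without extra arguments are all mine; depending on how Services is configured, calls may also need a session ID or API key, which this sketch leaves out.

```python
from xmlrpc.client import ServerProxy  # this module was xmlrpclib in Python 2


def build_node(title, body, node_type="story"):
    """Assemble the struct sent to the server; field names are assumptions."""
    return {"type": node_type, "title": title, "body": body}


def save_node(server, node):
    """Call the remote node.save method and return whatever it sends back."""
    return server.node.save(node)


# Hypothetical endpoint. Creating the proxy opens no connection; a request
# is only sent when a remote method is actually called.
server = ServerProxy("http://example.com/services/xmlrpc")
node = build_node("Imported record", "Body text pulled from a legacy source")
# save_node(server, node)  # uncomment against a real, configured endpoint
```

Keeping the struct-building separate from the remote call makes it easy to check what would be sent before pointing the script at a live site.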
Lately, I've been tinkering with it to figure out how to ingest metadata into Drupal that's stored either in other extremely complex databases or just as hunks of XML on a file system. I've been using lxml because of its XPath support, but given the fact that a lot of this XML data is remarkably dirty, I'll probably take some time to look at BeautifulSoup's BeautifulStoneSoup parser. However, this will take some work as some of this data will need explicit handling by that parser (nestable tags and the like).
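As a rough illustration of the extraction step, here's a minimal sketch that pulls a few fields out of a hunk of XML into plain dictionaries. To keep it dependency-free it uses the standard library's ElementTree and its limited XPath subset rather than lxml; with lxml you'd get full XPath 1.0 through an element's `xpath()` method. The sample document, element names, and the `extract_records` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample; real source data would be read from disk or a database.
SAMPLE = """<records>
  <record id="r1">
    <title>First item</title>
    <creator>Smith, John</creator>
  </record>
  <record id="r2">
    <title>Second item</title>
  </record>
</records>"""


def extract_records(xml_text):
    """Return one dict per <record>, using empty strings for missing fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("./record"):
        records.append({
            "id": record.get("id"),
            "title": record.findtext("title", default=""),
            "creator": record.findtext("creator", default=""),
        })
    return records


records = extract_records(SAMPLE)
for record in records:
    print(record)
```

Note that this only works on well-formed XML; genuinely dirty input would fail at the parsing step, which is where a more forgiving parser comes in.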