This Is All I’m Going To Say On This Here Blogsite Concerning The Brouhaha About The Policy for Use and Transfer of WorldCat Records Because I Have Other, More Interesting And More Complex Problems To Solve (And So Do You)

The moderated discussion hosted and sponsored by Nylink went pretty well. Also, I don't need the records to have fun with the data; I just need robust APIs. (In fact, as I said today, I'd prefer not to have to deal with the MARC records directly.) Robust APIs would help turn prototypes like this one, which I hacked together in a few hours, into real, usable services.

Mashing Up WorldCat Holdings Data With Google Maps Using Python and Exhibit

Bad MARC Rant #1: Leader Positions 06 and 08

I understand why the MARC leader position 08 is a good idea in theory. In fact, MARBI Proposal 97-07 suggests:

a change in definition to Leader/08 code "a" for clarification; making code "t" (Manuscript language materials) obsolete in Leader/06 and using code "a" instead; redefinitions of codes "a" and "p" in Leader/06; renaming the 008 for Books to "Textual (Nonserial)"; and deleting field 006 for Mixed material.

I can safely say that some pretty funky stuff gets cataloged with the leader position 08 set as "a," and much of it is incorrect, at $MPOW and otherwise. What is Leader/08 actually supposed to be used for? MARBI Proposal 97-07 again states:

Code a indicates that the material is described according to archival descriptive rules, which focus on the contextual relationships between items and on their provenance rather than on bibliographic detail. The specific set of rules for description may be found in 040 $e.  All forms of material can be controlled archivally.

Were that the case, why am I finding books cataloged using AACR2 with Leader/08 set to "a"? I'm also convinced that this is extremely widespread, based on what I can glean from the MARC Content Designation Utilization Project report, Format Content Designation Analysis: Data Report--General Profiles. Out of a data set containing more than 56,000,000 records, 0.3 percent have Leader/08 set to "a"; the breakdown by material type is even more interesting.
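For anyone who wants to hunt for these records in their own catalog, here is a minimal sketch of the check in Python. It works on bare 24-character leader strings rather than full records, so it needs nothing beyond the standard library. The function names and sample leaders are made up for illustration. It also assumes that Leader/18 set to "a" (AACR2) alongside Leader/08 set to "a" is a reasonable warning sign; that is only a heuristic, and a real audit would also want to look at 040 $e.

```python
# Hypothetical helpers for illustration; leaders are plain 24-character strings.
ARCHIVAL_CONTROL = "a"   # Leader/08: described according to archival rules
AACR2_FORM = "a"         # Leader/18: descriptive cataloging form is AACR2


def leader_codes(leader):
    """Return the (Leader/06, Leader/08, Leader/18) codes from a leader."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    return leader[6], leader[8], leader[18]


def looks_miscoded(leader):
    """Heuristic: archival control claimed, yet the description is AACR2."""
    _record_type, control, desc_form = leader_codes(leader)
    return control == ARCHIVAL_CONTROL and desc_form == AACR2_FORM


# Made-up sample leaders, not taken from any real catalog.
sample_leaders = [
    "00714cam a2200229 a 4500",  # ordinary AACR2 book, no archival control
    "00714camaa2200229 a 4500",  # AACR2 book with Leader/08 "a": suspicious
    "00714ctmaa2200229 i 4500",  # manuscript, archival control, not AACR2
]

flagged = [leader for leader in sample_leaders if looks_miscoded(leader)]
```

With a library like pymarc you would pull the leader from each record as you read the file and run the same test on it.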

Leader/06 "t" was not made obsolete, arguably for a good reason:

Leader/06 code t was not made obsolete and code a was not redefined because of concerns about how to identify material that is manuscript but not controlled archivally (i.e. not described according to archival rules, and thus not value "a" in Leader/08) if code t were to be made obsolete. LC should work with the archival and manuscript communities to bring back a proposal dealing with the remaining issues in the paper: making the three manuscript codes obsolete and finding a place to identify codex manuscripts.
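The case that worried them, material that is manuscript but not controlled archivally, is easy enough to express in code. This is only a sketch: the function name and sample leaders are invented, and it assumes the "three manuscript codes" are Leader/06 "d", "f" and "t".

```python
# Leader/06 codes for manuscript material (assumed to be the three codes
# the proposal refers to).
MANUSCRIPT_TYPES = {
    "d": "manuscript notated music",
    "f": "manuscript cartographic material",
    "t": "manuscript language material",
}


def is_nonarchival_manuscript(leader):
    """True if Leader/06 is a manuscript code and Leader/08 is not 'a'."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    record_type = leader[6]   # Leader/06: type of record
    control = leader[8]       # Leader/08: type of control
    return record_type in MANUSCRIPT_TYPES and control != "a"


# Made-up sample leaders for illustration only.
codex = "00714ctm a2200229 a 4500"        # manuscript, no archival control
collection = "00714ctcaa2200229 i 4500"   # manuscript collection, archival
printed = "00714cam a2200229 a 4500"      # printed book

results = [is_nonarchival_manuscript(l) for l in (codex, collection, printed)]
```

If code "t" had been made obsolete in favor of "a", the first and third leaders would be indistinguishable at this level, which is exactly the objection quoted above.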

This decision was made more than ten years ago and we still haven't made the manuscript codes obsolete. We also haven't found a method to identify codex manuscripts in MARC. In my mind, this in part boils down to the failure of AACR2 to define manuscripts properly:

4.0A. Scope

4.0A1. The rules in this chapter cover the description of manuscript (including typescript or printout) materials of all kinds, including manuscript books, dissertations, letters, speeches, etc., legal papers (including printed forms completed in manuscript), and collections of such manuscripts.

That's a pretty bleeping wide range if you ask me. One can obviously see that the standards that proliferated afterward were designed to fill this gaping hole that JSC left in AACR2. RDA only has one element that references manuscripts at all ("Production method for manuscripts"), so who knows what direction that'll go. Luckily, RBMS has been working hard on all of the DCRM standards!

When Life Hands You MARC, make pymarc

It's a bad pun, but what can you expect from someone who neglects his blogs as much as I do?

I've been busy, somewhat, and one of my latest forays has been getting a grip on Python, an absolutely wonderful programming language. I actually enjoy writing code again, which is more than a bit scary. I was sick of the mangled scripts and workflows I came up with at MPOW to handle converting MARC data to HTML and other such nonsense. Writing Perl made me feel unclean.

After playing around with Ed Summers' pymarc module, I began hacking about and putting my own hooks into the code here and there. I longed for MARC8 to Unicode conversion, which is a necessary evil. Digging around, I came across Aaron Lav's PyZ3950 module, which had its own little MARC code. After bugging Ed via #code4lib, and hassling Aaron in the process, Ed began incorporating the code and I started some testing. Just a short while later, the conversion code worked. I bugged Ed some more telling him about some changes I made, and he gave me the chance to contribute code directly.

I'm no expert, but I'm glad that I could help bring pymarc up to version 1.0 and that I've had a chance to begin enjoying programming again. I'm also glad to see that Catalogablog has spread the word. Download a copy and start hacking; maybe you'll be rewarded with rediscovering the joy of code like I was.

Is Open Data the Point?

I've been thinking about the biblioblogosphere's reaction to Casey Bisson's decision to use the $50,000 he was awarded by the Mellon Foundation for his work on WPopac to purchase LC bibliographic data and open it up to anyone who wanted to take a crack at it. Yes, this is a "Good Thing," and valuable to the library community as a whole, but I feel like there are some things we're overlooking. Dan Chudnov and I seem to agree, but I'm not going to go so far as to damn those who herald this as a "new era." It's a little premature to say where it will go, but I have to admit that I'm occasionally confused and often a little bit insulted by some of the talk surrounding this issue.

I wonder how interesting all the bibliographic data of LC is to begin with. What's in the dump paid for by the Mellon Award money? I'd guess monographs and serials, and probably audiovisual materials. What about archival records? What would anyone do with those? Those won't be interesting to the small libraries that could benefit the most from this altruistic move, and in fact I believe that the biggest problem other than maintaining the records for changes will be separating the wheat from the chaff, which is ultimately an institutional (departmental, consortial, individual ...) decision. I'd love a dump of all the archival records, but I don't know what I'd do with them all; it's much easier for me to wade through them using their OPAC for the time being when I do institutional surveys.

Dan already emphasized that much of the discussion ignores existing collaborative workflows and that catalogers around the world are busting their humps to create this data. Listening to the Talis podcast with Tim Spalding of LibraryThing and Ross Singer (among others), I was a little surprised that Tim said "Librarians are very restricted in terms of what they can do with [bibliographic data]." I can really respect that people want to experiment with bibliographic data to improve access to information or just for the fun of hacking through it, but the assumption that librarians are very restricted in what they do with it seems a little misguided. I guess some are, but my gut reaction to this comment is something that I ended up leading a discussion about at Library Camp East last September: the divide between "techies" and "librarians," something that's often a false dichotomy yet often very real. I get this feeling often when I read threads on the NGC4LIB list, too. Maybe it's not quite the same divide, but I feel that those who want to play with our catalog data aren't talking to the catalogers. I realize some are expressing fear, uncertainty and doubt that anyone can mark up the sacred cow of an OPAC, but for every one of them I feel like there are several of us that are willing to play along and even help do the work. In nearly any situation, I'd be glad to provide a dump of a catalog to whomever wanted it as long as they gave me at least a vague idea of what they wanted to do with it.

RLG + OCLC = Clog Roc?

The technical services world has been in an uproar lately, between LC’s decision to stop creating series authority records (particularly since they didn’t consult PCC members beforehand) and the fallout after the Calhoun report. We might as well have another drink, because as reports (along with several others), OCLC and RLG are about to merge. It’s mindblowing to think that RLG employees did not find out any sooner than the rest of us, and that neither organization has yet consulted its members. RLG plans to do so, though, and it will be interesting to see how this pans out. In particular, some folks are worried about the merging of data and the future of RedLightGreen. I know it’s not considerably better, but they seem to be overlooking Open WorldCat.