Privacy, Censorship, and Good Records Management: Brooklyn Public Library in the Crosshairs

Over at librarian.net, Jessamyn West has a brief write-up about a post on the New York Times' City Room blog about placing access restrictions on offensive material (in this case, one of Hergé's early Tintin books at the Brooklyn Public Library). More interesting, she notes, is that the Times was given access to, and accordingly republished, challenges from BPL patrons and other community members. Quite astutely, Jessamyn recognizes that

the patrons’ addresses are removed but their names and City/State information are published. If your name is, for example, [name redacted by thesecretmirror.com], redacting your address doesn’t really protect your anonymity. I’m curious what the balance is between patron privacy and making municipal records available.
It's a good question without a straightforward answer. My first concern was whether BPL had kept the challenge correspondence beyond the mandated dates in the New York State records schedules. After some digging on the New York State Archives' website, I came across Schedule MI-1 ("for use by miscellaneous local governments"), which seemed to be the best fit for the type of entity BPL is. Looking at the records schedule for libraries and library systems, I found the following:
Library material censorship and complaint records, including evaluations by staff, patrons' complaints and record of final decision: 6 years after last entry

NOTE: Appraise these records for historical significance prior to disposition. Some library censorship records deal with serious constitutional issues and may have value for future research.

For what it's worth, the scheduling for libraries and library systems that form part of other governmental bodies seems to be the same for this type of record. Accordingly, it seems that BPL was well within the scope of state policy in retaining the earliest of the records they shared with the Times, which dated from 2005.

However, whether they should have shared them without redacting them is another issue entirely. Under "Public Access to Records" in the Introduction to Schedule MI-1, restrictions can be placed on certain records based on § 87(2) of the Public Officers Law (POL), one of which is the set of privacy concerns raised in POL § 89(2). In my mind, BPL didn't quite meet the requirements of POL § 89(2), but IANAL; nonetheless, it was probably in poor taste and possibly ethically improper not to redact all personally identifying information from the records shared with the New York Times.

Everything is Bigger in Texas, Including My Talks on The Semantic Web

I'll be at the Society of American Archivists Annual Meeting next week in Austin, Texas. It looks to be a jam-packed week for me, with a full-day Standards Committee/TSDS meeting on Tuesday, followed by THATCamp Austin in the evening, an (expanded version of my) presentation on Linked Data and Archival Description during the EAD Roundtable on Wednesday, and Thursday's session (number 101): "Building, Managing, and Participating in Online Communities: Avoiding Culture Shock Online" (with Jeanne Kramer-Smyth, Deborah Wyth, and Camille Cloutier). And to think I haven't even considered which other sessions I'm going to! Anyhow, I hope to see you there, and please make either or both of my presentations if you can.

Must Contextual Description Be Bound To Records Description?

I've been struggling with the fact that (American) archival practice seems to bind contextual description (i.e., description of records creators) to records description. Many of these thoughts have been stirring in my head as a result of my class at Rare Book School. If we take a relatively hardline approach, e.g. the kind suggested by Chris Hurley ("contextual data should be developed independently of the perceived uses to which it will be put", 1, see also 2), it makes total sense to separate them entirely. In fact, it starts making me mad that the <bioghist> tag exists at all in EAD. Contextual description requires that it be written from a standpoint relative to that of the creator it describes. I guess what I keep getting hung up on is whether there could be a relevant case that really merits this direct intellectual binding. I therefore appeal to you, humble readers, to provide me with your counsel. Do you think there are any such cases, and if so, why?

References

  1. Chris Hurley, “Ambient Functions - Abandoned Children to Zoos,” Archivaria 40 (Fall 1995): 21–39. Available from http://www.infotech.monash.edu.au/research/groups/rcrg/publications/provenance.html.
  2. Chris Hurley, “Problems with Provenance,” Archives and Manuscripts 23, no. 2 (November 1995): 234–259. Available from http://www.infotech.monash.edu.au/research/groups/rcrg/publications/ambientf.html.

Seeking Nominations for Co-Chair, RLG Programs Roundtable

Apologies for any duplication - we're just trying to get the word out!

As co-chairs of the RLG Programs Roundtable of the Society of American Archivists, we’re seeking nominees for co-chair of the Roundtable for 2009-2011. If you'd like to nominate yourself or someone else, please email Mark Matienzo, Co-Chair, at mark at matienzo.org. Please submit all nominations no later than 5 PM Eastern Time on Friday, August 7.

Serving in a leadership position for a Section or Roundtable is a great way to learn about SAA and its governance, contribute to new directions for the Society, and work with other archivists on interesting projects. It is also a great way to serve the Society!

Your RLG Roundtable Co-Chairs,

Thomas G. Knoles, Marcus A. McCorison Librarian, American Antiquarian Society

Mark Matienzo, Applications Developer, Digital Experience Group, The New York Public Library

The Archival, The Irreconcilable, and The Unwebbable: Three Horsemen and/or Stooges

This week in Charlottesville has been a whirlwind exploration of standards and implementation strategies thus far during my class, Designing Archival Description Systems, at Rare Book School. My classmates and I have been under the esteemed tutelage of Daniel Pitti, who has served as the technical architect for both EAD and EAC. Interestingly, there's been a whole lot of talk about linking data, linked data, and Linked Data, date normalization, and print versus online presentation, among other things. In addition, a few things have floated past on my radar screen this week that have seemed particularly pertinent to the class.

The first of these was a post by Stefano Mazzocchi of Metaweb, "On Data Reconciliation Strategies and Their Impact on the Web of Data". In his post, Stefano wrote about the problem of a priori versus a posteriori data reconciliation; in other words, whether you iron out the kinks, apply properties like owl:sameAs, etc., on the way in or on the way out. Via FriendFeed, I noticed Ed Summers' remark about "not [being] sure [he buys] the argument that linking-open-data community isn't doing a-priori reconciliation ... an argument could be made that this is why it is taking off." I'm inclined to agree with Ed - to a certain extent, it's a gracious gesture to do a priori reconciliation. The cool thing about Stefano's post, though, is that it reached me twice: through Ed posting the FriendFeed discussion to his Delicious, and as a printout Daniel handed out in class today.
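To make the distinction concrete, here is a minimal Python sketch of the two strategies. It is only an illustration under loud assumptions: triples are plain tuples rather than a real RDF store, every URI is made up, the object of each owl:sameAs triple is arbitrarily treated as the canonical identifier, and only one hop of owl:sameAs is followed. None of it comes from Stefano's post or from any particular Linked Data toolkit.

```python
# Illustrative only: triples are plain tuples and every URI below is invented.
SAME_AS = "owl:sameAs"

source_a = [("ex:a/hurley", "foaf:name", "Chris Hurley")]
source_b = [("ex:b/p42", "ex:wrote", "ex:b/ambient-functions")]
# Assertion that both URIs identify the same person.
links = [("ex:b/p42", SAME_AS, "ex:a/hurley")]


def canonical_map(link_triples):
    """Map each alias to the URI it is declared the same as (assumed canonical)."""
    return {s: o for s, p, o in link_triples if p == SAME_AS}


def reconcile_on_ingest(triples, link_triples):
    """A priori: rewrite aliases to canonical URIs before anything is stored."""
    canon = canonical_map(link_triples)
    return [(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples]


def describe(uri, store):
    """A posteriori: keep the data as published and follow owl:sameAs when asked."""
    aliases = {uri}
    for s, p, o in store:
        if p == SAME_AS and o == uri:
            aliases.add(s)
        if p == SAME_AS and s == uri:
            aliases.add(o)
    return [(s, p, o) for s, p, o in store if s in aliases and p != SAME_AS]


# "On the way in": the store only ever sees one identifier for the person.
clean_store = reconcile_on_ingest(source_a + source_b, links)

# "On the way out": the store keeps both identifiers plus the link between them.
raw_store = source_a + source_b + links
```

With the a priori store, a plain lookup on ex:a/hurley finds both statements; with the a posteriori store, the same answer depends on the consumer knowing to call something like describe() and follow the links.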

Additionally, Joe Clark's post on A List Apart, "Unwebbable," was barking up a far different tree. In it, Clark argues that certain kinds of documents are ill-suited to become web pages. Specifically, he makes the following claim:

Some documents cannot be published using HTML. In many cases, we shouldn’t even bother trying. In other cases, we have to radically change the appearance and structure of the document. Ideally, we’ll start using custom XML document types—which, finally and at long last, might actually work.

In speaking of scripts and screenplays, he writes of their printed documental form:

Typography is lousy; old typewriter fonts of yesteryear were errantly mapped onto today’s spindly Courier type. But as an example of document engineering, scripts are brilliant. There’s an entire science involved in text indention. Text is rarely, if ever, “centered”; everything lines up at a tab stop, a concept that CSS expunges from the collective memory. ... With careful alignments like these, it’s easy to scan down a screenplay page. And now people want to transfer the format—intact—to the web. It’s not going to work. ... The quest to adapt scripts to the web recalls other “category errors,” to use Martin Amis’s phrase. Electronic commerce, we eventually figured out, does not take the form of “shopping malls” you “walk” through. “Magazines” and “catalogues” do not have discrete pages you flip (complete with sound effects) and dog-ear. “Web sites” do not look like magazine layouts, complete with multicolumn text and callouts.

All this drives the argument further that it's time to rethink our instances of archival record and context description in terms of how they are to be used online. "Finding aid" is a term that covers a number of documental forms which don't work well on the web. While EAD was purposefully designed to mark up these extant (or "legacy," depending on your view) descriptive apparatuses, it wasn't entirely designed to exploit the hypertextual form heralded by even the early form of the World Wide Web. Following Clark, Web-based presentations of archival record and context description need not, and probably should not, look like the columnar container list bracketed by large spans of free text content. There are a couple of possibilities of what things could look like by using other people's examples:

But again, the question is what we want this to look like in order to provide the best experience for the user. I have yet to narrow down my suggestions with any certainty, but I stick with my opinion that the documentary form of description needs to change for the Web.

EDIT: Joe Clark also published the part of "Unwebbable" that was cut for the sake of brevity, namely a list of "categories of illustrations or graphics that would translate poorly to HTML semantics" (tx to Joe Clark for corrections in comments):

  • Org charts and flowcharts. Nested ordered lists are a proven failure here.
  • Circuit diagrams.
  • Many graph styles, including radial graphs.
  • Exploded and X-ray diagrams.
  • Blueprints.
  • Timelines.

“Summer Camp for Archivists” Sounds So Much Better

Crossposted to NYPL Labs.

I'm staying with colleagues and good friends during my week-long stint in Charlottesville, Virginia for Rare Book School. If you're here - particularly if you're in my class (Daniel Pitti's Designing Archival Description Systems) - let me know. I'm looking forward to a heady week dealing with descriptive standards, knowledge representation, and as always, doing my best to sell the archives world on Linked Data. Notes and thoughts will follow, as always, on thesecretmirror.com.

“Using the OCLC WorldCat APIs” now available in Python Magazine

As of last Thursday, I have been inducted into the pantheon of published Python programmers (aye, abuse of alliteration is always acceptable). My article, "Using the OCLC WorldCat APIs," appears in the latest issue (June 2009) of Python Magazine. I'd like to thank my editor, Brandon Craig Rhodes, for helping me along in the process, not the least of which includes catching bugs that I'd overlooked. The article includes a brief history lesson about OCLC, WorldCat, and the WorldCat Affiliate APIs, a detailed introduction to worldcat, my Python module to interact with OCLC's APIs, and a brief introduction to SIMILE Exhibit, which helps generate the holdings mashup referenced earlier on my blog. Subscribers to Python Magazine have access to a copy of the code containing a functional OCLC Web Services key ("wskey") to explore the application.

NYART Presentation: Archives & The Semantic Web

This last Tuesday, I spoke at the Annual Meeting of the Archivists' Roundtable of Metropolitan New York, where I gave a talk on archives and the Semantic Web. The presentation went over very well, and colleagues from both the archives field and the semantic technology field were in attendance. I did my best to keep the presentation from being overly technical and to cover just enough to get archivists thinking about how things could be in the future. I also have to give a big hat tip to Dan Chudnov, whose recent keynote at the Texas Conference on Digital Libraries helped me organize my thoughts. Enjoy the slides, and as always, I relish any feedback from the rest of you.

Drupal For Archivists: Documenting the Asian/Pacific American Community with Drupal

Hillel Arnold is the author of the second post in the Drupal for Archivists series on thesecretmirror.com. In addition to being a recent graduate of the Archives and Public History program at New York University and a relatively recent graduate of the Palmer School of Library & Information Science at Long Island University, Hillel also worked for me as an intern with the Digital Experience Group at the New York Public Library. He was previously an archivist at the Woody Guthrie Archives and the Digital Projects Manager at the Foundation for Landscape Studies, and currently works at NYU’s Tamiment Library/Robert F. Wagner Labor Archive, where he coordinates EAD production and implementation of the Archivists’ Toolkit.

Over the course of the last academic year, I have been part of a team working on a survey project aimed at identifying and describing archival collections relating to the Asian and Pacific American community in the New York City metropolitan area. Descriptions of the fifty-plus collections we surveyed have been posted on our Drupal-powered website. Drupal has been an excellent fit for the needs of this project and has also helped us tackle many of the challenges the project has presented.

By way of introduction, this survey project seeks to address the underrepresentation of East Coast Asian/Pacific Americans in historical scholarship and archival repositories by working with community-based organizations and individuals to survey their records and raise awareness within the community about the importance of documenting and preserving their histories. Funded by a Documentary Heritage Project grant from METRO: Metropolitan New York Library Council, the project is a collaborative effort between the Asian/Pacific/American Institute and the Tamiment Library/Robert F. Wagner Labor Archive at NYU.  Three graduate students – I-Ting Emily Chu, Nancy Ng Tam and I – were hired to do the survey work.

For close to nine months, we dug through the basements, closets and storage facilities of artists, activists, scholars and collectors. We visited the offices of arts organizations, theatre companies and social service agencies.  We looked at paper files, stage props, moving image materials, digital photographs and emails. Despite the diversity of institutions, people and materials we encountered, a common theme began to emerge.

Due to the nature of many of the organizations we worked with and the cost of space in New York City, many of the collections were spread across several different locations, including private apartments and other publicly inaccessible spaces. This problem is even more acute on a larger level; there is no significant archival repository in the NYC area dedicated to collecting documentation of the Asian/Pacific American community.

The website began primarily as a way to publicize the project and fulfill the grant requirements. However, as we began thinking about the site's structure, content and audience, we realized that we had the potential to do something far more interesting: to build a research center for scholars and members of the Asian/Pacific American community, and to bring together these physically dispersed collections via standardized descriptions. I was introduced, through Mark’s timely prodding, to the wonders of Drupal at DrupalCamp NYC and quickly realized that this project would be a perfect application for Drupal, since we were dealing largely with structured data and wanted the ability to present that data in a variety of ways.

With Mark’s good advice and the assistance of Brian Hoffman of NYU’s Digital Library Technology Services, I was able to get a site up and running in a few weeks.  The majority of the site’s content is based on three content types, built with the Content Construction Kit module. The main content type, Archival Resource, contains collection-level information including dates, extent, language, arrangement, an abstract and a scope and content note. The Archival Resource content type also links to an Entity content type via a node reference field. This Entity content type describes the person or corporate body responsible for creating the records, including dates of existence, authorized form of name, and a historical/biographical note. A Location content type, with repository-specific information such as address, hours and contact person, is also tied to the Archival Resource content type via a node reference field.
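As a rough illustration of how the three content types relate, the sketch below models them as Python dataclasses. This is not the site's Drupal configuration: the class layout, the title and name fields, and all of the sample values are hypothetical, and the creator and location attributes merely stand in for the node reference fields described above.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Person or corporate body responsible for creating the records."""
    authorized_name: str
    dates_of_existence: str
    history_note: str


@dataclass
class Location:
    """Repository-specific information."""
    name: str  # assumed label field, not named in the post
    address: str
    hours: str
    contact_person: str


@dataclass
class ArchivalResource:
    """Collection-level description."""
    title: str  # assumed label field, not named in the post
    dates: str
    extent: str
    language: str
    arrangement: str
    abstract: str
    scope_and_content: str
    creator: Entity     # stands in for the node reference to an Entity
    location: Location  # stands in for the node reference to a Location


# Placeholder data, invented purely for illustration.
company = Entity("Example Theatre Company", "1975-", "Placeholder history note.")
office = Location("Example office", "123 Example St.", "By appointment", "A. Contact")

records = ArchivalResource(
    "Example Theatre Company records", "1975-2005", "10 linear feet", "English",
    "By series", "Placeholder abstract.", "Placeholder scope note.", company, office,
)
props = ArchivalResource(
    "Example Theatre Company stage props", "1980-1999", "25 items", "English",
    "Unarranged", "Placeholder abstract.", "Placeholder scope note.", company, office,
)

# Both resources point at one Entity, so a correction is made in one place only.
company.dates_of_existence = "1976-"
```

Because both collection descriptions refer to the same Entity and Location, correcting the creator's dates once is reflected wherever that creator is referenced.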

Taken together, the three content types amount to the front matter for a finding aid. Separating the content into three different types avoided repeated entry of the same data, which in turn prevented wasted effort and data inconsistency. Drupal also allows for field-level data validation and formatting, which dramatically reduces the chances of human error in data entry. This was especially important because a number of people were responsible for creating content. The display of the content is controlled through the Views module, which gives us the ability to programmatically create displays ranging from a collection list with brief abstracts to a complete view of the survey data, all from the same data.

We also created four very simple taxonomies - ethnic context, geographic coverage, organization type and person type - and applied these to the collection level description in the Archival Resource content type. These taxonomies allow users to browse through the collections via facets, a critical function on a site that aims to expose hidden collections.
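The idea behind browsing by facets can be sketched in a few lines of Python. This is a generic illustration, not how Drupal's taxonomy module is implemented: only the four vocabulary names come from the project, while the collection titles, the term values, and the helper functions build_facets and browse are hypothetical.

```python
from collections import defaultdict

# Hypothetical collections and terms; only the four vocabulary names are real.
collections = [
    {"title": "Example Arts Center records",
     "terms": {"ethnic context": ["Chinese American"],
               "geographic coverage": ["Manhattan"],
               "organization type": ["Arts organization"],
               "person type": []}},
    {"title": "Example Organizer papers",
     "terms": {"ethnic context": ["Filipino American"],
               "geographic coverage": ["Queens"],
               "organization type": [],
               "person type": ["Activist"]}},
    {"title": "Example Service Agency records",
     "terms": {"ethnic context": ["Chinese American"],
               "geographic coverage": ["Queens"],
               "organization type": ["Social service agency"],
               "person type": []}},
]


def build_facets(items):
    """Index collection titles by vocabulary and then by term."""
    facets = defaultdict(lambda: defaultdict(list))
    for item in items:
        for vocabulary, terms in item["terms"].items():
            for term in terms:
                facets[vocabulary][term].append(item["title"])
    return facets


def browse(items, selected):
    """Return titles of collections tagged with every selected vocabulary/term pair."""
    return [
        item["title"]
        for item in items
        if all(term in item["terms"].get(vocabulary, [])
               for vocabulary, term in selected.items())
    ]


facets = build_facets(collections)
queens_chinese_american = browse(
    collections,
    {"ethnic context": "Chinese American", "geographic coverage": "Queens"},
)
```

Each facet narrows the list independently, so a user can start from any of the four vocabularies and combine them in any order.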

In terms of this project, the real strength of Drupal has been its ability to handle structured data in flexible and powerful ways via customizable content types. Having developed a number of static HTML sites in the hazy past, I’ve also been grateful for the way Drupal separates the development of infrastructure and function from the generation of content. This has allowed others a significant hand in creating the site’s content, and has spared me the dubious responsibility of being the only person who can update the site.

The site is still very new, and we’re looking for ways to publicize it, generate more content, and create a user community.  The survey project will continue for another year, thanks to a funding extension, and additional descriptions will be added during this time.

Why Are Dated Videos From Telecom Research Groups So Goofy?

Thanks to Boing Boing, I just discovered this video from the GPO (BT's predecessor in the UK) about the future of telecommunications, including video conferencing and telework.

My immediate thought is that it reminded me of "Erlang: The Movie", made by Ericsson.

It also reminded me of this slightly horrendous promo video for post-Lucent Bell Labs.

Oh, and while we're at it, we might as well look at a pre-Lucent Bell Labs promo video, too.