The David Livingstone Spectral Imaging Project, I Presume?

 


Yes, groan. But I spent this morning looking through the beautiful online collection produced by the David Livingstone Spectral Imaging Project. Livingstone’s 1871 field diary, from the months leading up to his ‘discovery’ by Henry Morton Stanley, was written in a berry-based natural ink across pages of newsprint, and has faded to near invisibility. Using spectral imaging (which photographs the page under distinct wavelengths of light and then recombines the images), the team has managed to reveal the journal entries and strip out the original newsprint. The results are simply amazing; it reminds me of looking at Hubble images of distant nebulae. Gorgeous, strange, new. In addition to the extensive documentation and supporting bibliographic and historical materials, the snazzy interface, which allows you to coordinate scrolling across the color and spectral facsimiles (as in the above image), is just stunning.

On the one hand, it’s a case of an extraordinary archival find (Adrian Wisnicki and Anne Martin’s recovery and reassembly of the often uncatalogued portions of the journal across several distinct accessions at the David Livingstone Centre) combined with an ideal technology (the Archimedes Palimpsest team brought their expertise to bear). But when you look at the extensive documentation provided, it’s also a window into the extraordinary challenge of producing collaborative, trans-Atlantic research in the digital humanities.

One of the most fascinating portions of the site is the extensive description and documentation of how the project developed. There is even a short lessons-learned page that summarizes key insights. First, despite substantial grants from the British Academy and a DH Startup grant from the NEH, they were unable to obtain additional funding for the work during development. This meant, as they put it: “every funded project member providing significant work for the project out of scope. Unfunded team members, in turn, uniformly assisted on a pro bono basis.” From the standpoint of an academic, this sounds like a recruitment rather than an operations problem. Ideally, some portion of internal research funding should go toward complex and valuable cooperative projects. But many were not, strictly speaking, academics; they were technicians and archivists whose outside commitments aren’t organized to support this kind of pro bono work. This applies both to the eight people on the official roster of “team members” and to non-members like volunteer archivist Anne Martin, who made substantial contributions to the project.

Another striking challenge that the team ran into lay in the complex file naming convention that they developed. After canvassing the metadata needs of various project members, they began with a scheme that seems intuitively to make a lot of sense: each image file name began with the institution and pressmark, followed by the Livingstone folio number, the institution’s own folio number, the number of shots, the shot sequence, and the document type as suffix. There was also a project logging system that supplemented the file names by coordinating shot sequences with metadata and the data management team. But at several stages of the project, image production and processing had to be halted while file names were reworked and cleaned up in order to bring them back to code. The problem, as they put it in their write-up, was that the file name was “too complex”, and various preliminary imaging and processing attempts produced variations that had to be fixed at a later stage (presumably without extensive, or at least transparent, documentation). One key insight here is that it is hard for file names to carry all of the relevant data.
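To make the fragility concrete, here is a minimal Python sketch of a file name carrying those fields. The write-up as I've summarized it doesn't give the actual delimiters, field widths, or file extension, so the layout below (hyphen, underscores, zero-padded numbers, `.tif`) and the names `ShotName` and `parse_shot_name` are my own assumptions, not the project's convention.

```python
import re
from dataclasses import dataclass

# Hypothetical layout, NOT the project's real convention:
#   INSTITUTION-PRESSMARK_LLL_III_NN_SS_doctype.tif
PATTERN = re.compile(
    r"^(?P<institution>[A-Z]+)-(?P<pressmark>[A-Za-z0-9]+)"
    r"_(?P<livingstone_folio>\d{3})_(?P<institution_folio>\d{3})"
    r"_(?P<shot_count>\d{2})_(?P<shot_seq>\d{2})"
    r"_(?P<doc_type>[a-z]+)\.tif$"
)


@dataclass(frozen=True)
class ShotName:
    institution: str
    pressmark: str
    livingstone_folio: int
    institution_folio: int
    shot_count: int
    shot_seq: int
    doc_type: str

    def render(self) -> str:
        """Build the file name from structured fields, never by hand."""
        return (
            f"{self.institution}-{self.pressmark}"
            f"_{self.livingstone_folio:03d}_{self.institution_folio:03d}"
            f"_{self.shot_count:02d}_{self.shot_seq:02d}"
            f"_{self.doc_type}.tif"
        )


def parse_shot_name(name: str) -> ShotName:
    """Reject any name that drifts from the agreed pattern."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    f = match.groupdict()
    shot = ShotName(
        f["institution"], f["pressmark"],
        int(f["livingstone_folio"]), int(f["institution_folio"]),
        int(f["shot_count"]), int(f["shot_seq"]), f["doc_type"],
    )
    # A shot's position cannot exceed the declared number of shots.
    if not 1 <= shot.shot_seq <= shot.shot_count:
        raise ValueError(f"shot sequence out of range in {name!r}")
    return shot
```

With seven fields in one string, a single unpadded number or swapped delimiter yields a name that still looks plausible to a human but no longer parses, which is roughly the kind of drift the team describes having to clean up.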

Without more information, it’s hard to know exactly what might have helped fix this problem. But … one way the team might have approached this differently would have been to come up with a coordinated metadata entry strategy that routinized more of the process. There are systems designed to handle precisely such problems, but you’d want something stripped down and simplified so it didn’t prove too encumbering. For instance, insofar as they would ultimately be producing XML transcripts of the images anyway, they might have started with routines that produced XML metadata files to pair with each image object. A GUI or web-based interface could require an entry for each field, coordinate the production of the standardized file name, provide an extra field for notes, incorporate this metadata plus notes into an appropriate XML header (perhaps based on TEI), and package the image file and XML metadata file in an archive using the standardized file naming convention. Simple scripts could handle batch extracting images and metadata from the archives for various users. This sounds like a lot of work, but considering the extensive delays in production, clearly some tighter coordination of data entry was needed. Inconsistency is the bugbear of metadata, and the problem grows exponentially with multiple users, in multiple institutions, on different continents. There’s even a note that Doug Emery, the team’s data manager, was often waking at 2:30 a.m. in Baltimore in order to sequence his workday with imaging going on back in the UK. This is, as they put it, “heroic,” but it’s also a little crazy.
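For what it’s worth, the core of that routine is small. The following is a rough Python sketch of the idea, not anything the project actually built: it refuses incomplete entries, derives the standardized name from the fields, writes a plain XML sidecar, and zips the pair. The field names, the `imageMetadata` element, and the name layout are all placeholders of mine; a real version would presumably map onto a TEI header instead.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical field list, taken from the naming scheme described above.
REQUIRED = (
    "institution", "pressmark", "livingstone_folio",
    "institution_folio", "shot_count", "shot_seq", "doc_type",
)


def standard_name(fields: Dict[str, str]) -> str:
    """Derive the shared file stem from the entered fields (assumed layout)."""
    return (
        "{institution}-{pressmark}_{livingstone_folio}_{institution_folio}"
        "_{shot_count}_{shot_seq}_{doc_type}"
    ).format(**fields)


def build_sidecar(fields: Dict[str, str], notes: str = "") -> bytes:
    """Return an XML metadata document; every required field must be filled."""
    missing = [k for k in REQUIRED if not str(fields.get(k, "")).strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    root = ET.Element("imageMetadata")  # placeholder element, not TEI
    for key in REQUIRED:
        ET.SubElement(root, key).text = str(fields[key])
    ET.SubElement(root, "notes").text = notes
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def package(image: bytes, fields: Dict[str, str], notes: str = "") -> Tuple[str, bytes]:
    """Zip the image and its sidecar under one standardized stem."""
    sidecar = build_sidecar(fields, notes)  # validates before anything is written
    stem = standard_name(fields)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(stem + ".tif", image)
        archive.writestr(stem + ".xml", sidecar)
    return stem + ".zip", buffer.getvalue()
```

The point of the design is that nobody types a file name: the name is computed from validated fields, and anything that doesn’t fit the fields goes into the notes element rather than into an improvised variant of the name.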

The final insight is that initial, path-dependent decisions had a big impact on the technical and institutional hurdles that had to be cleared. Perhaps the most important sentence in the write-up is the observation that “regularly scheduled teleconferences as well as shorter briefings focused on bridging disciplinary divides might have enhanced communication among team members during all stages of the project.” What this suggests is that long planning and development meetings were not quite as productive as hoped, and that their length probably discouraged the team from scheduling more of them. Constraining the length of meetings and focusing more of them on bridging expertise would have given the team more flexibility as the project developed. At the same time, given the extensive early planning that did go into every phase of the project, it’s clear that the challenge isn’t forethought but knowing in advance what the key challenges will be. And that’s a tall order for collaborations in a new research area. (Hence the need for more focused, interdisciplinary meetings.) Even so, the final product is surely every bit as extraordinary as they could have imagined. Hard, unpaid work bridged that gap.