Talking TED (“Understanding Analogy: Theory and Method”)

A few months ago the Information Sciences Institute here at USC invited me to talk at one of their weekly Natural Language Seminars. They knew I’d been working on theorizing and analyzing analogies digitally, and wanted to hear more.

It was an exciting but daunting opportunity. How would I speak to an audience that thought about language and procedures for studying it in a radically different way? Several years ago I gave a talk like this at a conference for the Association of Computational Algebra. It didn’t go over well.

This time, I decided to experiment with a TED-style talk. There’s been a lot of criticism of the TED format. Most of it centers on whether the talks are accurate and informative or simply entertaining. Some do seem to be the intellectual equivalent of cotton candy — tasty but evanescent. But they are also, I think, a model for how to talk to a wider audience and enlist interest across cultural, institutional, and disciplinary boundaries.

So I studied up. There’s Nancy Duarte’s TED talk on TED talks, and TED’s curator, Chris Anderson, has also shared his recipe. I think it boils down to three things. First, use biography (yours or another’s) to tell a coherent story that centers on the problem you work on. Second, have a clear transition from the problem to your answer. And finally, emphasize why that answer is powerful — what it changes about how we see the problem, and what it might mean for others. To put it differently, TED talks rely on an analogy drawn between a personal narrative and a larger problem.

Put this way, it’s a recipe that applies to most of the good talks that I’ve seen, except TED talks are more personal and less complex. You have to put yourself forward and abandon qualifications, hedges, and the basic acknowledgement that others have been working on similar problems, often more successfully.

Despite my discomfort with the TED format, I’ve been trying to figure out how to get my scholarship out to a wider audience, especially communities beyond academia. This seemed like a great opportunity to experiment.

So I sat down and hammered it out. Meg was out of town, which meant that most of the writing happened with my daughter in my lap, and we practiced with her in the baby bjorn (she’s my biggest fan).

The final title: “Understanding Analogy: Theory and Method.” The folks at ISI posted it here. It doesn’t quite live up to the billing, but it worked. My auditors generally agreed that analogies are an important feature of new ideas and that I’d found a new way of looking at them. Since the talk, we’ve been discussing a collaboration on a machine learning tool that finds analogies, and I’m recruiting undergrads for some initial work this summer. It will be exciting to see where this leads.

Surfing the Permanent Revolution: Digital Humanism at NAVSA 2013

This week I’m back from NAVSA. Well — not really back; it was just up the road in Pasadena. But I expect to spend some time nursing this (intellectual) hangover and thinking about the talks I saw and the questions they raised.

Most immediately, it’s clear that digital work has hit the pavement in 19th-century studies. Natalie Houston gave a fantastic talk about her “Visual Page” project, which uses Google’s Tesseract OCR engine to analyze formal elements in a print corpus of Victorian poetry. It was stunning how much a computer can learn about a poetry collection just from the blank spaces on the page. Maeve Adams gave an intriguing paper that read across key terms in Victorian periodicals, treating them as “epistemic communities,” and used this to ground a far-reaching argument about formalism in the nineteenth century. And Rachel Buurma expanded on her work on Charles Reade, an eccentric even among archive rats, and his archives. As she put it, his wildly profuse collections of documents, indexes, and indexes on indexes add up to archives “on the way to becoming novels.” I’m almost convinced to read more Reade. It doesn’t sound like he would have appreciated YAHOO (I read the marginalia as: “In other words know the contents before you know anything about this”):

On Saturday I participated in a digital roundtable that Anne Helmreich of the Getty Foundation organized to field questions about research and pedagogy from conference attendees. The Prezi from my own talk, about some of the tools I’m using in class (Facebook as a social CMS, Google Drive for workshops), is posted here. My main point was that English seminars have always been “flipped”: focused on in-class workshopping and intellectual tinkering, which makes it easy to fold in digital tools. (I take my inspiration here from Jentery Sayers and his Maker Lab.) But I was more interested in hearing what the other panelists and the attendees had to say about the state of the digital union in C19 studies.

Most questions raised by the participants were about the ins and outs of digital scholarship: how to recruit technical collaborators (find research questions they’re interested in); how to find time and money for the work (no good answer there); how to use statistics (to be avoided while best standards are worked out); how to use undergraduate research more effectively (give them work that is tied to your own research + break projects into discrete chunks). This last point was made by Dermot Ryan, current Undergraduate Research Director at Loyola Marymount. I suspect the dismal statistics for undergraduate research conducted in the humanities at LMU would be matched at USC. It’s a thorny problem. I’ve been thinking about ways to pull undergrads into my next digital research project. But as I focus on finishing my analogue book, there’s not much I can think of sending undergraduates out for, besides checking references. Clearly this is a problem with hermetic patterns of research. In order to frame more collaborative projects, we have to break research questions into practices that depend less on our own idiosyncratic habits of mind and idiolects of convenience. We (or at least, I) need to be better at looping others in.

It was also a huge pleasure to meet Petra Dierkes-Thrun and learn more about the “Wilde Decadents” class she’s running at Stanford and its blog. The class generated tremendous interest; the work the students produced was read by visitors from across the globe. I’m frankly envious. She was particularly savvy about promoting the course and its Twitter account through academic networks and listservs like the Victoria list.

But perhaps the most intriguing contribution to the roundtable, to my mind, was Andrew Stauffer’s diagnosis of the NINES project. NINES is currently working to redefine itself to better serve the current wave of digital scholarship. As Andrew described it, NINES was originally envisioned as a coordinator and peer-review network for online collections produced by academics — sites like the Rossetti Archive, the Women Writers Project, and Darwin Online. They envisioned an academic internet populated by public research archives. Instead, the major commercial publishers and Google have digitized masses of texts and placed them behind paywalls. Gale’s NCCO database is a case in point. A corollary challenge is that NINES’s Collex originally solved the basic problem of finding a CMS that could serve different kinds of academic content, but the widespread adoption of other open-source CMSs like Omeka diminishes the case for further investment in Collex. The folks at NINES are now trying to figure out how else they might support digital research — for instance, producing new tools for digital analysis along the lines of DocuScope. I’m looking forward to their public launch of Juxta, which produces visual collations of textual variants. There’s an undergraduate who’s been asking for a good tool to start DH work with, and this looks friendly enough to be promising. Andrew also suggested NINES might start convening seminars that bring humanists and engineers together to test new research avenues. It would be exciting to have an interdisciplinary research seminar that was formatively tied to a technical team rather than an academic department — tied to makers as well as thinkers.

At its heart, NINES is a classic disruption story. It announced a new chapter in 19th-c scholarship when it was launched in 2003 — the same year as NAVSA’s first conference. Both organizations are now at a crossroads (Dino Felluga handed his role as NAVSA’s head to Marlene Tromp on Saturday). Given the rapid change of our technical tools, no organization or project that locates itself in the digital sphere will be able to avoid regular reinvention. I spent a considerable amount of time getting the MONK project software up and running for an early experiment with my Darwin analysis. I invested even more time figuring out the Meandre suite, including a trip up to the UVic digital workshop and hours both in person and over email drawing on the expertise of Loretta Auvil and Boris Capitanu. That culminated in a single talk at the Seattle MLA on the global network imagined by Oliphant’s novels. The return on investment for this work has been relatively small. And now both Meandre and MONK have exhausted their funding and have begun to recede into history. I’ve just now noticed that “monk-project” is embedded in the permalink for this post — a legacy of an early vision for this site.

Like any story, it has been a combination of design and contingency. I’ve been focused on cementing a traditional research profile, using the digital work to keep my hand in, waiting to mount my extended DH project once the book is off. Each effort has given impulse to that trajectory. It’s still exciting to imagine the tools and methodologies that the next two and ten years will bring. And yet, as I listened to conference attendees ask what it would take to get trained in digital work, how to figure out the appropriate criteria for significance, how to adapt to new technologies — essentially, how to surf a continual revolution — it hit me what DH work signs you up for: a lifetime of fresh tarball installations, cribbed command prompts, endless help pages for new object libraries, and bewildering new GUIs. As the tools change, we reboot and relearn. We need to be honest about this. Off the top of my head, my current experiments with Python follow upon, in reverse order, exploring Ruby, Java, JavaScript, jQuery and MySQL, XSLT, CSS, Visual Basic, HTML, and Tcl (!). That sets aside the humdrum life of a sysadmin for OS X, Linux, Amazon AMI, Windows XP, MS-DOS, and Unix machines — not to mention WordPress itself. The most rigorous Ph.D. programs require two to three languages, not four or five.

If it sounds like I’m grousing, maybe I am. We need to emphasize the long dead ends as well as the triumphs of DH scholarship when we talk to curious peers. But the big “but” is that, as academics in the humanities, we’re tinkerers by trade — whether on our computers, in the classroom, or at the archive. For my part, I’d be exploring some version of these technologies in any case. It’s just so much time wasted stringing zeroes and ones unless I invest this labor in my research. Besides, I want to show my daughter what hacking looks like.

Mapping the World of Oliphant’s Novels

About a year ago, at the previous MLA, I spoke on a panel about literary reactions to the Scottish Rising of 1745. I’d thought I’d written about it, but in the process of getting this server back up and running, I found this old draft post. My talk focused on Victorian reactions to the ’45 in the novels of Margaret Oliphant and Robert Louis Stevenson. Part of the question I wanted to raise was whether the rising is typically understood as a site of political and historical closure that cements the constitution of “Britain” as a cultural entity. One way to get at this, I thought, was to see whether literature written about the rising emphasized Britain over Scotland and England.

But it was also a good opportunity to experiment with using network analysis and mapping to explore the geographic imagination of the nineteenth-century novel. One way to raise the question of political formation is to look at the locations that are explicitly cited in each novel and to map out how they are connected. To do this I used the Meandre framework (apparently now defunct) to extract location entities from several of Stevenson’s novels and from 65 of Oliphant’s, derived from the Internet Archive. I then built a series of network graphs in Gephi to see which locations are cited most frequently and which other locations they tend to be cited alongside. An example of the resulting plots is below: locations referenced in Oliphant’s novels, sized by reference frequency and connected by proximity of references.

Vector graph of locations in Oliphant’s Novels, Sized by Degree
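The Meandre workflow itself is gone, but the extraction-and-linking step behind a graph like this is easy to approximate with current open-source libraries. Here is a minimal sketch, with spaCy standing in for the entity extractor and networkx for the graph; the directory layout and the adjacent-mentions proximity rule are placeholders of my own, not what the original pipeline did.

```python
# Sketch of the pipeline described above, with spaCy and networkx standing in
# for the (now defunct) Meandre framework. The "novels/" directory and the
# adjacent-mentions linking rule are illustrative assumptions.
import glob
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")   # small English model with built-in NER
nlp.max_length = 2_000_000           # novel-length texts exceed the default limit

graph = nx.Graph()

for path in glob.glob("novels/*.txt"):                # plain-text novels
    with open(path, encoding="utf-8") as f:
        doc = nlp(f.read())

    # Keep geopolitical and location entities, in order of appearance.
    mentions = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

    for place in mentions:                            # node size = reference frequency
        graph.add_node(place)
        graph.nodes[place]["mentions"] = graph.nodes[place].get("mentions", 0) + 1

    for a, b in zip(mentions, mentions[1:]):          # edge weight = proximity of references
        if a != b:
            weight = graph.get_edge_data(a, b, {}).get("weight", 0)
            graph.add_edge(a, b, weight=weight + 1)

# Export for layout, sizing, and coloring in Gephi.
nx.write_gexf(graph, "oliphant_locations.gexf")
```

Gephi reads the GEXF file directly, so the sizing by degree and the layout in the figure above can then be handled there.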

I found it hard to figure out how to present the networks effectively in a talk. This was less of a problem for the Gephi visualizations, which are static, though images with a large number of nodes still presented a challenge. One strategy I experimented with was recording a screen capture and editing it into a video that zoomed in as I spoke. In retrospect, it would have been more effective and flexible to use Prezi.

One question that preparing the talk raised was how to evaluate the utility of these visualizations in the context of a presentation. In the case of Oliphant, the justification lies in the difficulty of assessing the range and depth of her fiction. She was an essayist and the author of more than ninety novels, often serialized simultaneously across several publications; it is almost impossible to wrap your head around her production. On the other hand, it gives you the chance to make some nice visualizations. Here are two animations I made using Gephi and Google Maps. The first is a network map of locations in her novels, with node size and proximity scaled to the total number of links; the second is a world map with the locations geocoded.

 

 
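The geocoding behind the second map is simple in outline: resolve each cited place name to coordinates and hand the results to a mapping tool. Here’s a rough sketch using geopy’s Nominatim geocoder as a stand-in for whatever service you prefer; the input and output filenames are hypothetical, not part of my original workflow.

```python
# Sketch of the geocoding step: read a list of place names, resolve each to
# latitude/longitude, and write a CSV that Google My Maps (or folium, etc.)
# can plot. Nominatim and the filenames are stand-ins, not the original workflow.
import csv
import time

from geopy.geocoders import Nominatim

geocoder = Nominatim(user_agent="oliphant-map-sketch")

with open("oliphant_locations.txt", encoding="utf-8") as f:
    places = [line.strip() for line in f if line.strip()]

with open("oliphant_locations_geocoded.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["place", "latitude", "longitude"])
    for place in places:
        hit = geocoder.geocode(place)                 # returns None if nothing matches
        if hit is not None:
            writer.writerow([place, hit.latitude, hit.longitude])
        time.sleep(1)                                 # Nominatim asks for ~1 request/second
```

Historical place names won’t always resolve to the right modern coordinates, so the output wants a manual pass before it goes on a map.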

Hacking: WYSIWYG

Two weeks ago I noted that someone had recently tried to get into my WordPress server. My firewall traced the query back to an IP in China, though I have no way to figure out where it ultimately originated. I linked it to news of escalating activity from abroad; it seems that attempts to get into academic networks are sharply on the rise.

Then a week ago my server collapsed under what seemed to be a DDoS attack. I tried to restart it several times, but every time I got the server back up it was swamped with traffic. I’ve spent a good eight hours now launching a new server and migrating over content from a backup. Most of my posts are back, but I lost the last year’s worth of images. I’ve only been able to recreate or restore about half.

It’s all kind of creepy. And it may be beyond my capacity to stay on top of escalating security problems on a private blog. Apparently a botnet has been attacking WordPress installations generally for the last several months. I like having my own site; I like the ability to post whatever content I want and to try out different kinds of server technologies; my Omeka-based class last year depended on this capacity. But the bar is getting higher.

Machine Grading

A friend of mine drew my attention to the NYTimes’ recent article on advances in essay-grading software. It’s technology that will raise hackles at campuses around the country. The claim is that such programs are becoming sophisticated enough to grade college-level writing. Of course, their effectiveness is widely debated. The article helpfully includes a link to a study by Les Perelman that critiques the data being used to support such claims (he argues that sample-size problems, confusion between distinct kinds of essays and grading systems, and loose assertions undermine the argument). The software is getting better, but it still doesn’t look like it can quite replicate the scores produced by human graders.

But such criticism is an argument at the margins. There is now clearly room for debate on both sides, and on standardized tests machine scores are already comparable to human ones. The long-term trajectory is evident: if machines are roughly as effective as a force of part-time human graders, standardized tests will end up using the software to save money. They’ll keep some humans in the loop cross-checking and validating, but the key incentives all point in the direction of greater automation. The reductive structures and simplistic arguments we train students to replicate for these tests have laid the groundwork. We’ve already whittled essay writing into an algorithm.

Peries Project Archive

The annual NSSE benchmark study of universities is out, and it has a handy “Report Builder” that lets you generate reports drawn from their broad survey of freshman and senior undergraduates at a huge range of institutions in the US and Canada. I decided to play around with it a bit and generated these two charts of student opinions about their major at competitive research universities in the US:

Freshman Responses by Major

Senior Responses by Major

This seems to confirm the counterintuitive reaction I get when I tell people outside the university that I’m an English professor. Nine times out of ten, they tell me how *hard* they found their English courses in college.

The David Livingstone Spectral Imaging Project, I Presume?


Yes, groan. But I spent this morning looking through the beautiful online collection produced by the David Livingstone Spectral Imaging Project. Livingstone’s 1871 field diary, from the months leading up to his ‘discovery’ by Henry Morton Stanley, was written in a berry-based natural ink across the pages of newsprint and has faded to near invisibility. Using spectral imaging (which captures the page at distinct wavelengths and then recombines the images), the team has managed to reveal the journal entries and strip out the original newsprint. The results are simply amazing — it reminds me of looking at Hubble images of distant nebulae. Gorgeous, strange, new. In addition to the extensive documentation and supporting bibliographic and historical materials, the snazzy interface, which lets you coordinate scrolling across the color and spectral facsimiles (as in the above image), is just stunning.
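For the technically curious, that “recombines the images” step can be pictured as something like a principal components analysis across the registered captures: ink that is invisible in any single band often separates out cleanly in one component. This is a toy sketch of that general idea, not the project’s actual processing chain; the filenames and wavelengths are invented.

```python
# Toy illustration of multispectral recombination: stack registered grayscale
# captures of one page, run PCA across the bands, and save each component so
# you can see which one isolates the faded ink. Filenames/wavelengths are made up.
import numpy as np
from PIL import Image
from sklearn.decomposition import PCA

bands = ["page_365nm.png", "page_450nm.png", "page_550nm.png", "page_700nm.png"]
stack = np.stack([np.asarray(Image.open(f).convert("L"), dtype=float) for f in bands])

height, width = stack.shape[1:]
pixels = stack.reshape(len(bands), -1).T      # one row per pixel, one column per band

components = PCA(n_components=3).fit_transform(pixels)
for i in range(3):                            # inspect each component for ink contrast
    comp = components[:, i].reshape(height, width)
    spread = comp.max() - comp.min()
    comp = 255 * (comp - comp.min()) / (spread if spread else 1)
    Image.fromarray(comp.astype(np.uint8)).save(f"component_{i}.png")
```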

On the one hand, it’s a case of an extraordinary archival find (Adrian Wisnicki and Anne Martin’s recovery and reassembly of the often uncatalogued portions of the journal across several distinct accessions at the David Livingstone Centre) combined with an ideal technology (the Archimedes Palimpsest team brought their expertise to bear). But when you look at the extensive documentation provided, it’s also a window into the extraordinary challenge of producing collaborative, trans-Atlantic research in the digital humanities.

THATCamp Penn 2012

I spent Wednesday on campus at Penn’s inaugural THATCamp. It was set up by the Penn Library and the Penn Humanities Forum, and it showed the promise and possibility of the “unconference” format, particularly when applied to something as tentative and collaborative as the digital humanities.

Amanda French came up from George Mason’s Center for History and New Media, home of THATCamp. She set precisely the right open, collaborative, free-wheeling tone at the opening session, and it carried through. The thing that struck me most forcefully is that the open format creates an environment that is extraordinarily friendly to non-specialists.

Gephi Network Visualization of Humphry Clinker

I’m still working on slides for my talk at the MLA on Stevenson and Oliphant, and Victorian reflections on the ’45 (force-directed network and Google map visualizations here and here). I’m also starting to experiment with Gephi, a powerful open-source graph editor. I was blown away by Matthew Jockers’s “Nineteenth-Century Literary Genome” animation and wanted to know how it was made. Apparently, they produced it one frame at a time as separate PNG files and then assembled them using QuickTime.
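The brute-force version of that workflow is easy to sketch: render one image per step, then stitch the images into a video. Below, networkx and matplotlib stand in for Gephi’s renderer, a bundled sample graph stands in for the real data, and ffmpeg does the job QuickTime did; all of that is my own substitution, not how the genome animation was actually built.

```python
# Frame-by-frame animation sketch: draw one PNG per step with a fixed layout,
# then assemble the PNGs into a movie. The graph and filenames are placeholders.
import os

import matplotlib.pyplot as plt
import networkx as nx

os.makedirs("frames", exist_ok=True)

graph = nx.les_miserables_graph()              # any graph will do; placeholder data
pos = nx.spring_layout(graph, seed=42)         # fix the layout so frames line up
nodes = list(graph.nodes)

for step in range(1, len(nodes) + 1):          # reveal the network node by node
    visible = graph.subgraph(nodes[:step])
    plt.figure(figsize=(8, 6))
    nx.draw_networkx(visible, pos=pos, node_size=50, with_labels=False)
    plt.axis("off")
    plt.savefig(f"frames/frame_{step:04d}.png", dpi=150)
    plt.close()

# Then, from a shell:
#   ffmpeg -framerate 10 -i frames/frame_%04d.png animation.mp4
```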

I’m still trying to figure out how to produce animations, but I like working in Gephi. It has a feature-rich interface, makes it easy to edit and remove nodes and to perform clustering and various forms of network analysis, and produces sharp images. Here is the location entity network from Humphry Clinker (1771), arranged into eight clusters, with nodes and edges colored by group:
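Gephi handled the clustering and coloring for that figure, but the same kind of grouping can be computed ahead of time and carried into Gephi as a node attribute. A hedged sketch with networkx’s modularity-based community detection (the GEXF filenames are hypothetical, and the partition won’t match Gephi’s exactly):

```python
# Sketch: compute communities outside Gephi and store them as a "group"
# attribute so Gephi can partition and color by it. Filenames are placeholders.
import networkx as nx
from networkx.algorithms import community

graph = nx.read_gexf("humphry_clinker_locations.gexf")

# Greedy modularity maximization, similar in spirit to Gephi's modularity tool.
groups = community.greedy_modularity_communities(graph)

for group_id, members in enumerate(groups):
    for node in members:
        graph.nodes[node]["group"] = group_id

nx.write_gexf(graph, "humphry_clinker_grouped.gexf")
```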

Gephi makes beautiful static images and, as can be seen in the genome video, beautiful animations. On the other hand, unlike the Protovis graphs, finished visualizations are not dynamic or interactive: you can’t output a script-based visualization that the user can play with or that could be embedded in a presentation. That’s not a problem for a presentation, really, but I like the activity that a Protovis graph can bring to web publishing.

I’m also evaluating these various visualization approaches in order to prepare for my historical fiction and fantasy seminar next semester, which will ask the students to help produce an online textual exhibit using Omeka. I’m going to ask them to look at what’s possible and then pitch paratextual visualizations & tools to package with the exhibit.

“Webby” Publishing and Scholarly Digital Literacy

HASTAC 2011 has posted videos of some of their panels, and I was taken with two points, brought up by Dan Cohen and Tara McPherson as part of the panel “The Future of Digital Publishing,” which can be viewed here.

First, there was Cohen’s closing suggestion that humanities scholars are “terrible economists” because our pursuit of print perfection leads to an inordinate investment in the final stages of publication (proofreading, reformatting notes into periodical-specific styles). As he notes, we have learned to look past such fastidiousness in some web formats, and this indicates that, in a “webby” mode, we are able to relax those standards and still take work seriously.

McPherson’s wide-ranging discussion canvassed the new formats and possibilities that digital archives are opening up to us, and she asked humanities scholars not to cede the task of figuring out how to manage massive data sets to the scientific and computer-science communities. I was particularly caught by the description she gave of a question that keeps her up at night: ten or fifteen years from now, how will young scholars make sense of the wild explosion of publication formats and approaches that archives and DH work have opened up?

This, for me, raises a third and perhaps more challenging problem: how will we cultivate scholarly digital literacy? Part of what reinforces the power and importance of print and text-based publication is the high-level textual literacy that humanists develop. I think about how hard it was to develop the specialized literacy it took for me to understand scholarly publishing formats — this demanded a huge evolution in my reading practices, above and beyond what I would describe as my already high-level textual literacy as an undergraduate. When we present something like a simple chart, or even an object as complex as an interactive network visualization, much less expose users to archives of new material, we tacitly demand some literacy in those formats. Brief textual descriptions and introductions don’t suffice here. Cohen’s observation about the relaxed constraints of “webby” publishing standards emphasizes the point: we tolerate spelling errors because the effort otherwise put into exhaustive spell-checking is being invested elsewhere, in the aspects of digital scholarship that entail considerable investments in both acculturation and ongoing labor.

I think there is an expectation that great content and great scholarship will cultivate literacy. The iPad certainly shows (as McPherson notes) that a transformative product can drive technical literacy in a way that seems immediate and unreflective — a transformation so profound that it produces what Thomas Kuhn describes as the “gestalt” experience of a new episteme, and the duck becomes a rabbit. And yet I worry that new technologies and techniques, especially as they are initially developed, pull in precisely the opposite direction. Certainly, this concern weighed on me as I decided which path to pursue in my own work, and I’ve opted primarily for publication in traditional print formats and the forms of scholarship that would help me achieve that aim. If you watch to the close of the talk, the audience questions are dominated by the problem of tenure and scholarship-evaluation standards. But this is generally cast in terms of accommodation or, alternatively, of forcing traditional scholars to change their practices, rather than acknowledging that digital scholarship demands, in effect, a new and truly complex set of literacies.