Machine Grading

A friend of mine drew my attention to the NYTimes’ recent article on advances in essay-grading software. It’s technology that will raise hackles on campuses around the country. The claim is that such programs are becoming sophisticated enough to grade college-level writing. Their effectiveness, of course, is widely debated. The article helpfully links to a study by Les Perelman that critiques the data used to support such claims: he argues that sample-size problems, confusion between distinct kinds of essays and grading systems, and loose assertions undermine the argument. The software is getting better, but it still doesn’t look like it can quite replicate the scores produced by human graders.

But such criticism is an argument at the margins. There is now clearly room for debate on both sides: machines already produce scores roughly comparable to human graders’ on standardized tests. The long-term trajectory is evident: if machines are roughly as effective as a force of part-time human graders, standardized tests will end up using the software to save money. They’ll keep some humans in the loop cross-checking and validating, but the key incentives all point in the direction of greater automation. The reductive structures and simplistic arguments we train students to replicate for these tests have laid the groundwork. We’ve already whittled essay writing into an algorithm.

Research Project

So, to paint the scene for my next series of posts, my current research involves using semantic indexing, combined with syntactic models, to look for analogies in nineteenth-century works. I’ll explain why at a later point — for now, I’d just like to lay out the software I’ve been using, and where I’m taking the project.

This research, which I presented at ACA 2009 (a computational algebra conference), uses two main suites of tools. For the semantic indexing, I used the tools made available at CU-Boulder’s LSA lab. Semantic indexing proceeds by tokenizing a large database of words and getting the term-document frequency counts (counting how many times each word occurs in each document). Then, using a technique called partial singular value decomposition, this matrix is reduced to a smaller one that effectively sifts the co-occurrence statistics to sort out which relationships between terms are most informative about the structure of the data set. Once you’ve got this index, you can come up with a rough representation of the meaning of a term or sentence by adding together the singular-value vectors for its terms, and you can describe differences in meaning in terms of the cosine of the angle between those vectors. The technique has proven very effective at, for instance, naive selection of synonyms.
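To make the procedure concrete, here is a minimal sketch in Python, using scikit-learn rather than the CU-Boulder tools, and a handful of toy documents I’ve made up for illustration: build the term-document counts, reduce them with a truncated (partial) SVD, and compare vectors by cosine.

```python
# A minimal sketch of latent semantic indexing, assuming scikit-learn.
# The documents and the choice of 2 dimensions are toy values.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the origin of species by natural selection",
    "natural selection acts on variation in species",
    "the analogy between artificial and natural selection",
]

# Term-document frequency counts: rows are documents, columns are terms.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Partial (truncated) SVD keeps only the most informative dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = svd.fit_transform(X)   # documents in the reduced space
term_vectors = svd.components_.T     # terms in the reduced space

def cosine(a, b):
    """Cosine of the angle between two vectors, as a similarity measure."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# A rough sentence vector is the sum of its terms' reduced vectors.
vocab = vectorizer.vocabulary_
sentence = sum(term_vectors[vocab[w]] for w in ["natural", "selection"])
print(cosine(sentence, term_vectors[vocab["variation"]]))
```

With a real corpus the matrix would be orders of magnitude larger and the number of retained dimensions in the hundreds, but the mechanics are the same.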

The other tool I used was a part-of-speech tagger called Morphadorner, developed by the MONK project group. Morphadorner is just a small part of the software suite underpinning MONK, which includes a relational database and some built-in analysis tools derived from MEANDRE/SEASR. I like Morphadorner because it’s trainable and comes with a preset for tagging nineteenth-century fiction, which is largely what I’m interested in.
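Morphadorner itself is a Java tool with its own tag set, so I won’t reproduce its interface here; but as a rough analogue, NLTK’s off-the-shelf tagger shows the kind of token-by-token output a part-of-speech tagger produces (the sentence is just an example):

```python
# A rough analogue of part-of-speech tagging, using NLTK's default
# English tagger (not Morphadorner, which is a separate Java tool).
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "It is interesting to contemplate an entangled bank."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('It', 'PRP'), ('is', 'VBZ'), ...]
```

The payoff of a trainable tagger like Morphadorner is that these tag assignments can be tuned to a period’s idiom rather than left at a modern-English default.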

As a first step, I used these tools to analyze the distribution of analogies in the 1859 text of On the Origin of Species, in order to test whether this approach could lend support to some speculations about the role that analogy plays in that work and in scientific writing generally.
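To give a feel for the idea (this is a deliberately crude simplification of the actual pipeline), syntactic markers and semantic vectors can be combined like so: flag sentences with explicit comparison markers, then ask how far apart the compared terms sit in the semantic space. The sketch below assumes the `term_vectors`, `vocab`, and `cosine` objects from the LSA sketch above.

```python
# A toy sketch of analogy-spotting: flag sentences containing explicit
# comparison markers, then measure the semantic distance between terms.
# term_vectors, vocab, and cosine are assumed from the LSA sketch above.
import re

MARKERS = re.compile(r"\b(like|as if|resembles?)\b", re.IGNORECASE)

def candidate_analogies(sentences):
    """Yield sentences containing an explicit comparison marker."""
    for s in sentences:
        if MARKERS.search(s):
            yield s

def semantic_distance(term_a, term_b):
    """1 - cosine similarity; larger values suggest a cross-domain leap."""
    a, b = term_vectors[vocab[term_a]], term_vectors[vocab[term_b]]
    return 1 - cosine(a, b)
```

The intuition is that an analogy proper, unlike a literal comparison, tends to join terms from distant semantic neighborhoods, so a large distance between the compared terms is one crude signal to rank candidate sentences by.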

But this work has several weaknesses. First, the semantic indexing tools at the CU-Boulder site are limited, particularly by the training corpuses used for their singular-value tables. I focused on a general-knowledge training set covering several years of modern undergraduate course readings, because this seemed to include a better mix of general and specialist knowledge for looking at scientific works. But it’s clearly problematic to use this library for analyzing nineteenth-century science, with its particular idioms, vocabulary, and habits of expression. What I need to do is create my own corpus of nineteenth-century works, preferably including a broad swathe of fictional, periodical, and scientific texts. Additionally, it would be nice if I could slice up that corpus in various ways, in order to examine the differences between, say, fictional and scientific corpuses, or earlier and later periods.
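The slicing could be as simple as attaching metadata to each text and filtering before the term-document matrix is built. A sketch of what I have in mind (the filenames and metadata values are hypothetical placeholders):

```python
# A sketch of corpus slicing: attach metadata to each text, then filter
# subcorpora before building the term-document matrix. Filenames and
# metadata values are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Text:
    path: str
    genre: str   # e.g. "fiction", "periodical", "scientific"
    year: int

corpus = [
    Text("corpus/origin_1859.txt", "scientific", 1859),
    Text("corpus/middlemarch_1871.txt", "fiction", 1871),
    # ... a broad swathe of nineteenth-century texts
]

def subcorpus(texts, genre=None, before=None, since=None):
    """Filter the corpus by genre and/or date range."""
    out = texts
    if genre is not None:
        out = [t for t in out if t.genre == genre]
    if before is not None:
        out = [t for t in out if t.year < before]
    if since is not None:
        out = [t for t in out if t.year >= since]
    return out

# Separate singular-value tables could then be trained per slice, e.g.:
scientific = subcorpus(corpus, genre="scientific")
early = subcorpus(corpus, before=1850)
```

Training a separate semantic index on each slice would make it possible to ask whether, say, an analogy reads as commonplace in the fiction of the period but as a leap in its science.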

I also need to do further training and verification of Morphadorner, to make sure it tags nineteenth-century scientific works as accurately as it does the fiction. Hence the current project.