A friend of mine drew my attention to the NYTimes’ recent article on advances in essay-grading software. It’s technology that will raise hackles on campuses around the country. The claim is that such programs are becoming sophisticated enough to grade college-level writing. Of course, their effectiveness is widely debated. The article helpfully includes a link to a study by Les Perelman which critiques the data being used to support such claims (he argues that sample size problems, confusion between distinct kinds of essays and grading systems, and loose assertions undermine the argument). The software is getting better, but it still doesn’t look like it can quite replicate the scores produced by human graders.
But such criticism is an argument at the margins. There is now clearly room for debate on both sides: machine scores are already comparable to human scores on standardized tests. The long-term trajectory is evident: if machines are roughly as effective as a force of part-time human graders, standardized tests will end up using the software to save money. They’ll keep some humans in the loop cross-checking and validating, but the key incentives all point in the direction of greater automation. The reductive structures and simplistic arguments we train students to replicate for these tests have laid the groundwork. We’ve already whittled essay writing into an algorithm.