Some weeks ago, Guillaume wrote a nice post about the challenges and puzzles that grading can bring up. Amongst his many good points, he noted that a grade is a fairly reductive, one-dimensional assessment of the many dimensions along which the quality of an assignment can vary. I absolutely agree, and it’s made me re-think how I grade a little.
Thus, when a few hundred pages of papers land on my desk tomorrow (an early Christmas present, clearly), they’ll be graded along a 10-dimensional grading scheme which scores essays on things like thesis statement, use of scholarly sources, structure, and a number of other criteria. It’s the first time I’ll use such an explicitly multi-dimensional scheme, and I indirectly owe Guillaume the idea (although I owe the grading scheme itself to Julie M. Norman, currently visiting faculty in McGill’s Political Science Department)! In any case, it’s QED: reading this Grad Life Blog is useful for graduate life (and potentially good for your students, too!).
Unfortunately, however, this blog doesn’t quite resolve all worldly issues, including the common complaint (by students and TAs alike) that grading papers is soooo arbitrary. Now, in what sense is grading arbitrary? Certainly, as any teaching assistant knows from experience, no two papers are graded in exactly the same conditions: weather, health, mood, and a host of other technically irrelevant factors can affect the way one grades, and although concentration and commitment go some way towards suppressing these undue influences, they are hard to eliminate entirely.
Now, does that mean that most people get a grade somewhat higher or somewhat lower than they deserve? Absolutely. But is that unfair? Not necessarily.
When a TA grades an essay, she might give it a 72, and usually that’s that – the essay gets 72. But what if the TA now somehow completely forgot about the essay, and it was put back in the pile of papers left to mark? The TA would pick it up again the next day, and, after a long day locked up in the office, she might give it a mere 69. Repeat the experiment, and, picking it up again after an energising gym session, the TA might give it a nice 75. The range of outcomes – from a lowly B-, to a sound B, to a nice B+ – makes a huge difference to the student, and yet the grade hinges upon entirely arbitrary factors! So far, then, things look like they’re pretty arbitrary.
What’s the “true grade” the essay should get? One could define it as the grade that would be given most often if the above experiment were repeated a million times (ignoring for the moment differences in grading between TAs). If we plotted the grade given in each run of the experiment on a graph, we would probably obtain something not unlike a normal distribution (see Graph 1 above): the paper’s “expected / true grade” is, say, 72; and yet, more often than not, the “actual grade” given will deviate from this ideal point by a bit. And inevitably, big deviations will happen, too: if the distribution is actually ‘normal’, about 5% of papers will get a number grade more than two standard deviations away from their true grade, which is far enough off that it can change the student’s letter grade – that’s one in 20 students, and even that’s probably being too optimistic about TAs’ grading reliability.
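For the curious, the thought experiment can be run on a computer. What follows is only a rough sketch under assumptions of my own: the grading noise is taken to be normal with a standard deviation of 3 points (the value `NOISE_SD` is made up for illustration), and the letter cut-offs in `letter` are inferred from the 69 / 72 / 75 example above rather than taken from any official scale.

```python
import random

TRUE_GRADE = 72.0  # the essay's "true grade" from the example above
NOISE_SD = 3.0     # assumed spread of day-to-day grading noise (illustrative)


def letter(grade):
    """Map a number grade to a letter band (assumed cut-offs at 70 and 75)."""
    if grade < 70:
        return "B- or lower"
    if grade < 75:
        return "B"
    return "B+ or higher"


def regrade(n, seed=0):
    """Grade the same essay n times, forgetting it between attempts."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(TRUE_GRADE, NOISE_SD) for _ in range(n)]


grades = regrade(100_000)

# Share of attempts landing more than two standard deviations off.
share_beyond_2sd = sum(abs(g - TRUE_GRADE) > 2 * NOISE_SD for g in grades) / len(grades)

# Share of attempts landing in a different letter band than the true grade.
share_off_letter = sum(letter(g) != letter(TRUE_GRADE) for g in grades) / len(grades)

print(f"more than 2 SD off:   {share_beyond_2sd:.1%}")
print(f"different letter:     {share_off_letter:.1%}")
```

Both shares depend heavily on the assumed noise level and on how close the true grade sits to a letter boundary, so the numbers are worth reading as an illustration of the mechanism rather than as an estimate of any real TA's reliability.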
That all sounds pretty depressing, but it’s actually not all that bad. If the world is really as depicted above, then ‘actual grades’ closely track ‘true grades’: the better the paper, the higher the probability that it will get a good grade. As a consequence, it pays to write better papers, despite the inherent arbitrariness of grading. As Graph 2 shows, the ‘worse’ paper B will, in rare situations, get a better grade than the ‘better’ paper A (70 vs. 69), but paper A nevertheless remains hugely more likely, overall, to earn the better grade.
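How much more likely? Here is a small sketch of that comparison, again under my own assumptions: both papers are graded with independent normal noise of the same size, and the true grades of 75 for paper A and 65 for paper B are hypothetical values picked for illustration (they are not read off Graph 2). The difference between two independent normal grades is itself normal, so the chance that A outscores B is just the chance that this difference is positive.

```python
from statistics import NormalDist

NOISE_SD = 3.0  # assumed grading noise, identical for both papers (illustrative)


def prob_better_grade(true_a, true_b, sd=NOISE_SD):
    """Probability that paper A's actual grade exceeds paper B's.

    Assumes each actual grade is its true grade plus independent normal noise.
    """
    # Subtracting two NormalDist objects gives the distribution of the
    # difference of two independent normal variables.
    diff = NormalDist(true_a, sd) - NormalDist(true_b, sd)
    return 1.0 - diff.cdf(0.0)


# Hypothetical true grades: A is the 'better' paper, B the 'worse' one.
p_a_wins = prob_better_grade(75.0, 65.0)

# Sanity check: two papers of equal quality are a coin flip.
p_equal_quality = prob_better_grade(72.0, 72.0)

print(f"P(A outscores B):              {p_a_wins:.3f}")
print(f"P(outscoring an equal paper):  {p_equal_quality:.3f}")
```

The closer the two true grades are, the nearer this probability drifts to a coin flip, which is exactly where the happy and bitter surprises come from.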
Somewhat oddly, then, the system both does and does not track ‘true’ paper quality: a TA can’t guarantee that a paper gets what it really deserves, but a TA can guarantee that writing a good paper increases students’ chances of getting a good grade. That might sound stupidly obvious – like saying “it’s good to do good things” – but if it holds up, it actually undercuts a lot of the arbitrariness talk usually associated with grading. Grading is arbitrary, but only in the specific sense that ‘actual’ grades are (normally?) distributed around ‘true’ grades, which does not invalidate the system’s fairness, or its overall accuracy (although it does make for a number of happy and bitter surprises every time papers are handed back). Encouragingly, it also means that the grades students get on any given assignment are largely of their own making, plus or minus a small margin of error. A good TA will keep that margin as small as possible – and here multidimensional grading schemes might be of some help – but eliminating it altogether is probably not possible.
Ultimately, in his post, Guillaume noted that “For better or for worse we have to work in this system and it’s not clear to me if there even is a demonstrably better system (let me know if you think there is!).” What do you think?*
*The next best alternative to this system is, presumably, good old “staircase grading”: (a) take the papers to be graded. (b) throw them down the stairs. (c) papers landing on higher steps get higher grades. It is pretty efficient, although it doesn’t quite resolve the issue of having to write specific feedback on each essay.
**Just in case, I should perhaps note that I don’t exactly consider any of the above to be a pathbreaking contribution to our understanding of grading – on the contrary, most of it is probably self-evident, if not truistic. But given my inability the other day to explain what I meant by “grading both is and isn’t arbitrary” to some friends, I wanted to put it down on paper. It also illustrates the odd things the mind of a graduate student can spend time pondering, and it has the added benefit that the seeming seriousness of the above balances out some of the silliness of my previous posts.