Evaluating Faculty Quality, Randomly

Here’s an interesting piece about teaching quality and evaluation. The most interesting point from my own perspective is this:

For math and science courses, students taking courses from professors with a higher “academic rank, teaching experience, and terminal degree status” tended to perform worse in the “contemporaneous” course but better in the “follow-on” courses, according to the report. This is consistent, the report asserts, with recent findings that students taught by “less academically qualified instructors” may become interested in pursuing further study in particular academic areas because they earn good grades in the initial courses, but then go on to perform poorly in later courses that depend on the knowledge gained from the initial courses.

In humanities, the report found no such link.

Carrell had a few possible explanations for why no such link existed in humanities courses. One is because professors have more “latitude” in how they grade, especially with essays. Another reason could be that later courses in humanities don’t build on earlier classes like science and math do.

One of the major points of the study was its look at the effectiveness of student evaluations. Although the evaluations can accurately predict the performance of the student in the “contemporaneous” course — the course in which the professor teaches the student — they are “very poor” predictors of the performance of a professor’s students in later, follow-up courses. Because many universities use student evaluations as a factor in decisions of promotion and tenure, this “draws into question how one should measure professor quality,” according to the report.

Two points are worth noting here. The first is the tiresome point that better ways are needed to evaluate quality teaching. Duh. The other one, though, is a bit more interesting, and confirms something I had suspected (though it is found in math and science, not in the humanities more generally): there is some correlation between taking courses from successful faculty and doing better in later courses that depend on material from earlier courses. This result doesn’t hold for the humanities more generally, but for mainstream philosophy in research departments, I expect the results would be more like math and science and less like other humanities.

It is also worth noting that success in later coursework is coupled with another point: less success in the course in question. To the extent that we know that teaching evaluations correlate well with grades (and even better with expected grades), we have a reason to reward successful professors when their grades and teaching evaluations are below the norm!

OK, the last part was intentionally provocative, but there is a part of it that I’d defend. Suppose you have a highly successful researcher. You have to evaluate his teaching. You visit the classroom (with permission!). You check quality of syllabi. You check grading techniques, developing an understanding of the standards used. You attend to availability of office hours, and whether the imposition of grading standards is done in a constructive way. After all that, you find out that grades assigned are lower than average and teaching evaluations suffer a bit. At this point, I’m inclined to do more than discount the evaluations. I’m inclined to apply an inverse rule here.

Rebuttals welcome, of course (as are refutations… and, not being a journalist, I know the difference between rebutting and refuting…).


Evaluating Faculty Quality, Randomly — 6 Comments

  1. Jon,

    In the case you describe in the second-to-last paragraph, why not just stick with discounting the evals, as opposed to applying an inverse rule? Some teachers are hard graders and are genuinely crappy teachers; some teachers are hard graders and are genuinely good teachers; and there are lots of cases in between. And so it seems to me a single rule to cover all of these cases is ill-advised, since I’m not sure that being a hard grader correlates in any significant way with quality of teaching. I’m open to being shown counterevidence here. (As a hard grader myself, I’d like to think that there is a positive correlation with quality of teaching, but I would doubt that there is.)

    One thing I have long considered (having served on too many committees that asked “How should teaching be evaluated?”) is to compare faculty within a single dept to one another, and in particular get some sort of measure that reflects the hardness of their grading (or perhaps of their expected grading, as seen by students) relative to their evals. We might in this way reward hard graders who nevertheless manage to get relatively good evals — that is, evals that are good relative to other hard graders. And we might also remove one incentive profs have for being easy graders. (Shame on those of us who give in to this temptation.) One drawback of this proposal is that I am unclear on the relation between getting good evals and being a good teacher: the former typically reflects the likeability or entertainment value of the teacher, more than her quality. So this method might fail to reward genuinely good teachers who are hard graders but not particularly entertaining or well-liked by their students. Even so, one who gives tough grades and yet is still liked has something going for her as a teacher, I think.

    Just a thought…
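
[A rough illustration of the within-department measure Sandy proposes above. The names, numbers, and thresholds here are all invented for the sake of the sketch — the z-score normalization is one way of cashing out “relative to the department,” not anything from the study:]

```python
# Sketch: normalize each professor's average grade and average eval score
# against the departmental mean, then reward hard graders whose evals are
# good relative to other hard graders. All data below are hypothetical.

from statistics import mean, stdev

faculty = {
    # name: (avg grade given, 4.0 scale; avg eval score, 5.0 scale)
    "A": (2.1, 4.2),
    "B": (3.4, 4.5),
    "C": (2.3, 3.1),
    "D": (3.2, 3.9),
}

def z_scores(values):
    """Standardize a list of values against its own mean and stdev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

names = list(faculty)
grade_z = dict(zip(names, z_scores([faculty[n][0] for n in names])))
eval_z = dict(zip(names, z_scores([faculty[n][1] for n in names])))

# "Hard graders" grade below the departmental norm (negative grade z-score).
hard_graders = [n for n in names if grade_z[n] < 0]

# Among hard graders, reward those whose evals beat the hard-grader average.
hard_eval_avg = mean(eval_z[n] for n in hard_graders)
rewarded = [n for n in hard_graders if eval_z[n] > hard_eval_avg]
print(rewarded)  # with these invented numbers: ['A']
```

On these made-up numbers, A and C are the hard graders, and A is rewarded for holding relatively good evals despite grading hard — which also removes the easy-grading incentive, since inflating grades moves a professor out of the comparison pool rather than helping them in it.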


  3. Hi Sandy, maybe discounting is best, but note that your description of the case leaves out all the important background info I included, except the hard grader part. All of the other information points to a very good teacher, I’m assuming. I guess I really didn’t assert that in the paragraph, though. So I now do: all the prior checks show a dedicated and excellent teacher. Now: what to do with the teaching evaluations. Given the background info, including the hard grading part (I don’t mean ridiculous here, but rather a grading pattern that is closer to a 2.0 GPA than to a 3.0 or 3.1 or 3.2 or whatever is average for courses in the department of the same level), I’m suggesting that lower evaluations might be just what one should expect for a really good teacher, and higher evaluations an anomaly. A typical, decent department chair would simply ignore the weak evaluations, I’m sure, but it looks to me like something stronger is supportable.

    Oh, and I have a joke for you, from Jay Leno: “so we find out now that Hillary spent 212M, only to end up finishing second. 212 MILLION!!! When is the last time anyone spent that much money to come in second???? Oh, right, … the YANKEES!”

  3. “This result doesn’t hold for the humanities more generally, but for mainstream philosophy in research departments, I expect the results would be more like math and science and less like other humanities.”

    I’m curious about why you think this. Look at the two explanations of this difference between the humanities and the sciences offered in the piece you quoted:

    “One is because professors have more “latitude” in how they grade, especially with essays. Another reason could be that later courses in humanities don’t build on earlier classes like science and math do.”

    In both of these respects, philosophy courses are more like other humanities courses than they are like science courses, aren’t they? First, there is some degree of latitude in grading philosophy papers, when you compare them with exams in math or chemistry or physics. I’ve never graded papers in any humanities field outside philosophy, but I don’t see why there should be any more latitude in grading papers elsewhere. The second factor (which I suspect is more important than “latitude”) is that in the sciences, you take classes that have other classes as prerequisites. I mean, just think of all the science courses with numbers in their titles (e.g., Calculus III, Physics II, Organic Chemistry I). This is a lot less common in philosophy. You’ll sometimes have classes which have as a prerequisite at least one class in philosophy, but it’s rare for a philosophy class to admit only students who have already taken, say, philosophy of language.

    Based on this, doesn’t it seem more likely that the effect seen in the sciences won’t show up in philosophy any more than it does in other humanities?

  4. I have to be a bit careful what I say here, because I don’t want to make comments about current and former colleagues. So let me be quite general and vague. It’s worth paying attention to how well students do in further classes after a first class. My own experience is that there are significant disparities, depending on the instructor in the first course. What matters here isn’t course content so much (though sometimes that is true, as when one tries to teach philosophy of language to students who don’t know any first-order logical theory), but rather how to do philosophy. As a general rule, I think taking first courses from higher quality philosophers makes a significant difference in the second course, and taking a first course from those pretty close to not having a clue is pretty damning for the second course.

  5. In the case you describe I can agree to this much — if all other factors point to excellent teaching, then we should GREATLY DISCOUNT teaching evals. But even then I wouldn’t disregard them entirely, let alone apply an inverse rule. Assuming that evals track a teacher’s comedy and entertainment value, who says comedy and entertainment can’t go with successful pedagogy among those who have given the other indications of good teaching?

    PS I make it a policy not to engage anti-Yankee smack that arrives in July. October is Yankees’ baseball month; let’s talk then.

  6. Sandy, yes, I recall conversations the last several Octobers. Enjoyed them very much. If it were a stock price, the smart money would be shorting the pinstripes…

    On the teaching evaluations, I think there’s quite a bit of data available about what they do track, though I don’t know of any studies that try to see if they track comedy and entertainment. So if C&E could be substantiated with real data, I’d move to discounting the data for purposes of, say, merit raises. I mean, c’mon Sandy, you want me to give Jennifer higher raises because her jokes are better than yours??? 🙂
