Hirsch Numbers for Departments

UPDATE: Because of the high level of interest, I’m moving this to the top. See also here for notes about the table below.

SECOND UPDATE: I’ve created a page with just the ranking tables, both the ones in this post and another one from the comments, for those who just want to see the results of the exercise. You can click on the “Department Hirsch Number Rankings” in the Pages box in the right column, or just click here.

I’ve used the Hirsch number to gather more data, this time on Leiter-rated departments, using the faculty lists used by those who ranked departments for Leiter last time. I’ll put the data below the fold, but first a word or two about the numbers. First, I think there are lots of problems with the data at present, but it is becoming clear to me that administrators will want such information as the methods of generating it become easier to use and more reliable. So it is worth knowing how philosophy departments fare on this score. There is also an advantage here, since a department that is slipping in the rankings might use such information as part of a case for new appointments; moreover, addressing such concerns will require hiring senior and productive scholars, which is good for the profession as well. (A dominant concern of mine has been the immense power educational institutions have over faculty, and data that leads to greater mobility for senior faculty will force institutions to pay for quality departments.) There is an added benefit for departments not rated by the Leiter report: they can compute the same numbers for their own faculty to see how they fare against the departments that are ranked.

The data at present are not fully reliable, and I caution in advance about the problems. As before, I used Harzing’s Publish or Perish, which searches Google Scholar for the data to generate a Hirsch number for each scholar. Google Scholar has been subjected to some scrutiny by social scientists and the problems with it are discussed here, and Keith’s worries here are certainly legitimate and worth considering as well. The current wisdom seems to be that GS is far from accurate, but not inaccurate enough to render the data entirely without probative value.

There are also known issues with the h-index itself, as discussed here. It is, however, a measure that is growing in popularity and, I expect, will play an increasing role in administrative thinking about departments.
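For readers unfamiliar with the measure: a scholar’s h-index is the largest h such that h of his or her papers have at least h citations each. A minimal sketch in Python (the function name and sample citation counts are mine, for illustration only):

```python
def h_index(citations):
    """Largest h such that the scholar has h papers with at least
    h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        # The paper at position `rank` must itself have >= rank citations.
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers cited 10, 8, 5, 4, and 3 times give an h-index of 4:
print(h_index([10, 8, 5, 4, 3]))  # → 4
```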

So, with all the limitations and caveats about the data, here are the results.

The table has three sets of rankings. The first two columns order departments by departmental mean and the middle two columns order departments by departmental median. The final two columns order departments in terms of a mean on data that is restricted in three ways. First, since the mean for all scholars measured was 5.14, and the median 4, the data was restricted to include only faculty with Hirsch numbers greater than 4. Second, a department might get a high mean by having exactly one highly cited scholar, so the data was restricted further by eliminating the highest h-value from each department list. Finally, since it takes a critical mass of productive scholars to make a good department, I looked for departments with at least 6 scholars with an h-index above the median for all scholars. If a department did not have 6 such scholars, then the mean reported in the last column is the mean for the department minus the highest h-value scholar. If a department had at least 6 such scholars, then the mean is calculated on the restricted data just described. The resulting ranking is thus intended to favor departments with a critical mass of productive and reputable scholars, without depending on one particular star in the department to obtain a high ranking.
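My reading of the Rewards procedure just described, sketched in Python. The function name, parameter defaults, and handling of edge cases (empty or one-person departments) are my own assumptions; the post leaves those unspecified:

```python
def rewards_score(h_values, median_all=4, critical_mass=6):
    """Sketch of the 'Rewards' column: drop each department's single
    highest h-value; if the department has a critical mass of scholars
    above the overall median, average only those scholars (star removed),
    otherwise average the whole remaining faculty."""
    if not h_values:
        return 0.0
    trimmed = sorted(h_values)[:-1]  # whole faculty minus the "star"
    above = [h for h in h_values if h > median_all]
    if len(above) >= critical_mass:
        pool = sorted(above)[:-1]  # restricted data, star removed
    else:
        pool = trimmed
    return sum(pool) / len(pool) if pool else 0.0
```

On this sketch, a department with a critical mass is judged by its above-median scholars minus its star; one without is judged by its whole faculty minus its star.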

Since I now have a database, if there is some other way of running the numbers that might be of interest, let me know.

# Department Mean # Department Median # Department Rewards
1 Rutgers  10.38 1 NYU 8.50 1 NYU 11.06
2 NYU 9.15 2 Rutgers  7.00 2 Berkeley 10.88
3 Berkeley 8.68 2 MIT 7.00 3 Stanford 10.50
4 MIT 8.55 2 Duke 7.00 4 Rutgers  10.41
5 Miami 7.79 5 CMU 6.00 5 Pittsburgh 10.25
6 Stanford 7.38 5 UNC 6.00 6 Duke 9.56
7 Duke 7.09 5 Arizona 6.00 7 Maryland 9.50
8 Pittsburgh 6.58 5 Umass 6.00 8 Columbia 9.14
9 Arizona 6.48 5 Miami 6.00 9 Chicago 8.90
10 Princeton 6.35 10 Stanford 5.67 10 Harvard 8.86
11 Chicago 6.33 11 UCSD 5.00 11 MIT 8.71
12 Harvard 6.31 11 Wisc 5.00 12 UCSD 8.67
13 UCSD 6.30 11 Michigan 5.00 13 Princeton 8.55
14 Michigan 6.18 11 Princeton 5.00 14 Arizona 8.42
15 UNC 6.09 11 Chicago 5.00 15 CUNY 8.13
16 Maryland 5.95 11 Berkeley 5.00 16 Miami 8.06
17 CMU 5.94 11 Syracuse 5.00 17 Michigan 7.82
18 Umass 5.93 18 Virginia 4.50 18 Notre Dame 7.79
19 UCLA 5.88 19 Cornell 4.00 19 Texas 7.75
20 Wisc 5.38 19 Rochester 4.00 20 Umass 7.63
21 Brown 5.31 19 Georgetown 4.00 21 UNC 7.07
22 Rochester 5.23 19 Fl. State 4.00 22 Irvine 7.00
23 Cornell 5.00 19 Maryland 4.00 23 Indiana 6.63
23 Columbia 5.00 19 Minnesota 4.00 24 CMU 6.60
25 USC 4.89 19 Pittsburgh 4.00 25 Georgetown 6.54
25 Texas 4.89 19 Indiana 4.00 26 Wisc 6.50
27 Fl. State 4.80 19 Texas 4.00 27 Syracuse 6.09
28 Georgetown 4.50 19 Irvine 4.00 28 Riverside 6.00
29 Syracuse 4.43 19 UCLA 4.00 29 Minnesota 5.78
29 Irvine 4.43 19 Riverside 4.00 30 UCLA 4.94
31 Indiana 4.39 31 Harvard 3.63 31 Rochester 4.67
32 Virginia 4.36 32 Penn 3.50 32 USC 4.28
33 Penn 4.34 33 Wash U StL 3.00 33 Penn 3.97
34 Notre Dame 4.33 33 Ariz St 3.00 34 Virginia 3.92
35 Wash U StL 4.31 33 Brown 3.00 35 Wash U StL 3.87
36 Washington 4.25 33 Emory 3.00 36 Fl. State 3.86
37 Rice 4.22 33 Johns Hopkins 3.00 37 Washington 3.79
38 Northwestern 4.19 33 Northwestern 3.00 38 Cornell 3.69
39 Davis 4.18 33 South Fl. 3.00 39 Yale 3.63
40 Yale 4.12 33 USC 3.00 39 Davis 3.63
41 Riverside 4.11 33 Yale 3.00 41 Brown 3.50
42 Minnesota 4.10 33 Colorado 3.00 42 Temple 3.44
43 Temple 4.00 33 CUNY 3.00 43 Colorado 3.42
44 Santa Barbara 3.91 33 Columbia 3.00 44 Ariz St 3.41
45 Ariz St 3.83 33 Purdue 3.00 45 Rice 3.38
46 Colorado 3.80 33 Rice 3.00 46 Northwestern 3.33
47 South Fl. 3.67 33 Iowa 3.00 47 Santa Barbara 3.20
48 CUNY 3.54 33 Notre Dame 3.00 48 South Fl. 3.07
49 BU 3.52 33 Ohio State 3.00 49 Emory 3.00
50 Johns Hopkins 3.45 33 Santa Barbara 3.00 50 Ohio State 2.87
51 St. Louis 3.30 33 Davis 3.00 51 Iowa 2.86
52 Emory 3.25 33 Temple 3.00 52 BU 2.75
52 Iowa 3.25 33 Washington 3.00 53 St. Louis 2.63
54 Ohio State 3.13 54 BU 2.00 54 Illinois 2.62
55 Illinois 3.07 54 Ill/Chicago 2.00 55 Johns Hopkins 2.60
56 Ill/Chicago 2.94 54 Connect. 2.00 55 Missouri 2.60
56 Missouri 2.94 54 Illinois 2.00 57 Florida 2.50
58 Florida 2.88 54 St. Louis 2.00 58 Purdue 2.45
59 Purdue 2.71 54 Florida 2.00 59 Ill/Chicago 2.38
60 Connect. 2.61 54 Missouri 2.00 60 Connect. 2.35
61 S. Car.  2.29 61 S. Car.  1.00 61 S. Car.  2.06


Hirsch Numbers for Departments — 30 Comments

  1. Pingback: Certain Doubts » Notes on Last Post

  2. I’d be very interested to see the means of the departments including all faculty (or all non-adjunct faculty), not just the mean for those with H-numbers over 4.

    There’s something unclear about your description of your method. You say you’ve restricted the data to scholars with an h-number of at least 4, but then sometimes the averages are below 4. Is that because you added up the h-numbers of those with numbers above four and then divided that figure by the total number of people in the department? I assume that’s what’s happened, but I would appreciate a clarification.

    Another interesting figure to calculate would be the means and medians of the tenured staff at the various departments, since h-numbers will be a poor indicator of research quality of many people who have only been out a few years. Of course it’s often not straightforward to tell from department webpages who are tenured and who are not.

  3. Daniel, only the last column is adjusted for an H-number over 4. The first two columns are for all faculty.

    On the last column, if a department lacks a critical mass (e.g., more than 6 faculty with an H-value above 4), then the mean is the mean for the entire faculty minus its “star”.

    Exactly right on the tenured/untenured point. The last column isn’t a bad surrogate for that, however, since nearly all untenured faculty have low h-numbers. There are some exceptions, however.

  4. I’ve been getting more interested in metric measures since there have been rumblings about the UK government using some kinds of metrics in the thing that will replace the RAE. I thought it would be interesting to look at the British Leiter-ranked department scores, though the prospect of calculating one for all the Oxford philosophers put me off! For what it’s worth though, here are my calculated Nottingham scores: mean 5.67, median 3.5, Kvanvig Special 9.16. I was surprised and pleased that only 7 US Leiter-ranked departments did better than us on the last score, though I’m still not sure what the Kvanvig Special score shows, if anything.

  5. Oops, I forgot that our staff list doesn’t include all our faculty under the heading “faculty” (plus our Head of department). If you include our teaching fellow and postdoctoral fellows, that will make a difference to the mean and median of the Nottingham numbers, though not to the third “Kvanvig number”. (I presume emeriti and “special professors”, who are basically frequent department visitors, aren’t to count.)

  6. Very nice, Daniel! Actually, looking at the Oxford list is what prompted me to stop after the U.S. numbers. It does provide a benchmark for those who want to compare.

    Yes, those rumblings are part of what interests me as well. There are reasons to balk at such a replacement at this point, but I think the handwriting is on the wall about where things are headed…

  7. Daniel, a further thought on the Rewards column (a.k.a., Kvanvig special). The idea is that, when I recommend departments to students who just want generic excellence in the department (rather than students with specialized interests), I caution them to avoid two things: departments that rate highly because of one superstar, and departments that lack a passel of reputable scholars. I caution against the former because one may choose to work elsewhere in philosophy, or one might not get along with that one person, and then one would likely have been better off elsewhere. I caution against departments lacking a critical mass of reputable scholars because the dead wood in the department has to play too strong a role in the graduate curriculum in such departments, to the detriment of students. It is vague how many faculty are needed for a critical mass, but I selected anything greater than 6 (and measured “reputable” in terms of an h-number higher than the median). Arbitrary to a certain extent, but perhaps not wholly indefensible.

    I thought about an alternative: require a certain percentage of the department to be reputable scholars. And, maybe as well, lower the standards for reputable? These two factors interact, and if anyone wishes to know what happens with different factoring, I’d be happy to run the numbers.

  8. Pingback: Certain Doubts » How Does Google Scholar Work?

  9. Rather than a median or mean of the Hirsch numbers of the members of a department, what I would have thought was the “Hirsch number for” a whole department would be the highest number n such that the top n papers (in terms of citations) of all the papers written by the various (current) members of the department all have at least n citations. You take all the papers written by the various members of the department, arrange them from most cited to least cited, and then count down to calculate the department’s h-number.
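    The pooled computation described here can be sketched as follows (the function name and sample citation counts are illustrative only):

```python
def department_h_index(papers_by_member):
    """Departmental h-index from pooled papers: gather every paper by
    every current member, sort by citation count, and find the largest
    n such that n papers have at least n citations."""
    pooled = sorted(
        (cites for papers in papers_by_member.values() for cites in papers),
        reverse=True,
    )
    h = 0
    for rank, cites in enumerate(pooled, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Two members with papers cited [20, 6, 3] and [9, 7, 2]:
print(department_h_index({"A": [20, 6, 3], "B": [9, 7, 2]}))  # → 4
```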

  10. Great idea, Keith, and no reason to expect that data to track any of the data here. If I had a program that allowed multiple author searches, I’d get the data, but I don’t. I bet within 10 years, though, it will be easy!

  11. One possible negative about Hirsch numbers by your method, Keith, is that larger departments would automatically have an advantage. I don’t see that you were proposing this method, but rather only commenting on the sense of “Hirsch number for”, but adopting a method that favors larger departments in this way would be controversial, at least.

    Of course, on the other side, the method I used eliminates any benefit to size, and perhaps that’s a defect as well. The Rewards ranking compensates for that, however, since it requires a critical mass of reputable scholars.

  12. You could see that as a positive, as compared with using the median or mean of the h-numbers of the members of the department. Using either of the latter, you get this odd result: whenever a department adds someone whose h-number is lower than their current median/mean, they hurt their score. But certainly, adding someone (e.g., someone younger, someone who works in an area you don’t have covered yet) with that feature often (perhaps even usually) strengthens a department (& cutting someone in that condition often hurts). So it seems strange and bad to employ a method by which all such additions hurt your score. Viewed that way, the advantage would go to just calculating the h-number for the whole department, b/c that method doesn’t have the bad feature just mentioned.

  13. Yes, that’s right, Keith. The sad point here is that any method of ranking gets noticed by the population under study, and their behavior can adapt to maximize their ranking. So if you notice, as many have, that departments strong in language, epistemology, metaphysics and mind do well on the Leiter report, that can lead departments to design new appointments to favor candidates in such areas.

    Comparatively, though, using means and medians in the way done here favors departments that, when they add, they add reputable scholars rather than untested ones. Adding junior people does strengthen a department in ways not measured here, but it shouldn’t be thought to add to the reputational status of a department in terms of scholarship.

    All that said, though, the point is well-taken that the method used here, and any other method that we might use in addition, have only limited value at best. The idea of a Grand Unified Metric is a nice ideal, but that’s all.

  14. Manipulation wasn’t my real worry. Whether or not anyone would ever be tempted to manipulate the system, I was taking the fact that the median/means measures would always be hurt by adding someone whose h-score is below your m/m to be evidence that this isn’t a good way to measure. Say we have two departments, both with a solid core of seven philosophers, about equally good. But dept. A also has five other philosophers whose h-scores are lower than those of their top 7. Still, they are good scholars, who cover areas not covered by their top 7, and we may suppose they have good h-scores themselves — just not as good as the top 7. There’s no worry about manipulation. A & B are both locked in place: neither will be doing any hiring or firing in the foreseeable future. But we want to compare them. I think we should avoid measures that say B is better — especially since B would likely add the five extras to their own faculty if given the opportunity to & consider that an improvement.

    I don’t think we should shy away from methods on which bigger departments tend to do better. Generally, bigger is better. A larger faculty will tend to cover more areas, and cover them better. That’s especially important for graduate programs, where it helps students to have an advisor (or two, or three) who really knows the area a student’s dissertation is in. Students usually don’t know what they’ll be dissertating on when they enter a program. It’s good to be in a program where there are lots of viable options. Of course, some students thrive in the environment provided by smaller departments. Others are more at home in larger departments, where you get the needed critical mass for certain professional activities (reading groups, etc.). But I do think that in general bigger tends to be better. Those in smaller departments may argue for the relative advantages of smaller departments, but few of them, I think, would turn it down if they were offered slots to become bigger, if they thought they could get good people, even if those good people were a bit below their current median/means.

    Plus, as a strategic matter, I don’t think we should be getting into the business of advocating measures that would encourage *less* hiring.

  15. Good point, Keith. So what we want is a measure that finds out how reputable the scholars are and then how many of them there are. If we adopted the measure of “reputable” above, then we could generate a ranking of this sort just by summing all the scores of those above an h-value of 4. (Notre Dame and Oxford are going to look great by this measure!) On this way of measuring, here’s what the ranking order looks like:

    Notre Dame
    Wash U StL
    Fl. State
    St. Louis
    Ariz St
    Ohio State
    S. Car.
    Santa Barbara
    South Fl.
    Johns Hopkins
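
    The summing measure just described is simple to sketch (the function name is mine; the threshold of 4 follows the “reputable” cutoff above):

```python
def reputable_sum(h_values, threshold=4):
    """Size-rewarding measure: total h across every scholar in the
    department whose h-index clears the 'reputable' threshold."""
    return sum(h for h in h_values if h > threshold)

# A department with h-values [12, 9, 8, 7, 6, 5, 5, 3, 2]:
print(reputable_sum([12, 9, 8, 7, 6, 5, 5, 3, 2]))  # → 52
```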

  16. Keith raises a point and glances by another that I wish to address.

    The first concerns whether a population of scholars would be tempted to change citation practices to optimize a score on a metric used for professional advancement. There is ample evidence that citation practices in the sciences changed following an increase in the use of impact measures like those under consideration. If professional mobility and advancement in philosophy become as closely linked to citation metrics as they already are in many of the sciences, I see no reason to think that philosophers would resist changing citation practices to better their professional position. I think this is what Jon meant by a population manipulating a metric, since it is possible (likely, even) that a population will adapt to a measure so well that it shreds what probative value it may have had in the first place.

    The second point concerns the value of Jon’s data, and Keith is openly skeptical. I’m surprised by this, since I’ve read his vigorous defenses of Leiter’s system on this blog and I see no good reason for being strongly skeptical of this system while being strongly enthusiastic for the other, particularly given well-known reservations that have been raised since the PGR started. But never mind that. Instead, I’d like to suggest that there is something very constructive going on in Jon’s project.

    I think that it’s a good thing to see how the metrics on his data flip some items around but to notice too that they do not scramble the data entirely. Despite the worries, there are rough correlations between his data and Leiter’s survey, particularly at the ends of the scale, and there also appear to be some corrections for bias that may be due to how the rating panels are selected.

    These ranking schemes are much more coarse grained than they are (often) taken to be. This has been Richard Heck’s point about the Leiter rankings, for instance. Jon’s data is a good illustration of this very point.

    I think Jon is to be credited both for demonstrating the variability in his dataset and for going to great lengths to identify shortcomings of the methods he is using. He’s also to be credited for warning end-users of the hazards he’s found in his work with the dataset, and for generously allowing open access to it.

  17. I’ve heard that some departments have used the Leiter rankings to make a case for more resources. But, suppose one had a physicist or sociologist dean for whom citation numbers mattered, but the Leiter rankings not so much. An h-index type measure or ranking of departments might be appealing to some deans.

    And wouldn’t an h-index ranking of departments spread the joy more widely? Couldn’t the h-index be calculated for more departments than are covered by the Leiter Rankings? So, more departments could ply their deans for more resources.

  18. Being from Berkeley, I notice some serious variability in the rankings of some departments on these three versions of the metric. I suppose this must mean that Berkeley has a few people with very high numbers, and the rest of them generally lower than those of most people at the other highest-ranked departments? I guess I’m not quite sure what these numbers might be telling us, if a school like Berkeley can jump around so much based on slight tweaks of the way the score is compiled.

  19. Gregory: I, too, applaud Jon’s efforts, despite being openly skeptical. Those two aren’t even in tension, since I think Jon is just trying to run some numbers to start to see where an effort to evaluate philosophers and departments along these lines might go. I think he, too, is skeptical about the present state of this effort: the whole idea is to try to figure out how (& whether, even) to eventually do it down the line a bit.

    I was and am a defender of the PGR (though I don’t *think* any of the defense, at least in my hands, occurred in this blog: but I could well be misremembering here). I don’t think my reasons for skepticism about the current state of this way of evaluating have that much in common with the worries that were voiced about the PGR. I’ve been looking into Google Scholar some more, and it’s really looking like a *very* bad source — in a way that doesn’t just “come out in the wash,” but systematically favors some philosophers and whole departments over others in ways not justified by different actual impacts of their work, even when they’re working in the same areas. Hopefully, I’ll be able to write more about this soon, though things are very busy. At any rate, because GS — at least as it currently works — is looking to me so hopelessly horrible as a source, I think the most interesting questions here concern how citation results can best be processed *once we have a reasonably decent source for citations*.

  20. Kenny, as you might expect, Searle’s h-index is among the highest in philosophy, thereby raising the mean for Berkeley higher than either the median ranking or the critical mass ranking. But Berkeley doesn’t actually move around that much: #3, #11, and #2. For wilder swings, see CUNY, for example!

  21. If readers will follow the link in the post to Harzing’s website about the difficulties with Google Scholar, Keith’s worries about the source are discussed at length there. In some studies, only 30% or so of actual citations show up there. At the same time, however, the correlations between using GS versus ISI or Web of Science to rank scholars is quite high. Here’s a quote from the website about this point:
    “At the same time both sources (Web of Science and Google Scholar) have been shown to rank specific groups of scholars in a relatively similar way. Saad (2006) found that for his subset of 55 scientists in consumer research, the correlation between the two h-indices was 0.82. Please note that this does not invalidate the earlier argument as it simply means most academics’ h-indices are underestimated by a similar magnitude by Web of Science. Meho & Yang (2007) also found that when Google Scholar results were added to those of Web of Science and Scopus separately its results did not significantly change the ranking of the 15 academics in their survey. The correlation between Google Scholar and Web of Science was 0.874, between Google Scholar and the union of Web of Science and Scopus 0.976.”

    It remains possible that all of these citation sources are bad, but we do know that there isn’t a lot of overlap between the citation lists produced by the different sources. The noticed high correlations are thus quite unexpected, and suggest that we’d get roughly the same rankings if we used the more cumbersome sources for data. The high correlation may provide some evidence, when combined with the lack of commonality among citation sources, that the rankings are tracking differences in actual citations as well. How good is that evidence? Well, maybe Branden can calculate it for us!

  22. K, yes it’s true that some use the Leiter Report in this way, and legitimately so in my opinion. I think these numbers could be used in the same way.

    But I’m tired of gathering data! So the task of doing it for other departments will have to fall on someone else. There is a slight problem here, however, since I eliminated citations of non-scholarly work (e.g., some webpages are actually counted as having citations to their credit, such as my own homepage!), and if someone else calculates the numbers without using the same practices that I used, the comparisons will be undercut somewhat.

  23. Jon,
    I didn’t mean to propose that you look at all departments. In truth, it might well be in the interests of some of the non-Leiter-ranked departments to explore the use of the h-index for departments.

  24. K, yes, that’s exactly right. There’s a tendency for every department to overrate itself, and so many feel slighted for not being rated by the Leiter Report. This may be a way to confirm their suspicions. Or correct the bias. 🙂

  25. I figured Searle was raising the mean quite high. I wasn’t as sure how he would affect the “Kvanvig Special” rating though, since he would be dropped as an outlier, right?

    Following up on Keith’s first comment at #9, another potentially interesting (but much harder to compute) version of a departmental Hirsch number would be to count papers written by people while they were at the department, whether they’re still there or not.

  26. Right, Kenny, he gets dropped for the Kvanvig special. Maybe another, easier to generate, ranking is just the sum of all the h-values in the department. If one compared that ranking with the ranking in comment #15 above, you might take the results to give some idea of how much dead weight a department contains. But then again, you probably didn’t need rankings to tell you that!

  27. Pingback: Certain Doubts » Hirsch Number Impact of Known Faculty Moves

  28. Pingback: Certain Doubts » Complete Rankings of Philosophy Departments Based on Hirsch Numbers

  29. Pingback: Hypatia and Hirsch Numbers « Feminist Philosophers

  30. Pingback: ISI Web of Science | the phylosophy project blog
