Metrics form the basis of college and university rankings. Some metrics confuse and undermine education. Others provide helpful insights and aids to education. This article shows how to use academic metrics without abusing them.
Metrics, as a way of putting numbers to things, come with two faces, like the Roman god Janus (depicted above). On the one hand, metrics can bring precision, clarity, and insight. On the other, they can bring a false sense of security, suggesting that we have measured something well when in fact the metric is misleading and confusing.
Numbers taken by themselves are neither bad nor good. It all depends on what use is made of them. A metric is a way of assigning numbers to things so that bigger numbers mean more of the thing and smaller numbers mean less of it. Like individual numbers, metrics are neither good nor bad. But metrics can be put to good and bad uses.
Whenever we measure something, we are putting numbers to things via a metric. Science and technology depend on measurement and would thus be impossible without metrics. Where would we be without precise metrics for measuring time, mass, length, and energy? How could electric companies stay in business without something like the kilowatt-hour metric?
Metrics are everywhere in the exact sciences. In trying to characterize how metrics can be abused, one might be tempted to say that metrics are safe in the exact sciences but problematic in the social sciences and humanities, where measurements seem less objective. But such a distinction is simplistic. Birth rates, marriage rates, mortality rates, obesity rates, crime rates, reading rates, and the like are all valid metrics that help us make sense of the human experience.
How does a metric go from being good or neutral to being bad? Consider wait times in a hospital ER (emergency room). People who show up at an ER are typically in bad shape and need help sooner rather than later. Long wait times are bad; shorter wait times are better. And so wait times provide a useful metric for the efficiency of care. Or not.
It would be one thing if a hospital found that its average wait time at the ER was, say, four hours, deemed that too long, and as a consequence introduced better intake procedures and hired more staff to lower the wait time. That would be a commendable use of the wait-time metric.
But it would be another thing if the hospital, wanting merely to seem like it was providing better care, kept patients waiting in the ambulance an extra hour before actually bringing them into the ER, thus starting the clock an hour later and thus reducing the average wait time by an hour. This has actually happened. The wait time goes down, but through a ruse that doesn’t help the patient and probably makes the patient’s experience worse (though the hospital will look better on paper).
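The arithmetic of the ruse is easy to sketch. Here is a minimal Python illustration, using made-up wait times, of how starting the clock an hour late lowers the reported average without changing what any patient actually experiences:

```python
# Hypothetical data: hours each patient actually waits for care
true_waits = [3.5, 4.0, 4.5, 5.0]

# Honest measurement: the clock starts when the patient arrives
honest_avg = sum(true_waits) / len(true_waits)

# Gamed measurement: patients are held in the ambulance for an hour,
# so the measured wait starts one hour later for every patient
gamed_avg = sum(w - 1.0 for w in true_waits) / len(true_waits)

print(honest_avg)  # 4.25
print(gamed_avg)   # 3.25 -- looks better on paper, same real wait
```

The reported number drops by exactly the hour shaved off the clock, while every patient waits just as long as before.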
Metrics become problematic when they go from being merely descriptive to being prescriptive, and where the prescription can be gamed. Prescriptions, especially when stated in broad general terms, are typically fine. A doctor confronted with a patient who is grossly overweight won’t stop at merely describing the patient’s condition but will want to prescribe a treatment for weight loss. If the treatment consists of healthy dietary and lifestyle changes to induce weight loss, all well and good. Who can object?
But how does one characterize healthy dietary and lifestyle changes? What if the doctor is known as a weight-loss guru, and what if the doctor is concerned with advertising that his or her patients are particularly successful at losing weight quickly? And what if the doctor rationalizes that any means of weight loss is to be preferred over keeping on the weight? The doctor may then prescribe harmful drugs that reduce the weight quickly at a cost to the patient’s overall health.
Or what happens if the patient, eager for the doctor’s approval, or needing the doctor’s authorization to engage in certain work (an authorization to be given only if a certain amount of weight is lost), decides to forgo a healthy lifestyle and do destructive things to lose weight (taking high doses of diuretics, becoming bulimic, etc.)? The weight target drawn from the metric may be met, but with harm to the patient.
In the 1970s, psychologist Donald Campbell formulated what became known as Campbell’s Law and economist Charles Goodhart formulated what became known as Goodhart’s Law. The two laws are essentially identical, and their point is that when metrics become not merely descriptive, and not merely prescriptive, but the basis for rewards and punishments, people will find workarounds to defeat them. These workarounds, or what we now refer to as “gaming,” attempt to obtain the rewards without deserving them and strive to avoid the punishments even when meriting them. In the process, the metric ceases to be a good measure of anything.
In 2018 Jerry Z. Muller, a historian on the faculty at Catholic University, published a book with Princeton University Press on the outworking of these laws. He titled it The Tyranny of Metrics. In it he showed how, in field after field, people fixate on metrics only to lose the benefit that the metrics might bring them. Thus he showed how metrics control and derail the higher interests of philanthropy, business, the military, policing, medicine, and above all education (Muller is, after all, an academic).
Perhaps the most effective technique that Muller discusses for gaming metrics (and there are others) is what he calls “creaming.” Just as cream rises to the top and can be skimmed off, cases that make one look good according to a given metric can be “creamed” at the expense of ignoring or banishing the cases that make one look bad.
Somebody on our staff, for instance, was in a school district where the superintendent “shipped off” to other school districts as many of the students with special needs and disabilities as possible so that test scores would rise in his school district and it would be designated as “exemplary.” Too bad for the districts on which he was able to foist his “low-functioning” students, and too bad for those students. The gaming of metrics can make for heartless behavior on the part of those who see their livelihoods and reputations as depending on their performance vis-à-vis the metrics.
Muller, in analyzing the gaming of metrics in general, is especially concerned with the gaming of metrics in higher education. As a key instance of such gaming, he takes on the U.S. News rankings and the metrics on which they are based:
Recently I was puzzled to find that a mid-ranked American university was taking out full-page advertisements in every issue of The Chronicle of Higher Education, touting the important issues on which its faculty members were working. Since the Chronicle is read mostly by academics—and especially academic administrators—I scratched my head at the tremendous expenditures of this not particularly rich university on a seemingly superfluous ad campaign. Then it struck me: the USNWR ratings are based in good part on surveys of college presidents, asking them to rank the prestige of other universities. The criterion is of dubious validity, since most presidents are simply unaware of developments at most other institutions. The ad campaign was aimed at raising awareness of the university, in an attempt to boost the reputational factor of the USNWR rankings. Universities also spend heavily on glossy brochures touting their institutional and faculty achievements. These are mailed to administrators at other universities, who vote on the USNWR surveys...
In addition to expenditures that do nothing to raise the quality of teaching or research, the growing salience of rankings has led to ever new varieties of gaming through creaming and improving numbers through omission or distortion of data. A recent scholarly investigation of American law schools provides some examples. Law schools are ranked by USNWR based in part on the LSAT scores and GPAs of their admitted, full-time students. To improve the statistics, students with lower scores are accepted on a “part-time” or “probationary” basis, so that their scores are not included. Since the scores of transfer students are not counted, many law school admissions offices solicit students from slightly lower ranked schools to transfer in after their first year. Low student to faculty ratios also contribute to a school’s score. But since those ratios are measured during the fall term, law schools encourage faculty to take leaves only during the spring term. These techniques for gaming the rankings system are by no means confined to law schools: much the same goes on at many colleges and universities.
Here at AcademicInfluence.com, we’ve devoted an extensive article to the gaming of the U.S. News academic ranking metrics, demonstrating how the U.S. News approach to academic rankings is beyond repair (see our article “College Rankings Held Hostage”). But that raises the question whether our approach at AcademicInfluence.com is better and can avoid the abuse of metrics that has become hardwired into the academic ranking business.
AcademicInfluence.com’s approach to college and university rankings is able to avoid the abuse of academic metrics. In fact, our approach is hardwired to make such abuse impossible. That’s because our influence-based metric draws its input entirely from large publicly available datasets, and these cannot be changed to affect the metric except through extraordinary, and one might even say heroic, means.
Our influence ranking metric, for instance, draws on the academic persons listed in Wikipedia. But changes to Wikipedia entries for these persons must get past editors. And when it comes to school rankings, it’s the joint influence of all the academic persons affiliated with a school that determines its influence. Changing Wikipedia may have some small effect on our influence-based metric, but it will never be appreciable.
Or consider our use of citation data from Crossref and Semantic Scholar. Citation data can change only with newly published articles. But with total citation counts in these datasets in the hundreds of millions, any articles published in recognized journals by would-be gamers won’t amount to more than a drop in the bucket.
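A back-of-the-envelope calculation, using made-up but order-of-magnitude-plausible numbers, shows why gaming a dataset of this size is futile:

```python
# Hypothetical figures for illustration only
total_citations = 300_000_000  # dataset-wide citation count, hundreds of millions
gamed_citations = 1_000        # citations a determined gamer manages to add

# The fraction by which the gamer shifts the underlying data
relative_shift = gamed_citations / total_citations

print(f"{relative_shift:.10f}")  # 0.0000033333 -- a few millionths
```

Even a thousand strategically placed citations would alter the dataset by a few parts per million, far too little to move any school's ranking.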
AcademicInfluence.com’s influence-based ranking metrics are therefore robust: the small changes to the datasets that would-be gamers might be capable of introducing cannot significantly shift them. In consequence, our rankings cannot change significantly through the activity of any individual, consortium of individuals, or even entire institutions. This means that our ranking metrics are non-gameable.
Bottom line: Academic metrics can and will be abused whenever the temptation to abuse them is real, and abusing them offers palpable rewards or rules out unwelcome punishments. In contrast, our approach to academic metrics simply removes all such temptation. With the temptation gone, our influence-based ranking metrics can be used without being abused.