Robert Morse, the lead designer of U.S. News & World Report’s rankings methodology, speaks at the professional conference for college data-crunchers every year. And every year, attendees say, his workshop is packed.
“Every time I’ve gone to that session, it’s been standing-room-only and people leaning in the door,” said Jeffrey A. Johnson, director of institutional research and effectiveness at Wartburg College. Conference-goers always ask tough questions, said Todd J. Schmitz, assistant vice president for institutional research for the Indiana University campuses. But Morse’s audience is rapt: “He has this room of 300 people hanging on his every word,” Johnson said.
The scene captures the complicated relationship between colleges’ data submitters and U.S. News, the best-known college-ranking system in the United States. Many resent the time and oxygen the ranking takes up. After The Chronicle asked to interview her, Christine M. Keller, executive director of the Association for Institutional Research, conducted an informal poll of the group’s members about their views of U.S. News. One major theme: Answering the magazine’s survey requires too many resources, a situation they see as taking away from internal data projects that contribute more to student success than rankings do. Yet they know that responding to the survey is an important part of their jobs, and often campus leaders are paying close attention.
Recently that relationship has undergone renewed scrutiny, as rankings-data controversies have piled up. First, a former dean of Temple University’s business school was sentenced to 14 months in federal prison for leading an effort to inflate statistics his school had sent to U.S. News. Then a Columbia University mathematics professor publicized his belief that his institution is sending inaccurate data to Morse and his colleagues, a contention Columbia has denied. Finally, the University of Southern California pulled out of the rankings for its graduate program in education because it discovered it had submitted wrong data for at least five years.
The headline-grabbers are the latest in a decades-long history of scandals about colleges gaming U.S. News and unintentionally sending inaccurate data to it. Every few years, it seems, another incident comes to light. Criticisms of the rankings have also been longstanding, but there’s been some fresh attention since the popular journalist Malcolm Gladwell covered them on his podcast last year.
Willis Jones, an associate professor at the University of Miami who studies higher-education leadership, has noticed more of a social-justice bent to rankings criticisms lately. Increased societal awareness of historically Black colleges and universities highlighted that rankings are “one of the many things that were creating disparities among HBCUs versus predominantly white institutions in state funding and things like that,” he said.
Another observer, Akil Bello, director of advancement for the National Center for Fair and Open Testing, an advocacy group known as FairTest, has noticed more critiques of rankings methodologies making it into the mainstream. “There are some cracks being created in the belief that there is an objective foundation to the creation of the rankings,” he said.
As the college staffers typically responsible for gathering and submitting the high-stakes data, institutional researchers are on the front lines of this much-scrutinized process. And while outright lying may be relatively rare, there’s always human error, plus ample room for interpretation in the U.S. News questions. That ambiguity can create incentives to finesse the data in a way that makes one’s institution score better in the magazine’s rubric.
In 2018, after the Temple data problems became public, “eight or 10” higher-education clients of the audit firm Baker Tilly asked for reviews of their processes for submitting data to U.S. News, accreditors, the federal government, and other requesters. “They didn’t want to be in a position to end up like Temple,” said Adrienne Larmett, a senior manager there.
“Not one of them had a clean audit,” Larmett said. Colleges were making unintentional mistakes, often as a result of software systems not working well together, the timing of data pulls (at what point in the year enrollment is counted, for example), and human errors in data entry.
“High risk” statistics, where Baker Tilly auditors often saw problems, included the number of applicants, admitted students’ test scores and GPAs, and faculty-to-student ratios. Test scores can be problematic if colleges rely on numbers shared by applicants rather than by testing companies. And who counts as a faculty member or a student can be defined in many ways, depending on who’s asking.
The differences between what colleges reported and what Baker Tilly found were generally small, Larmett said. But you never know what will be enough to give an institution a lower or higher ranking than it deserves, she said, given that U.S. News doesn’t disclose exactly how it weights survey answers in its rankings.
Even with perfect quality control, two institutions may still count the same thing in different ways.
“There’s this overarching tension when you have any type of survey, ranking, or data gathering, where you’re trying to capture the universe of higher education,” Indiana’s Schmitz said. “You’ve got a panoply of different types of institutions, and yet you’ve got one survey instrument and set of definitions that try to be at the same time sufficiently vague and sufficiently specific, so that institutions can see themselves in these survey questions and they’re not totally off the rails.”
“At the same time, there is a lot of interpretation that happens with the U.S. News & World Report survey questions,” he added. “It is up to folks like myself in the profession, who know what the gold standard should be.”
The U.S. News survey appears to try to provide plenty of guidance. The 2022 main survey, for example, devotes several paragraphs to defining “faculty” and “class section,” concepts that feed into faculty-to-student ratios and average class sizes, both influential factors in a college’s ranking.
Nevertheless, Schmitz said, people could interpret those questions differently. At one point, the survey asks for the number of faculty members who “teach virtually only graduate-level students.” If there are one or two undergraduates in a class, he said, does that count as “virtually only” graduate students? What if the undergraduates are auditing the class and aren’t receiving credit?
When and how to measure class sizes bring yet more ambiguity. “You can defensibly use the start of the semester or the end,” Schmitz said. “You could also artificially limit section seating caps.” U.S. News calculates a score for class sizes using the number of classes that fit into different buckets, including how many classes have fewer than 20 students. Thus a seating cap of 19, rather than 20 or higher, for some classes could raise a college’s score.
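To see how much leverage a one-seat cap can have, consider a back-of-the-envelope sketch. U.S. News has historically scored class size by the share of sections falling into enrollment buckets, starting with “fewer than 20 students,” but it does not publish its exact internal scoring; the bucket weights in the Python snippet below are assumptions chosen only to illustrate the mechanism.

```python
# Illustrative only: U.S. News does not publish its exact internal scoring,
# so the bucket weights below are assumptions chosen to show the mechanism.
def class_size_index(section_enrollments):
    """Score a list of section enrollments against assumed size buckets."""
    buckets = [(0, 19, 1.00),            # "fewer than 20 students" scores highest
               (20, 29, 0.75),
               (30, 39, 0.50),
               (40, 49, 0.25),
               (50, float("inf"), 0.00)]
    total = len(section_enrollments)
    score = 0.0
    for low, high, weight in buckets:
        share = sum(low <= n <= high for n in section_enrollments) / total
        score += weight * share
    return score

natural = [20] * 20      # twenty sections that would naturally enroll 20 students
capped = [19] * 20       # the same sections held to a seating cap of 19

print(class_size_index(natural))   # 0.75: every section lands in the 20-29 bucket
print(class_size_index(capped))    # 1.00: every section now counts as "under 20"
```

Under those assumed weights, trimming twenty 20-student sections by a single seat apiece moves every one of them into the top bucket and lifts the index accordingly.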
One question of interpretation loomed large in the analysis by Michael Thaddeus, the Columbia professor who challenged his institution’s place in the rankings. Columbia classifies patient care provided by medical-school faculty members as instructional spending, a decision the university defends on the grounds that the professors may be training students while seeing patients. Still, it’s unusual in the field to consider such expenses as instructional, said Julie Carpenter-Hubin, a former assistant vice president for institutional research at Ohio State University, who retired in 2019.
The inevitable gap between guidance and practice leaves a lot of responsibility on data-gatherers’ shoulders. “You can make those decisions well and defensibly, or you can make them in ways that are indefensible,” Wartburg’s Johnson said. “What you can’t do is simply fall back on the rules and say, ‘Yep, the rules tell us exactly what to do,’ because they don’t.”
The Chronicle spoke with institutional researchers at two large public universities and two small, private liberal-arts colleges, to get a sense of what it’s like to make those high-stakes decisions. Most said that while they personally had not experienced pressure to report positive data, they had heard of others who had. That kind of within-the-rules-but-not-the-spirit massaging is more common than outright fraud, they believe.
“I don’t want to say that anybody would ever pressure folks to misreport,” Carpenter-Hubin said. Instead there can be pressure to “find the best possible way of saying something.” (She emphasized that was not the case at Ohio State; she was speaking generally, she said.)
“The incentives are in place for everyone to want to put their best foot forward to describe their institution in the most favorable light” to U.S. News, said Keller, from the Association for Institutional Research.
One interviewee, however, said she had little experience of the kind of data shenanigans that make the news.
“This is probably a point of privilege,” said Bethany L. Miller, director of institutional research and assessment at Macalester College, “but I don’t know a lot of people who are nervous about reporting data.”
She pointed to counterpressures at work to keep people honest: the reputational hit colleges take when data misreporting comes out; fines and potential prison time for misreporting to the U.S. government, if not to a private company like U.S. News (though the Temple case shows that lying to U.S. News can bring prison sentences too); and the Association for Institutional Research’s ethics code.
The institutional researchers The Chronicle interviewed said they thought their peers did the best they could, had quality-control checks in place, and made only small mistakes, if any. Institutional researchers’ beef with U.S. News isn’t the data’s integrity. “It’s more the fact that they are imposing their framework for how colleges and universities should be ranked on everyone else,” Keller said. “In the end, the data is likely not perfect, but it’s how the data is being used that is the issue for me and a lot of my IR colleagues.”
Meanwhile, despite the criticism, the rankings remain as important as ever to some audiences. College marketing teams still tout high rankings, and boards of trustees still fret when their standings fall, Jones, of Miami, said. The share of first-year students at baccalaureate institutions who say that “rankings in national magazines” were “very important” in their choice of college has hovered at just below one in five for more than a decade, although it fell to about one in seven in 2019, the most recent year for which data are available.
And the assumptions behind the rankings still shape the way people talk about colleges. “How do you unring the bell of the socially accepted rankings?” FairTest’s Bello said. “That’s the biggest challenge right now — is that the ‘These colleges are good’ and ‘These colleges are bad’ has entered the ether of the higher-ed admissions landscape.”
On July 1, 2002, I became president of Reed College in Portland, Ore. As I began to fill the shelves in my office with mementos from my previous life as a law-school dean, I could feel the weight already lifting from my shoulders. “I’m no longer subject to the tyranny of college rankings,” I thought. “I don’t need to worry about some news magazine telling me what to do.”
Seven years before my arrival at Reed, my predecessor, Steven S. Koblik, decreed that Reed would no longer cooperate with the annual U.S. News Best Colleges rankings. As a practical matter, this meant that college staff members would no longer have to invest hours in filling out the magazine’s annual surveys and questionnaires. Most importantly, it signaled that Reed would no longer be complicit in an enterprise it viewed as antithetical to its core values. And it would no longer be tempted to distort those values to satisfy dubious standards of excellence.
The fact that Reed had taken this rebellious stance was one of many features that attracted me to apply for its presidency. I took it to be a statement that Reed viewed education as a path to a genuinely fulfilling life, not just a ticket to a high-paying job. The college defined its goal as imparting learning, not just conferring credentials. It measured itself by internal standards of academic integrity, not just external applause.
There is a growing cottage industry of college evaluators, many spurred by the commercial success of U.S. News. I call it the “rankocracy” — a group of self-appointed, mostly profit-seeking journalists who claim for themselves the role of arbiters of educational excellence in our society. It wasn’t just the U.S. News rankings that were incompatible with Reed’s values. Virtually the whole enterprise of listing institutions in an ordinal hierarchy of quality involves faux precision, dubious methodologies, and blaring best-college headlines. To make matters worse, the entire structure rests on mostly unaudited, self-reported information of dubious reliability. In recent months, for example, the data supporting Columbia’s second-place U.S. News ranking have been questioned, the University of Southern California’s School of Education has discovered a “history of inaccuracies” in its rankings data, and Bloomberg’s business-school rankings have been examined for perceived anomalies.
Maintaining Reed’s stance turned out to be more of a challenge than I had realized. Refusing to play the game didn’t protect us from being included in the standings. U.S. News and its coterie of fellow rankocrats just went ahead and graded the college anyway, based on whatever data they could scrape up and whatever “expert” opinions they could sample. Every once in a while, when I saw that U.S. News had once again assigned us a lower number, I would feel those old competitive juices flowing. In moments like that, I had to take a deep breath or go for a walk. And throw the magazine into the trash.
I came by my rankings aversion honestly. In 1989, I became the dean of the University of Pennsylvania’s law school. The next year, U.S. News began to publish annual rankings of law schools. Over the next nine years of my deanship, its numerical pronouncements hovered over my head like a black cloud. During those years, for reasons that remained a complete mystery to me, Penn Law’s national position would oscillate somewhere between seventh and 12th. Each upward movement would be a cause for momentary exultation; each downward movement, a cause for distress.
My admissions dean reported that prospective applicants were keenly attuned to every fluctuation in the annual pecking order. So were my alumni. If we dropped from eighth to 10th, alumni would ask what went wrong. If we moved up to seventh, they would ask why we weren’t in the top five. Each year, Penn’s president would proudly present to the Board of Trustees a list of the university’s schools whose ranking numbers had improved. (She’d make no mention of those whose numbers had slipped.)
During that time, I also served as a trustee of my undergraduate alma mater, Amherst College. By the standards of the rest of the world, U.S. News treated Amherst very kindly, almost always placing it in the top two liberal-arts colleges in the nation. Amherst was far too genteel to boast publicly. But the topic often arose at the fall meeting of the Board of Trustees, right after the release of the latest U.S. News Best Colleges edition. If Amherst came in second, someone would always ask, “Why is Williams College ahead of us again?” I came to understand that, in the world of college rankings, everyone feels resentment, frustration, and anxiety. Everyone thinks they are being treated unfairly, except during those fleeting moments when they sit at the top of the sand pile.
Like many members of my generation, my education in gourmet cooking began by watching Julia Child’s syndicated TV show, The French Chef. Judging by my occasional attempts at haute cuisine, I was not a very good student. But I do remember one important lesson: When you combine a lot of ingredients into a stew, you want to bring out the flavor of each one. You should still be able to taste the bacon and the porcini mushrooms in the beef bourguignon.
The art of composing a college ranking is like preparing a stew. You select a group of ingredients, measure each one carefully, combine them in a strict sequence, stir, cook, and serve. If you do it just right, you might end up with a delicious, classic French dish. If you do it badly, you end up with gruel.
The rankings of U.S. News and its followers typically produce gruel. A careful look at the “recipes” for preparing these rankings shows why.
To create its 2022 listings of national universities, for example, U.S. News combined 17 different ingredients, grouped under nine headings (graduation and retention rates, social mobility, graduation-rate performance, undergraduate academic reputation, faculty resources, student selectivity, financial resources, alumni giving, and graduate indebtedness) to produce an overall score for each ranked college, on a scale of one to 100. The data fed into this recipe derive from replies to the magazine’s annual statistical questionnaires and peer-evaluation surveys. Most of the quantitative information is also available from the U.S. Department of Education, but some (such as class-size data or the alumni-giving rate) is not. Since there is often a time lag in federal reports, U.S. News takes some pride in publishing data that are, in at least some instances, more current.
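At its core, the recipe is a weighted sum: normalize each indicator so unlike units can be combined, multiply by the published weights, and rescale so the top school lands at 100. The sketch below is a minimal illustration of that arithmetic, not the magazine’s actual code; the three indicators, the five colleges’ numbers, and the weights are all invented for demonstration.

```python
import numpy as np

# Hypothetical indicators for five colleges (rows); none of this is real data.
# Columns: graduation rate, peer-reputation score, spending per student ($K).
indicators = np.array([
    [0.95, 4.8, 110.0],
    [0.90, 4.2,  95.0],
    [0.85, 3.9,  70.0],
    [0.78, 3.1,  55.0],
    [0.70, 2.8,  40.0],
])

# Assumed weights summing to 1; stand-ins for a ranker's published percentages.
weights = np.array([0.40, 0.35, 0.25])

# Rescale each column to [0, 1], take the weighted sum, and stretch the result
# to a 0-100 scale with the top school pinned at 100.
mins, maxs = indicators.min(axis=0), indicators.max(axis=0)
normalized = (indicators - mins) / (maxs - mins)
composite = normalized @ weights
scores = 100 * composite / composite.max()

ranks = scores.argsort()[::-1].argsort() + 1   # 1 = highest composite score
print(np.round(scores, 1), ranks)
```

Everything downstream, from the headline list to the year-over-year movement, is a function of which columns go into that matrix and what numbers go into that weight vector.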
In a practice begun back in 1997, U.S. News adjusts some of the metrics in its formula in an attempt to measure institutional value added. These calculations use proprietary algorithms to estimate the extent to which an institution’s performance on a particular criterion is higher or lower than one might expect, given the distinctive characteristics of the institution and its student body. For example, in addition to calibrating the raw overall graduation rate, U.S. News also includes something called “graduation-rate performance,” to reward institutions, such as Berea College, that achieve a higher graduation rate than might be expected, given the academic preparation of their students.
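Stripped of the proprietary details, “graduation-rate performance” is a residual: predict the graduation rate a school’s student profile would lead you to expect, then credit or penalize the school for the gap between the actual rate and the prediction. The sketch below uses an ordinary least-squares fit on invented data purely to show the shape of that calculation; U.S. News’s own model and inputs are not public, and the two predictors here are assumptions.

```python
import numpy as np

# Invented inputs for six colleges: average admitted-student test percentile
# and share of Pell Grant recipients. Not real institutional data.
X = np.array([
    [95, 0.10],
    [88, 0.18],
    [75, 0.30],
    [70, 0.45],
    [60, 0.55],
    [55, 0.60],
], dtype=float)
actual_grad_rate = np.array([0.94, 0.90, 0.82, 0.79, 0.62, 0.58])

# Fit a simple linear model of the graduation rate expected from the
# student profile (the real model is proprietary and surely richer).
design = np.column_stack([np.ones(len(X)), X])               # add an intercept
coef, *_ = np.linalg.lstsq(design, actual_grad_rate, rcond=None)
expected = design @ coef

# "Performance" is the gap between what a school achieves and what the model
# predicts for its student body; a positive residual means the school
# graduates more students than its inputs would suggest.
performance = actual_grad_rate - expected
print(np.round(performance, 3))
```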
Other comprehensive rankings have used formulas that are broadly similar to those used by U.S. News. For its 2022 edition, the Wall Street Journal/Times Higher Education rankings employed 15 measures, grouped under four headings (resources, engagement, outcomes, and environment). Some of its factors (graduation rate, for instance) are also used by U.S. News. Several others, such as various survey-based ratings of student engagement and postgraduate salaries, are more distinct. Washington Monthly divides its rankings into three portions, each comprising many factors, while Forbes uses well over a dozen (including an institution’s alumni representation in the Forbes “30 Under 30” list). Niche, a platform that both recruits for colleges and helps parents and students find the right institutions, surely wins the prize for formulaic complexity, by somehow managing to incorporate over 100 ingredients (via Bayesian methods and “z-scores”) into a single ordinal list of 821 best colleges.
Taken individually, most of the factors are plausibly relevant to an evaluation of colleges. But one can readily see that any process purporting to produce a single comprehensive ranking of best colleges rests on a very shaky foundation.
Problem No. 1: Selection of Variables
How do rankocrats decide what to include or leave out in their formulas? What we call a “college education” has literally hundreds of dimensions that could potentially be examined. While there is widespread agreement about the general purposes of higher education, when it comes to rankings, that consensus quickly dissolves into argument.
Why, for example, does U.S. News look at spending per student, but not endowment per student? Why does it measure faculty salaries but not faculty research output? Why does it calculate graduation rate but not postgraduate earnings? Why do some rankings systems include racial and ethnic diversity, while most ignore it? Indeed, why do some formulas use just a handful of variables, while others incorporate dozens or even hundreds? At best, the rankers give vague replies to such questions, offering no supporting evidence for their preferred variables. Very rarely do they explain why they have left out others, including those that their competitors use.
Problem No. 2: Assigning Weights to Variables
Equally arbitrary is the process of determining what weights to assign to the variables. The pseudoscientific precision of the mathematical formulas used in the most popular rankings is really quite comical. For 2022, U.S. News decreed that the six-year graduation-rate factor was worth precisely 17.6 percent in its overall formula, and the freshman-to-sophomore-year retention rate, exactly 4.4 percent. Washington Monthly somehow divined that its Pell graduation-gap measure (comparing the graduation rate of lower-income Pell Grant recipients with non-Pell recipients) factored in at 5.55 percent of its overall rating, while a college’s number of Pell students receiving bachelor’s degrees deserved a measly 2.8 percent.
U.S. News has long been well aware of the arbitrariness of the weights assigned to variables used in its formulas. In 1997, it commissioned a study to evaluate its methodology. According to Alvin P. Sanoff, managing editor of the rankings at that time, its consultant concluded: “The weight used to combine the various measures into an overall ranking lacks any defensible empirical or theoretical basis.” The magazine evidently just shrugged its shoulders and kept right on using its “indefensible” weighting scheme. As have all the other formulaic rankers, one strongly suspects.
Problem No. 3: Overlap Among Variables
A third problem is the degree of overlap among variables — a condition statisticians call “multicollinearity.” In statistical terms, the ranking formulas purport to use several independent variables (such as SAT scores, graduation rate, class size, and spending per student) to predict a single dependent variable (numerical rank). It turns out, however, that most of the so-called independent variables are, in fact, dependent on each other. A 2001 analysis found “pervasive” multicollinearity in the formula then used by U.S. News, with many pairs of variables overlapping by over 70 percent. For example, a college’s average SAT score (for its entering students) and its graduation rate were almost perfectly correlated.
Why is this a problem? When factors such as SAT scores and graduation rates are collinear, the true impact of either one on colleges’ overall rankings can be quite different from the weighting percentage nominally assigned by the formula. For example, the 2001 study found that an institution’s average SAT score actually explained about 12 percent of its ranking, even though the U.S. News formula nominally assigned that factor a weight of only 6 percent. The SAT statistic had this outsized influence because it directly, and strongly, affected seven of the 14 other variables. For this reason, Robert Zemsky and Susan Shaman argued quite persuasively in their 2017 book that it takes only a tiny handful of variables to explain almost all of the differences in the U.S. News rankings. In other words, many of the factors so carefully measured and prominently featured by the magazine are just window dressing.
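The mechanism is easy to reproduce with simulated data: when every indicator tracks a common underlying factor, the share of the composite score that one indicator “explains” on its own can dwarf its nominal weight. In the sketch below the numbers are invented, and the 6 percent figure simply echoes the nominal SAT weight discussed above; the other weights and the noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000                                   # simulated colleges

# A latent "wealth" factor drives most of the indicators, which is what
# produces the overlap described above.
wealth = rng.normal(size=n)
sat        = wealth + 0.3 * rng.normal(size=n)
grad_rate  = wealth + 0.3 * rng.normal(size=n)
spending   = wealth + 0.3 * rng.normal(size=n)
reputation = wealth + 0.5 * rng.normal(size=n)

# Assumed nominal weights: SAT gets only 6 percent of the composite.
composite = 0.06 * sat + 0.34 * grad_rate + 0.30 * spending + 0.30 * reputation

# Share of the composite's variance explained by SAT alone (univariate R^2).
r = np.corrcoef(sat, composite)[0, 1]
print(f"nominal weight: 0.06, variance explained by SAT alone: {r**2:.2f}")
```

Because the indicators move together, the SAT column alone accounts for far more of the composite’s variation than its stated 6 percent share would suggest.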
Furthermore, most of the criteria explicitly used by U.S. News (and, by extension, most of the other comprehensive rankers) turn out to be heavily dependent on an unidentified background element: institutional wealth. This should be intuitively obvious for the faculty-resources and financial-resources measures. As studies have repeatedly shown, however, the degree of institutional wealth also corresponds directly with the level of entering students’ SAT scores, freshman retention rates, graduation rates, alumni giving, and even peer reputation. A ranking that gives separate weights to each of those factors ends up largely measuring the same thing.
Problem No. 4: The Salience of Numbers
A further problem with the rankocrats’ systems is the outsized impact exerted by the numerical scores that those systems produce. Scholars call this quality “salience” — that is, the tendency of one measure to dominate all the others, simply because of its greater visibility. Taking an example from the 2022 U.S. News edition, we can ask whether the University of California at Berkeley (ranked 22nd among national universities) is really better than its downstate neighbor, the University of Southern California (27th). These two numbers said yes. Yet, when you look at the underlying data (to say nothing of all the qualitative factors ignored by the formula), the only plausible conclusion is that the two colleges, while very different, were equivalent in overall quality. Those colleges’ total scores on U.S. News’s magic 100-point scorecard (82 and 79, respectively) were also almost identical. Berkeley seemed to be superior on some measures (peer evaluation and student excellence), and USC on others (faculty resources and financial resources). Yet there it was, in neon lights: No. 22 versus No. 27 in rank.
As one moves further down the ladder, the numerical differences among the colleges — and surely the real quality differences — shrink to the vanishing point. Ursinus and Hendrix Colleges, two very fine small liberal-arts colleges, received overall raw scores of 58 and 55 from U.S. News. Yet Ursinus was ranked 85th (in a tie) among national liberal-arts colleges, and Hendrix 98th (also in a tie). The notion that, in this case, a student should choose Ursinus over Hendrix simply because of these numerical differences is ludicrous. But, as many scholars have documented, rankings numbers speak loudly, often drowning out other, more edifying ways of assessing an institution’s strengths and weaknesses.
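Part of the reason the ordinal gap looks so dramatic is that mid-list raw scores are tightly clustered, so a handful of points can translate into many rank positions. A toy example, with made-up scores rather than actual U.S. News figures:

```python
# Made-up raw scores on a 100-point scale, clustered the way mid-list
# colleges typically are; none of these are actual U.S. News figures.
scores = {"College A": 58, "College B": 57, "College C": 57, "College D": 56,
          "College E": 56, "College F": 56, "College G": 55}

# Standard competition ranking: a school's rank is 1 plus the number of
# schools with a strictly higher score, so ties share a position.
ranks = {name: 1 + sum(v > s for v in scores.values())
         for name, s in scores.items()}

print(ranks)
# A 3-point score gap between College A (rank 1) and College G (rank 7)
# reads as a 6-place difference once it is expressed ordinally.
```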
In a 2007 study of the enrollment decisions made by high-achieving students who attended Colgate University between 1995 and 2004, Amanda Griffith and Kevin Rask noted that over half of the surveyed students chose Colgate merely because it was ranked higher than the other colleges to which they were admitted. This deciding factor, they observed, was independent of other measures of academic quality, such as student/faculty ratio or expenditures per student. A 2013 investigation examined the impact of a 1995 decision by U.S. News to increase the number of institutions that were ordinally ranked. Before 1995, colleges that received raw scores between 26th and 50th in its formula were merely listed alphabetically in a “second tier.” The researchers found that when the magazine began assigning a specific number to those additional institutions, they experienced a statistically significant increase in applications, wholly independent of any changes in the underlying quantitative measures of their academic quality.
Problem No. 5: Fiddling With the Formula
Compounding the inherent arbitrariness of the rankings’ methodology, rankocrats keep changing it, so as to render comparisons from one year to the next essentially meaningless. Ever since 1983, U.S. News has made repeated alterations in the variables used in its formula, the weights assigned to those factors, the procedures for measuring them, and the number of colleges listed.
Why does U.S. News keep changing its recipe? Many observers accuse the publisher of instituting changes just for the purpose of shaking things up, to generate enough drama to keep readers coming back year after year. Its editors firmly deny that charge. Instead, they typically give rather vacuous explanations for the changes, often citing “expert” opinion. But, unlike academic experts, the magazine’s editors don’t cite the results of peer-reviewed studies to substantiate their assertions.
In fact, it’s not difficult to guess the reasons for at least some of the changes. One can readily explain several adjustments — for example, the belated inclusion of social mobility and college affordability — as responses to widespread criticism of the formula’s blatant wealth bias. Other revisions reflect efforts to discourage cheating. U.S. News has been engaged in an ongoing Whac-a-Mole exercise with institutions bent on gaming its system. Find a loophole, close it. Find another loophole, close that one. Ad infinitum.
Additional alterations may have been made to avoid the embarrassment of implausible results. In the magazine’s first ranking of law schools, Yale finished first, and Harvard wound up an ignominious fifth. That implausibility was quickly corrected by subsequent rankings formulas. Until quite recently, it’s been Yale (first) and Harvard (second) at the top. A more celebrated example involves the ranking of the undergraduate program at the California Institute of Technology. In 1999, the U.S. News statisticians made an obscure change in the way the magazine plugged spending per student into its overall score computation. As a result, Caltech (which spends much more per student than its peers) vaulted from ninth place in 1999 to first place in 2000. Oops! Soon Caltech settled back to its “proper” position in the pecking order, below the perennial top dogs.
The Caltech episode illustrates a related problem: buyer’s remorse. Since a college’s numerical position in the hierarchy can bounce around from year to year, often for reasons that bear no relation to changes in its underlying quality, applicants who rely on those numbers to make college choices can get unpleasant surprises. Imagine an applicant who, in 2000, chose Caltech because it was ranked first in U.S. News, in preference to, say, Princeton (then fourth). A year later, that person wakes up to discover that the two institutions have traded places. By graduation time, Princeton is still first, while Caltech has sunk to eighth.
Problem No. 6: One Size Doesn’t Fit All; the ‘Best College’ Illusion
Just as there is no single best stew, there can be no single best college. It takes real chutzpah to claim that a formula comprising arbitrarily chosen factors and weights, which keep changing from year to year, can produce a single, all-purpose measure of institutional quality. Of course, all of the rankocrats concede this fact and take pains to advise readers to use their numerical listings only as a starting point in the search, not as an absolute method for making decisions. In service to that advice, most publications offer numerous single-dimension assessments in addition to their comprehensive best-colleges lists. And many of them supply tools to help prospective applicants construct even more-personalized intercollege matchups. (Usually for a fee, of course.)
And yet all of the rankers use their best-colleges lists as public-relations bait to hook their audiences. By the time curious readers get to the underlying information and the specialized rankings, they have been told by a seemingly authoritative organization what the correct ordering of colleges is, from best to worst. The unstated message comes through loud and clear: “Berkeley is better than USC. Ignore that relative assessment at your peril.”
What we have, in sum, is a group of popular rankings that simplify the complexity of evaluating a college’s performance by arbitrarily selecting a collection of measures, many of which overlap substantially, and then assigning equally arbitrary weights in order to purée them together into a single offering. The result is a tasteless mush.
This essay is adapted from the author’s forthcoming book, Breaking Ranks: How the Rankings Industry Rules Higher Education and What to Do About It (Johns Hopkins University Press).