“Big data is coming for your books.” It’s been nearly a decade since a writer in the Los Angeles Review of Books opened his case against data in the humanities with this line. He was worried about Google Books and the nascent computational methods of literary study. The threat was overblown — so far. Google Books handed its digitization project to HathiTrust, a nonprofit, and humanists who use computational methods have settled into a small niche.
But in the last few years, university administrators and philanthropists have begun to direct large sums of money to data science, an interdisciplinary field of study. Miami University, in Ohio, received $20 million for a new data-science building. The University of Pennsylvania received $25 million for the same. The University of Virginia received $120 million, the largest gift in its history, to establish a school of data science. The University of California at Berkeley received $252 million, the largest gift in its history, to build a hub dedicated to data science. MIT is investing $1 billion in a new data-science college, $350 million of which comes from a founder of Blackstone, an investment company.
Data science draws especially from computer science, mathematics, and statistics, but its ambitions sweep across the university. UVa has launched an initiative to hire faculty members with expertise in data science from the natural sciences, social sciences, and humanities. Regardless of their disciplinary home, data scientists have begun investigating literature and culture.
Meanwhile, universities across the U.S. are restricting funding to humanities departments: freezing hiring, increasing reliance on adjunct labor, shrinking graduate programs, and even eliminating programs in part or in whole. Big data is, it now seems, coming for your books. Jill Lepore, a prominent public intellectual, Harvard historian, and staff writer for The New Yorker, has made that threat the cri de coeur of her latest book, If Then.
Before Cambridge Analytica, there was the Simulmatics Corporation. In the 1960s, Simulmatics pioneered the use of predictive modeling with computers to anticipate how microtypes of voters would respond to political campaigns. Simulmatics also went to Vietnam, where its practitioners attempted to manage the war and failed badly. By 1970, the company was dead. But it had, in the words of If Then’s subtitle, “invented the future.”
For Lepore, the story of Simulmatics is a dark fable. Her characters are vividly drawn. I especially love the story of Eugene Burdick, who “walked with the rubbery gait of a surfer and wore owl’s-eye glasses and smoked a pipe and liked to be photographed sitting at his typewriter, an old Royal, well-used and well-oiled.” Burdick began as a beach boy and Rhodes scholar who was tempted by Simulmatics’ quantification of political science. He ultimately rejected the project, and later became an author of dystopian fiction. He co-wrote two bestsellers, The Ugly American and Fail-Safe, and then devoted a novel to the dangerous world augured by Simulmatics, The 480.
Simulmatics needed a more enthusiastic quant than Burdick, so they turned to Ithiel de Sola Pool. Lepore’s Pool is tenacious. He is a hopeless scientist but a gifted prophet. He is a deluded shill for the Department of Defense during the Vietnam War. Saul Bellow put him in one of his novels. He is the spiritual father of the MIT Media Lab and an emblem of the worst of data science. Pool and Simulmatics embedded masculinist assumptions in their models. To quote Lepore: “By ‘human behavior,’ they meant the behavior of men; by ‘artificial intelligence,’ they meant their own intelligence — a fantasy of their own intelligence — which they intended to graft onto a machine. They did not consider the intelligence of women to be intelligence.” These assumptions caused and continue to cause untold damage. Lepore argues that Simulmatics set us on the path that led to Amazon, Facebook, and Google, by whom we have become “tormented and trapped.” “The automated simulation of human behavior,” she argues, “became the human condition.” In the process, humanistic thought has been severely devalued.
But Lepore’s critique of data goes deeper. She dilated on the danger of data science in conversation with Fran Berman at the Harvard Data Science Initiative in September. (She gave a related talk at Emory University in February — titled “The End of Knowledge: How Data Killed Facts” — to which I was a respondent.) There she argued that we have proceeded through a series of evidentiary paradigms. Facts came to power first with the replacement of trial by ordeal by trial by jury in 1215, and then with the Protestant Reformation; both dethroned God as the only true knower of guilt and innocence, of what happens in the world. Mystery gave way to knowledge. Numbers displaced facts as the dominant evidentiary paradigm with the rise of capitalism, democracy, and statistics in the late 18th century, tied to the needs of the state and the replacement of deliberation and discernment with measurement. Data began to displace numbers when tabulating machines calculated the 1890 census, but it didn’t fully come to power until the appearance of the UNIVAC and computation in the 1950s.
“The rise of the age of data,” argues Lepore, “is in a way a return to the age of mystery. The machines are the gods, and the computer scientists are the priests, and the rest of us, we just have to look up and hope that they get it right.” This is how data killed facts and ended knowledge. Lepore acknowledges that this is a provocation, one spurred by the financial priorities of universities. “We have a whole cult of data now, where,” she says, “whatever you do, if you say it’s data-driven, somehow you can get money for it.”
She’s not wrong about the elevation of machines and computer scientists, and she’s not wrong about the money. But data in itself is neither good nor bad. Lepore errs on the question of data’s place in the production of knowledge, a question it’s crucial we get right if we want to defend the humanities. She offers a division of knowledge that sets data-driven science against the beleaguered humanities. The humanities are, it’s true, catastrophically underfunded. We must vociferously argue not just for their preservation but for their expansion. But it’s a mistake to commit to a false binary whereby data belongs to science, not to humanists.
Lepore acknowledges that computational methods and predictive models are “very sensible” for the physical and natural sciences. Such work, she notes, has “saved and improved countless lives.” But these methods and models are a poor fit for the humanities, she argues, where “laws like the law of gravity” do not apply. Yet almost no one who uses data in the humanities imagines they are pursuing the discovery of natural laws for humanity. Nor is the use of data in the humanities new, as we sometimes hear. Scholars have been gathering and relying on data in the humanities almost since the beginning of the modern research university, if not before. What is new is the idea that data is a poor fit for the humanities.
The story of data’s long history in the humanities is told in a new book by Rachel Sagner Buurma and Laura Heffernan, The Teaching Archive, which has quickly become popular among English professors. Buurma and Heffernan argue that we have misapprehended the history of literary studies by neglecting the centrality of pedagogy. The work professors performed in the classroom not only determined what students learned but also, they show, influenced research practices. They teach us about Caroline Spurgeon, a Shakespearean at the outset of the 20th century, a devout believer in humanistic value who embraced quantitative methods by counting and cataloging Shakespeare’s figures of speech — by treating literature, in Spurgeon’s word, as “data.” They introduce us to Edith Rickert, who employed “methods of code analysis” that she’d learned while working for an office of military intelligence during World War I. In the 1920s, Rickert and, separately but in parallel, I. A. Richards (the latter regarded as a founder of close reading) “demanded from students not carefully crafted interpretations of literary texts, but their cooperation in the process of gathering and organizing bits of data about readers and texts.” It was through this data collection that students and teacher “found themselves discussing an astonishing number of complex questions of poetic form and historical context.”
Buurma and Heffernan tell us about Josephine Miles, an English professor at Berkeley from 1939 to 1978, who taught students “to adopt a perspective on facts, rather than simply report them,” which she believed more crucial than ever “in a modern society and a modern research university that regarded individuals as determined by data.” But that position did not deter her from using data. The opposite: Miles conducted groundbreaking research in computational methods. Already in the 1950s she was using computers to build concordances. She went on to build enormous datasets on the history of poetry that allowed her to trace the rise and fall of figures of speech across decades and centuries, to find unknown ruptures and continuities, to use data to determine facts.
This tradition continues today. Lauren F. Klein, an associate professor of English and quantitative theory and methods at Emory University, uses computational methods to recover absences in the archival record, especially African American absences. She uses Thomas Jefferson’s letters as data to illuminate the life and labor of Jefferson’s enslaved Black chef, James Hemings. She analyzes 19th-century newspapers to reveal the hidden labor of Mary Ann Shadd, the first Black woman publisher in North America, whose work went unrecognized in her time.
The humanities, then, have long embraced data and, more recently, have benefited from computational methods and predictive modeling. Where does that leave us? Lepore’s diagnosis is on point. Big tech subjugates us and our attention through our data. And data science’s claim “to have triumphed over all other ways of knowing” threatens “the near abandonment of humanistic knowledge.” This is true.
But data is not the enemy. The reduction of knowledge to economic utility is the enemy. Insofar as the vogue for data science among philanthropists and university administrators belongs to a neoliberal ethos of imagining education in terms of return on investment, we should resist it. And many digital humanists center their work on such resistance. Klein, for example, argues for “data feminism,” “a way of thinking about data, both its uses and its limits, that is informed by direct experience, by a commitment to action, and by intersectional feminist thought.” Data feminists critique data science when it “reinforces existing inequalities” and use data science “to challenge and change the distribution of power.”
Let’s recognize that data affords insights about literature and culture that we couldn’t otherwise see, whether about the history of poetics, the gastronomy of James Hemings, or the editorship of Mary Ann Shadd. In April, the journals Cultural Analytics and Post45 published a joint special issue devoted to data-driven scholarship. Its essays demonstrate how Goodreads is reconfiguring how genres work; how internet content has come to shape television shows such as The Good Place and novels such as George Saunders’ Lincoln in the Bardo; and how the Iowa Writers’ Workshop inverted an earlier logic of American regionalism, such that writers, including John Irving and Marilynne Robinson, now move from the metropoles to the provinces, to Iowa, where they set their novels — an inversion the essay’s authors discovered through data.
Literature and culture are increasingly defined by data itself. Think about the presence of authors in the digital literary sphere. Think about Instapoets — popular poets who reach gigantic audiences on Instagram. Or about the arguably perverse proliferation of genres produced through Kindle Direct Publishing. In 2021, to reject data is to risk alienating ourselves from the ontological and sociological grounding of the works we study. Humanists need to embrace data as one of many objects we study — while fighting against those who would turn data against us.