Co-directors of newly launched Harvard Data Science Initiative discuss new era

“DOMINICI: Because of the new advances in technology, almost every field right now has data, and more data than ever. Clearly, there’s the explosion of genetics and genomics data in the life sciences and of molecular data, as well as data in astronomy and economics. Even in the humanities, you can scan documents and turn them into data that you can analyze.

PARKES: To add some numbers to this, IBM has estimated that we’re generating more than one quintillion bytes of data a day. (A quintillion is 10 to the 18th.)

DOMINICI: One of the reasons we are so excited that Harvard is launching the Data Science Initiative is all the advances our faculty have made in recent years. We can now describe the entire genome, define the exposome (the environmental analogue to the genome), characterize social interactions and mood via cellphone data, and digitize historical data relevant to the humanities. …

DOMINICI: We have launched the Harvard Data Science Postdoctoral Fellowship, which is among the largest programs of its kind, and we want to recruit talented individuals in highly interdisciplinary ways.

We have also launched a competitive research fund that will catalyze small research projects around the University. Through our friends in the Faculty of Arts and Sciences and the Medical School, we’ve identified some spaces in the near term where people can get together. …

PARKES: We are launching the initiative because we want to get to a point where we have a Harvard Data Science Institute. The aspiration is that the Data Science Institute will have some physical space associated with it, …”

“A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as ‘research parasites.’”

“If public-opinion polling is the child of a strained marriage between the press and the academy, data science is the child of a rocky marriage between the academy and Silicon Valley. The term ‘data science’ was coined in 1960, one year after the Democratic National Committee hired Simulmatics Corporation, a company founded by Ithiel de Sola Pool, a political scientist from M.I.T., to provide strategic analysis in advance of the upcoming Presidential election. Pool and his team collected punch cards from pollsters who had archived more than sixty polls from the elections of 1952, 1954, 1956, 1958, and 1960, representing more than a hundred thousand interviews, and fed them into a UNIVAC. They then sorted voters into four hundred and eighty possible types (for example, ‘Eastern, metropolitan, lower-income, white, Catholic, female Democrat’) and sorted issues into fifty-two clusters (for example, foreign aid). Simulmatics’ first task, completed just before the Democratic National Convention, was a study of ‘the Negro vote in the North.’ Its report, which is thought to have influenced the civil-rights paragraphs added to the Party’s platform, concluded that between 1954 and 1956 ‘a small but significant shift to the Republicans occurred among Northern Negroes, which cost the Democrats about 1 per cent of the total votes in 8 key states.’ After the nominating convention, the D.N.C. commissioned Simulmatics to prepare three more reports, including one that involved running simulations about different ways in which Kennedy might discuss his Catholicism.”

Solomon leaned back in his chair and flipped through a mental Rolodex of his clients. “I definitely have some ideas,” he said, after a minute. “The first person who comes to mind, he’s also a bioinformatician.” He rattled off a dazzling list of accomplishments: the developer does work for the Scripps Research Institute, in La Jolla, where he is attempting to attack complicated biological problems using crowdsourcing, and had created Twitter tools capable of influencing elections. Solomon thought that he might be interested in AuthorBee’s use of Twitter. “He knows the Twitter A.P.I. in his sleep.”

And, like actual rock stars, rock-star developers come in a range of personality types. Guvench had briefed me at the coffee shop: front-end guys—designers and user-interface engineers—make products that interact with what he referred to as “normal” people. As a result, “they’re sort of hip,” he said. “Especially designers—they dress nicely.” The further you get down the “stack,” Guvench explained, “the more . . .” He paused. “ ‘Neckbeard’ is the word that comes to mind.” Back-end engineers, like data scientists and system administrators, “are the most brilliant people,” he said. “They may not be the most fun to talk to at a party, but they’re really fucking good at talking to computers.” Of course, he added, the stereotype doesn’t apply to his clients.

