A NEW KIND OF SOCIAL SCIENCE FOR THE 21st CENTURY
A Conversation with Nicholas A. Christakis
NICHOLAS A. CHRISTAKIS is a Physician and Social Scientist, Harvard University; Coauthor (with James Fowler) of Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives.
[NICHOLAS A. CHRISTAKIS:] In the 20th century, there was a tremendous expectation, or appreciation, for the role that the biological and the physical sciences could play in improving human welfare and human affairs. We had everything from the discovery of nuclear power to plastics to, in biology, the discovery of new drugs, beginning with penicillin (which is one of the greatest feats of human ingenuity ever). We saw phenomenal progress in the physical and the biological sciences.
In the 21st century, the social sciences offer equal promise for improving human welfare. The advances that we have made and will be making, especially in understanding human behavior and its very deep origins, will be translated into interventions of diverse sorts that will have a much bigger impact in terms of improving human welfare than many of the prior examples that I gave.
This new frontier in the social sciences is being abetted and even accelerated by three things that are happening. The first is that a biological hurricane is approaching the social sciences. Discoveries in biology are calling into question all kinds of ideas, historically important ideas, in the social sciences—everything from the origin of free will, to collective expression and collective behavior, to the deep origins of basic human behaviors. All of these things are being challenged and elevated by discoveries in biology.
For example, as we sequenced the human genome, we did it initially with an eye towards physiologic phenotypes (whether people express certain hormones, or what the sources were of certain variations in risk for diseases like diabetes, etc.). Those discoveries are progressively going to be applied to other realms having to do with human behavior.
Incidentally, related to that, it’s not just that a biological hurricane is approaching the social sciences. Social sciences are generating questions that biologists are becoming interested in. One of my favorite examples of this is cooperation. This is a topic that social scientists have been interested in for a very long time, and evolutionary biologists as well. But now this is drilling down even to the cellular or molecular level, and people are beginning to ask how sub-organismic biological entities “cooperate,” and what that means for biology.
The second thing that is going to change, or challenge, the social sciences is the era of computational social science, or “big data.” If you had asked social scientists even 20 years ago what powers they dreamed of having, they would have said, “It would be unbelievable if we could have this little tiny Black Hawk helicopter that could be microscopic, fly on top of you, and monitor where you are and who you’re talking to, what you’re buying, what you’re thinking, and if it could do this in real time, all the time, for millions of people, all at the same time. If we could collect all these data, that would be amazing.”
Of course, that’s exactly what we have now. We have, in everyone’s pocket, a little device that functionally does all of the foregoing. We can pool these data, and we can understand human behavior. For me, one of the most interesting aspects of human behavior is collective expression. Not just individual-level behavior, but how do humans aggregate to form collective entities, whether they are thought of as a super-organism, or thought of as communities or groups or networks or nation-states? How do human beings find a collective expression? And we can use all these data to begin to understand human behavior and collective human behavior in a completely new way.
The third thing that’s happening that is going to radically reshape the social sciences—and it intersects with the foregoing two ideas, the biological hurricane and big data, or computational social science—is a newfound appreciation for experimentation in the social sciences. There was always a tradition of doing bona fide experiments in the social sciences, going back well over 100 years, where people would be randomly assigned to different treatments. Psychologists have always been doing this, of course, but other branches of the social sciences are increasingly rediscovering, and more broadly applying, experiments in all kinds of settings: workplaces, schools, hospitals, the developing world, online. People are doing experiments all the time right now, and these experiments offer a robustness of causal inference that is phenomenal.
It intersects with the other two ideas that I mentioned in two ways. First of all, in some sense, the social sciences are aping the natural sciences in the deployment of experiments. The physicists and chemists always did experiments. Incidentally, not all physicists could do experiments. Astronomy, for example, doesn’t afford experiments. Geology doesn’t easily afford experiments. But nevertheless, there’s a sense in which the social sciences are rediscovering the power of experimentation, and in this way, too, reflecting this kind of convergence of the natural and the social sciences.
This newfound appreciation and love of experimentation reflects the second point I made (regarding computational social science) because, with the advent of the Internet and distributed computing, the type of experiments you can do has increased dramatically, and their cost has fallen significantly. For example, in my lab, we create virtual laboratories online, where we recruit volunteers to participate in our experiments from around the world. Sometimes we pay them small amounts, for instance, by using Amazon Mechanical Turk. We can do experiments where we drop people into networks with different structures that we experimentally manipulate, and randomly assign them to live in different kinds of worlds, and then see how these people behave. For example, we can study what happens when they’re randomly assigned to a world in which the network has one mathematical structure, or, instead, randomly assigned to live in a world where the network has a different mathematical structure. That’s just one example of a kind of experiment that we’re doing in my lab, along with James Fowler. But there are many other types of experiments that people are doing, in face-to-face interactions and online.
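To make the idea of random assignment concrete, here is a minimal Python sketch, assuming a hypothetical list of volunteer IDs and two hypothetical network “worlds” labeled lattice and random_graph. The helper randomly_assign is illustrative only, not the lab’s actual software: it shuffles participants with a seeded generator and deals them out in turn, so each world receives an equal-sized group that differs only by chance.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Hypothetical helper: shuffle participants, then deal them round-robin
    into experimental conditions so group sizes differ by at most one."""
    rng = random.Random(seed)          # seeded so an assignment can be audited
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical volunteers and hypothetical network "worlds".
volunteers = [f"worker_{i}" for i in range(10)]
groups = randomly_assign(volunteers, ["lattice", "random_graph"], seed=42)
for world, members in groups.items():
    print(world, len(members))         # each world receives 5 volunteers
```

Because membership depends only on the shuffle, a later difference in how people behave in the two worlds can be attributed to the network structure rather than to who chose to join which world.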
These three things (the biological hurricane, computational social science, and the rediscovery of experimentation) are going to change the social sciences in the 21st century. With that change will come, in my judgment, a variety of discoveries and opportunities that offer tremendous prospect for improving the human condition.
It’s one thing to say that the way in which we study our object of inquiry, namely humans, is undergoing profound change, as it is. The social sciences are indeed changing. But the next question is: is the object of inquiry also undergoing profound change? It’s not just how we study it that’s changing, which it is. The question is: is the thing itself changing?
On this, I’m of two minds. I used to think that either things were changing or they were not. Sometimes, when other people were arguing that things were changing, like human nature, I would say, “No, they’re not. Human nature’s always the same.” Other times, when people were saying, “things aren’t changing, society isn’t changing,” I would say, “Yes it is. Violence is declining, for instance,” as Steven Pinker has been arguing (along with others).
Now, I’ve come to the position that everything is changing. And the only thing that varies is the rate at which it changes. Some things change very, very slowly, and some things change very, very fast, and everything in between. In fact, any claim that something simply is or is not changing rests on a false dichotomy.
In some ways, you can understand this from a point of view of entropy in the universe, in which there’s constant evolution or change, or (conversely) processes that reduce entropy. You can see biology as a way in which we’re constantly expending energy to reduce entropy.
So, the next point I would make is that a set of important questions can be asked about whether human beings, who are the object of social scientific inquiry, are changing, and over what time scales, and why?
Let me just set the stage with a couple of examples. Since we evolved from our hominid ancestors, it took about 300,000 years to double our life expectancy, to approximately 40 years. In other words, about 300,000 years ago, we had a life expectancy of, roughly speaking, about 20 years. About 200 years ago, we had a life expectancy of about 40 years. But in the last 200 years, we’ve doubled it again. The first doubling took 300,000 years and might have been almost imperceptible. If you had asked me a thousand years ago, “Is human life expectancy changing?” I might have said no. So, a change that took 300,000 years to occur in the first instance takes 200 years in the second. Life expectancy is indeed increasing dramatically, at least over this interval of time.
Another example of this tension between whether things are changing or not is the debate about whether human beings can evolve in historical time, under the pressure of historical forces. I used to think that this was not possible. But there’s been a huge amount of work by many labs around the country in the last 10 years or so documenting that we humans are evolving in real time. The best-known example of this is the evolution of lactase persistence into adulthood. The ability to digest lactose, a sugar in milk, isn’t really of any value in adulthood until you have a stable source of milk. It turns out that human beings have independently evolved this capacity to digest milk as adults a half-dozen times, in different settings around the world, coincident with the cultural innovation of domesticating animals (sheep, goats, or cows), which provides a ready supply of milk. This milk is a good food source in times of scarcity. It’s also a good source of unspoiled hydration. So this confers survival advantages.
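The arithmetic behind this contrast can be made explicit. Here is a minimal Python sketch, assuming the round figures in the passage (roughly 20 to 40 years over about 300,000 years, then 40 to about 80 over about 200 years) and a constant compound rate within each period, which is a simplification: the rate r that doubles a quantity in T years satisfies (1 + r)^T = 2, so r = 2^(1/T) - 1. The function name is illustrative.

```python
def annual_rate_to_double(years):
    """Constant compound annual growth rate r such that (1 + r) ** years == 2."""
    return 2 ** (1 / years) - 1

first = annual_rate_to_double(300_000)   # ~20 -> ~40 years of life expectancy
second = annual_rate_to_double(200)      # ~40 -> ~80 years of life expectancy
print(f"first doubling:  {first:.2e} per year")
print(f"second doubling: {second:.2%} per year")
print(f"doubling time shortened {300_000 // 200}-fold")
```

On these assumptions the average annual pace of the second doubling is roughly 1,500 times that of the first, which is why the earlier change would have been invisible within any single lifetime.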
Here we have a cultural product—namely, the discovery or the invention of domestication of animals—which feeds back to, and creates, a kind of selection pressure on us as a species, so that here, thousands of years later, most of us are able to digest milk in adulthood as a result of this cultural product.
So, a conversation is taking place between our behavior and our culture, on the one hand, and our biology on the other. But rather than it being the biology which guides or dictates the culture or the behavior, it’s the culture or the behavior which guides or dictates the biology. We domesticate animals, and this gets internalized down at the level of our genes. We change as a species as a result. The amazing thing is that there have been, as I said, about a half-dozen of these separate mutations in the relevant portion of our genome, which is responsible for the persistence of lactase into adulthood, in independent locations in various places around the world, principally in Africa, over the last three to nine thousand years.
There are other such cases. For example, we invented cities. First we invented agriculture 10,000 years ago. Then, as a result (though this is debated), this fed into our capacity to invent cities. Basically, cities became possible in various ways. But one can ask the question: what does it mean for us, not culturally, but biologically, that we now live in cities, and have for thousands of years? Does that present certain cognitive challenges? Do the kind of brains we have reflect the fact that we moved from being a hunter-gatherer population to a kind of urban population, a metropolitan population even? That’s a second example of ways in which the object of inquiry (namely, humans and human behavior) is changing. Not over a 300,000-year time span, but over a span of millennia.
If you had asked me a few years ago, “does our DNA vary for the purposes of social science?” I would have said, “No, it’s not changing.” But I don’t feel that way anymore. I think it is changing.
Now the question is, is the Internet something like that? Have modern telecommunications, which have demonstrably exploded since the invention of the telephone and especially since the Internet, brought us to another inflection point? The printing press, telegraphy/telephony, and the Internet: these would be three possible inflection points.
With telegraphy, you finally have the ability to transmit messages at a speed faster than a human can travel. Up until then, if you wanted to get a message from Point A to Point B, a human had to carry it. You had signal fires and certain other rudimentary technologies, but basically a walking human, a horse-riding human, or a boat with a human, was required to move a message. With the invention of telegraphy and telephony, messages could move faster than a human; and with the invention of the Internet, it’s another whole order of magnitude, not so much in speed, although that too, but in volume and breadth and searchability, and all the other well-known things about the Internet (such as connectivity).
Does the Internet represent something like that? I would have said, even a year or two ago—and in fact, I did argue—that the Internet is not changing our minds. I’m not so sure anymore. It’s possible that, for better or for worse, this kind of technology is affecting us, albeit slowly. I’m making the argument not so much that it’s affecting our biology, although it might be, but rather, it’s affecting fundamental aspects of human organization and human behavior.
We can see this in everything from the way we teach our kids, with attention-deficit issues emerging amongst children, or adults who are distracted constantly, to the way in which we don’t have to remember things so much anymore because we have Google in our pocket, for example. The way in which we interact with each other (certain social niceties) can be replaced by a social computational device. What would it mean to be in a world in which we could wear glasses that bring up data, can recognize your face and then bring up your Wikipedia page, so I can walk through the world and no longer have to bother to remember whether people are friends or enemies? Something that was crucial for tens of thousands of years in hominids, knowing who someone was and whether they meant you ill or well, now could be delegated to a machine, or embedded in my glasses. We’ll have those glasses certainly in 100 years. Probably in 10 years, we’ll have things like that.
What does that mean for social interaction, social organization, and social behavior? It has radical implications, which is something I wouldn’t necessarily even have said just a couple of years ago, actually. My thinking on this is evolving.
In fact, part of my time recently has been devoted, in collaboration with others, to studying hunter-gatherer societies, partly with the objective to search for things that are (relatively) invariant. To the extent that hunter-gatherers show certain behaviors and we also do, it means there’s something very deep and fundamental in our humanity. Despite the fact that there are such fundamental things, I also think that the object of our inquiry is changing, after all.
My own work is at the intersection of the natural and the social sciences. I’ve been collaborating with James Fowler for 10 years now on a body of research. I do some things independent of James, and he does some things independent of me, but most of our best work, I think, is done jointly.
The focus of my lab, over the last few years and in the coming years, is on a few main areas. One area is on the deep biological origins of a diverse set of social phenomena. In particular, I’m interested in the biological origins of social order. I’m interested in the extent to which we are the way we are, socially and behaviorally, because of biological predicates.
We have a body of work, James and I and some of our collaborators, looking at diverse aspects of this, beginning initially with an effort to understand the biological origins of social network interactions. It’s a very interesting question to ask: Why do we humans have friends? It’s not hard to understand why we have mates. It’s not hard to understand why we seek out others with whom to have sex. It’s quite another thing to explain why we have friends. We’re very unusual as a species in doing this. Other species, generally speaking, don’t do this; they don’t form long-term, non-reproductive unions with other members of their species.
Not only do we have friends, but we have friends in very particular ways, it turns out. As a result of this, we form networks, social networks, with very particular structures. James and I have been engaged in a project, and will continue to be working in this area over the coming five years, trying to understand the biological origins of human sociality and human network structure and function. Why do networks have the structure that they do, and why do networks perform the functions that they do, for us as a species? Hence, the first big issue that we are engaged in is the biological origin of social order, and this is focused, in particular, or at least initially, on networks.
The second big topic has to do with addressing what we call the “so what” question. The question is: so what if we can understand human social networks, or so what if we can understand human behavior? What can we do with this knowledge to make the world better? Are there ways in which we can imagine improving the world through a better understanding of its social reality, not just its biological and physical reality?
James and I have a set of ideas about this as well. We are doing large, randomized controlled trials around the world, like in Uganda and Honduras. We hope to soon begin one, with some support from the Gates Foundation, in India, where we are trying to see whether a deeper understanding of human interaction can facilitate pro-social change in these communities. Can we target things like bednets for malaria, or water purification devices, or processes related to maternal and child health? Can we figure out better ways in which, by taking advantage of people’s natural behaviors, we can intervene at the village-wide level, at the collective level, to improve economic development and public health?
For example, let’s say you have two different villages, and you map their networks. In one village, you give an intervention to 10 percent of the people, chosen at random, and you hope that the whole village will ultimately adopt it, that there will be a diffusion of the intervention. In the other village, instead of choosing 10 percent at random, you pick the people strategically, taking into account not just the structure of the social network but also, let’s say, a deeper understanding of their behavior, drawn from behavioral economics or psychology, for example. Now you target those people. Can you convert this village to do things in a better way, to be healthier, to be richer, based on this deeper understanding? We’re doing randomized trials to find out. Hence, the second big project we are engaged in is large, randomized controlled trials of interventions in the developing world.
The third main thrust is melding some of the ideas from computational social science and the new experimentation in social science that I alluded to earlier. We’re creating virtual laboratories where we’re recruiting, in many cases, thousands of research subjects who come to this virtual lab. We do these social science experiments with them where we can create all kinds of manipulable environments, virtual environments, in which real people come and engage in real behaviors, and then we can monitor this, and it’s as if we could artificially create whole groups, whole fantasy cities, and then we could observe people in these experimental ways. These experiments are not just ones done in my lab, because there are many other labs around the country using similar technology; but our experiments are, not surprisingly, social network experiments. I’ll give you two examples.
In one experiment, in collaboration with David Rand and Sam Arbesman, we wanted to understand to what extent we can preserve humans’ natural tendency to cooperate. I say “natural tendency to cooperate.” There are a lot of deep questions that can be asked about why we cooperate at all, which is also at the intersection of the natural and social sciences. But let’s, for the moment, accept that people have a “natural tendency to cooperate.”
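The comparison between the two villages can be sketched as a simulation. What follows is a minimal Python sketch under loudly stated assumptions, not the design of the trials described here: the village is a hypothetical preferential-attachment network generated in code, adoption spreads by a simple independent-cascade rule with a made-up probability of 0.1 per tie, and “strategic” targeting is stood in for by the hypothetical rule of picking the best-connected 10 percent. All function names are illustrative.

```python
import random

def village_network(n, m, rng):
    """Hypothetical village: preferential attachment, so a few people are hubs."""
    adj = {i: set() for i in range(n)}
    endpoints = []                          # a node appears once per tie it has
    for i in range(m + 1):                  # small, fully connected starting core
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for newcomer in range(m + 1, n):
        partners = set()
        while len(partners) < m:            # m distinct partners, degree-weighted
            partners.add(rng.choice(endpoints))
        for partner in partners:
            adj[newcomer].add(partner)
            adj[partner].add(newcomer)
            endpoints += [newcomer, partner]
    return adj

def spread(adj, seeds, p, rng):
    """Independent cascade: each new adopter gets one chance, with probability p,
    to pass the intervention to each friend who has not yet adopted it."""
    adopted = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for person in frontier:
            for friend in adj[person]:
                if friend not in adopted and rng.random() < p:
                    adopted.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return len(adopted)

def random_seeds(adj, k, rng):
    return rng.sample(sorted(adj), k)

def targeted_seeds(adj, k):
    # Hypothetical stand-in for "strategic" targeting: the k best-connected people.
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

rng = random.Random(7)
village = village_network(200, 2, rng)
k = len(village) // 10                      # seed 10 percent of the village
trials = 500
avg_random = sum(spread(village, random_seeds(village, k, rng), 0.1, rng)
                 for _ in range(trials)) / trials
avg_targeted = sum(spread(village, targeted_seeds(village, k), 0.1, rng)
                   for _ in range(trials)) / trials
print(f"random seeding:   {avg_random:.1f} of {len(village)} adopt on average")
print(f"targeted seeding: {avg_targeted:.1f} of {len(village)} adopt on average")
```

A simulation like this only shows how two seeding rules can be compared side by side under one set of assumptions; the size of the gap, or even its direction, depends on those assumptions, which is why the question has to be settled by trials in real villages.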
When you put people together, very quickly defection takes over. People say, “why should I cooperate with this guy? He’s not cooperating with me. He’s taking advantage of me. I’m going to stop cooperating.” The other guy reacts the same way, and, before you know it, if you start with a group in which 65 percent of the people are inclined to cooperate, after the passage of some time, everyone has given up, everyone’s defecting, no one is working together.
The question is, how can we, if at all, engineer a set of social interactions which keeps cooperation alive, which preserves it? We did an experiment in which we recruited Amazon Mechanical Turk workers, people around the world who are paid small amounts of money to participate and do various small tasks. In our case, we paid them a few dollars and they participated in our experiment for about an hour. They came to our virtual laboratory, and they were dropped into virtual worlds in which we controlled the nature of the interactions.
In one world, for example, people were dropped into a network that had random wiring between the individuals, and we observed them across time. In the beginning, we saw that 65 percent of them cooperated with their neighbors. But they couldn’t control who their neighbors were, and they found out that some of their neighbors were defectors and weren’t cooperating back, and so, after some number of rounds of the game, pretty much everyone had given up and cooperation was extinguished in the system. This result has been widely described by others and studied a lot. At least empirically, cooperation declines in these types of fixed lattices, or fixed networks.
In other variants of the experiment, however, we allowed people to rewire their networks. At every time step, they could cut the ties to people who were abusing them, and preferentially form ties to other people who were cooperators. And so, they could rewire their social world. In this variant of the experiment, after the passage of some time, cooperation persisted. In a world in which we allow people to form and re-form their social ties, cooperation can be sustained.
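The contrast between fixed and rewirable networks can be illustrated with a toy agent-based model. This Python sketch is a simplification under stated assumptions, not the protocol of the experiment, which involved real people making real choices: 60 hypothetical agents start on random wiring with about four ties each; roughly 35 percent are assumed to be unconditional defectors, and the rest are conditional cooperators who keep cooperating only while at least half of their current neighbors cooperated in the previous round; and, when rewiring is switched on, each cooperator may cut one tie to a defector and form one new tie with a cooperator. All names and parameters are hypothetical.

```python
import random

def random_network(n, degree, rng):
    """Random wiring until the average number of ties per agent reaches `degree`."""
    adj = {i: set() for i in range(n)}
    while sum(len(ties) for ties in adj.values()) < n * degree:
        a, b = rng.sample(range(n), 2)
        adj[a].add(b)
        adj[b].add(a)
    return adj

def play(n=60, rounds=15, rewire=False, seed=0):
    """Toy cooperation game; returns (share cooperating, network, defector flags, cooperation flags)."""
    rng = random.Random(seed)
    adj = random_network(n, 4, rng)
    # Assumption: about 35% always defect; everyone else starts out cooperating.
    defector = {i: rng.random() < 0.35 for i in range(n)}
    coop = {i: not defector[i] for i in range(n)}
    for _ in range(rounds):
        if rewire:
            for i in range(n):
                if not coop[i]:
                    continue                    # in this toy, only cooperators rewire
                bad = [j for j in adj[i] if not coop[j]]
                good = [j for j in range(n)
                        if coop[j] and j != i and j not in adj[i]]
                if bad and good:                # cut one defector, add one cooperator
                    old, new = rng.choice(bad), rng.choice(good)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
        # Conditional cooperators follow the majority of their current neighbors'
        # choices from the previous round; defectors never cooperate.
        coop = {i: not defector[i]
                and 2 * sum(coop[j] for j in adj[i]) >= len(adj[i])
                for i in range(n)}
    return sum(coop.values()) / n, adj, defector, coop

fixed_share, *_ = play(rewire=False)
dynamic_share, *_ = play(rewire=True)
print(f"fixed network:     {fixed_share:.0%} cooperating after 15 rounds")
print(f"rewirable network: {dynamic_share:.0%} cooperating after 15 rounds")
```

The point of the sketch is the mechanism rather than the numbers: with rewiring, cooperators can shed exploitative ties and cluster together, which is the kind of structural change the experiment allowed. How quickly cooperation erodes in the fixed network depends on the assumed threshold and proportions.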
What this means is that there’s a very deep relationship between social network structure and function and the maintenance of this very key human behavior, namely cooperation. This is an experiment in which we were able to isolate one of the deep sources of human cooperation, which is the ability to preferentially form ties with certain others in our social orbit.
In another experiment, we explored how different network architectures affect the propagation of different kinds of ideas and behaviors in these social systems. Imagine in your mind’s eye a world in which people live in a network that looks like a salt crystal, a regular lattice. Or, imagine in your mind’s eye a network in which people live, and that network looks like the roadmap of the United States, where every city is a person and the roads represent connections between the people. On the one hand, we have a kind of salt-crystal regular lattice. On the other hand, we have a network that looks like a jumbled tangle of string. You could live in either of those two worlds, and the question is, what would it mean for you to live in such a social world?
We can create artificial worlds like that and bring people into them. They only observe their narrow interactions. They don’t see the whole world. Then we can observe what they do, given the fact that they are arranged in this higher order structure. What we’re doing is we’re trying to understand how we get from individual behavior to collective behavior, and how we get from collective behavior to individual behavior. We’re doing experiments to bridge that gap.
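The difference between the two kinds of worlds can be quantified. Here is a minimal Python sketch, assuming a 10 x 10 grid that wraps around at its edges as a two-dimensional stand-in for the salt-crystal lattice, and a random graph with the same people and the same number of ties as a stand-in for the jumbled network (the actual U.S. road map is not used). In the lattice everyone has exactly four neighbors, so every local view looks the same; in the random graph the number of neighbors varies, and the average number of hops between two people is shorter. Function names are illustrative.

```python
import random
from collections import deque

def torus_lattice(side):
    """'Salt-crystal' world: a side x side grid that wraps around, 4 neighbors each."""
    adj = {}
    for x in range(side):
        for y in range(side):
            adj[(x, y)] = {((x + 1) % side, y), ((x - 1) % side, y),
                           (x, (y + 1) % side), (x, (y - 1) % side)}
    return adj

def jumbled(adj, rng):
    """Same people and same number of ties, but wired at random (undirected)."""
    nodes = list(adj)
    n_edges = sum(len(v) for v in adj.values()) // 2
    new = {v: set() for v in nodes}
    edges = 0
    while edges < n_edges:
        a, b = rng.sample(nodes, 2)
        if b not in new[a]:
            new[a].add(b)
            new[b].add(a)
            edges += 1
    return new

def mean_path_length(adj):
    """Average breadth-first-search hop count over all pairs that can reach each other."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(1)
crystal = torus_lattice(10)
tangle = jumbled(crystal, rng)
for name, world in [("lattice", crystal), ("random", tangle)]:
    degrees = sorted(len(v) for v in world.values())
    print(f"{name}: degree range {degrees[0]}-{degrees[-1]}, "
          f"mean path {mean_path_length(world):.2f} hops")
```

For the 10 x 10 wraparound grid the mean path is exactly 500/99, about 5.05 hops. The random graph, with the same number of ties, connects people in fewer hops, yet no participant in either world could see this from their own neighbors alone.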
In this era of computational social science, there are whole hosts of technical, methodological questions that need to be addressed. Some of them have to do with the nature of the data, humdrum scientific issues like accuracy of measurement and reliability of measurement. For example, if you’re recruiting people online, does their behavior online resemble the behavior that they would exhibit offline? Or, are the people that you recruit online some kind of a plausible random sample of human beings, or is there something different about Amazon Mechanical Turk workers, and their behavior, compared to people in general, for example? Or, what is the meaning of measuring certain things in certain ways, compared to other ways? What kind of ruler are we using, and so forth? These are well-understood topics in the sciences, and there are a number of labs, my lab included, that are doing a variety of calibration experiments to make sure that our instruments are reliable.
A second set of issues comes up with respect to the ethics of conducting these experiments. People are often concerned about privacy and confidentiality and things like that. I need to be very clear that I think that privacy and confidentiality are incredibly important. But I also can’t help but notice that the man on the street seems to be much more concerned about privacy and confidentiality than he is about whether he lives or dies if he were a subject of a randomized controlled trial of a chemotherapeutic agent. I think the reason is that the level of technical knowledge required to have an opinion about whether privacy is important is relatively low, compared to the level of technical knowledge required to have an opinion about whether a particular randomized controlled trial of a chemotherapeutic agent is ethical or not.
In other words, you can readily get research proposals through institutional review boards that propose to take 100 people with cancer and randomize half of them to get a drug and half not, in which a mistake can lead to the death of the subject. And yet, if you propose to ask people about their sexual behavior, everyone gets up in arms, even though the risk in that situation is really trivial. Maybe someone will feel bad about having talked about their sexual behavior, or perhaps the data will in some way not have been properly anonymized, and there’ll be a leak of information about someone’s sexual behavior. I’m not trivializing that risk, but I am benchmarking that risk against the alternative kind of research, in which the stakes are life or death, or the loss of a limb or something.
You hear a lot of conversation in the public sphere about the ethical conduct of this research, and rightly so. But I can’t help but notice that there is a little less conversation about the ethical conduct of all kinds of other research, for which the stakes are actually much higher.
Incidentally, another thing that’s fascinating to me is a rather funny anecdote about the ethical review of science. If a doctor wakes up in the morning and decides that he’s going to treat the next 100 patients he sees who have a certain cancer with a new drug, because he thinks that drug works, he can do that. He doesn’t need to get anyone’s permission. He can use any drug “off-label” he wants when, in his judgment, it is helpful to the patient. He’ll talk to the patient. He needs to get the patient’s consent. He can’t administer the drug without the patient knowing. But he can say to the patient, “I recommend that you do this,” and he can make this recommendation to every one of the next 100 patients he sees.
If, on the other hand, the doctor is more humble, and more judicious, and says “you know, I’m not sure that this drug works, I’m going to only give it to half of the next 100 patients I see,” then he needs to get IRB approval, because that’s research. So even though he’s giving it to fewer patients, now there’s more review.
What’s astonishing to me is that, every day, around the country, all kinds of things are being done in the name of science, whether it’s polluting the environment, inventing new chemicals, developing genetically modified foods, or administering all kinds of drugs. Even outside of research, in my anecdotal experience, the same attention goes to issues of privacy and confidentiality rather than to these even more important things. The reason is that the level of technical know-how required to form opinions about those topics is so much higher than the level of technical knowledge required to form an opinion about privacy and confidentiality.
Who are the kindred spirits in all this work at the intersection of the natural and social sciences? One man whose work I admire a lot is Brian Uzzi at the Kellogg School at Northwestern. Brian has done some of the seminal work on networks and on scientific collaboration.
Now, recently a paper that James and I did with Coren Apicella and Frank Marlowe appeared in Nature; it mapped the social networks of the Hadza hunter-gatherers. We had a sample of about 200 people. There are only about a thousand Hadza left on the planet. They live in a very traditional way. They sleep under the stars. They don’t build any dwellings. They have very few possessions. They hunt and they gather for their food. They’re a pre-agricultural, natural-fertility population.
Because we were interested in the kind of deep evolutionary origin of human social networks, we were animated by this question: if there’s a biological origin for human social networks, and we’ve been making networks of a similar kind since we were very ancient—that is to say, for tens of thousands of years—it should be the case that Hadza social networks look the same as ours.
Conversely, if the structure of human social networks depended upon modern telecommunications, or the invention of cities, the network should look very different. We hit upon this idea of mapping the social networks of a hunter-gatherer population, which I don’t think had ever been done before. Coren drove around 4,000 square kilometers around Lake Eyasi in Tanzania, and we created a kind of Facebook for the Hadza, a series of posters that had a photographic census of every adult Hadza. And every Hadza we could find, we asked them who their social connections were, and we mapped the networks of the Hadza. This paper was just in Nature a few months ago.
What we found was that Hadza social networks look just like ours. In every way we could examine these networks mathematically, they didn’t differ from ours. Our sample in that project was 205 respondents, which is the majority of adults on the planet who still live in the traditional Hadza way. Incidentally, the Hadza have a click language, and we think they are one of the oldest populations with one of the oldest lifestyles on the planet.
We published this paper, and then Brian sent us an e-mail, which is still one of my favorite e-mails I’ve gotten as a scientist. He wrote, “Kudos. While everyone else is chasing big data, you go in the other direction and chase small data.” Just 205 people, and yet, I think, from those 205 people we were able to extract some insights that were not trivial.
I don’t know what it was like to be a scientist in the 1950s or in the 1800s. But I interact with people in fields ranging from computational biology to physics to applied math to evolutionary biology to psychology to sociology to medicine to political science to economics. The range of people whose ideas and disciplines intersect with mine is very broad.
Part of that reflects my own joint training in the natural and the social sciences. I’m both a physician and a social scientist. Partly, it reflects our topics of inquiry, the kinds of things that I and others are investigating. But I also think it may reflect 21st-century science. As Brian Uzzi has also been studying, and showing, science is changing: it is becoming more interdisciplinary and more collaborative. Some of Brian’s work uses citations as a marker of quality (which is as good a marker as we have; you can take issue with it, but let’s just say it’s good enough). He looks at the nature of scientific collaborative groups, how they interact with each other, and how big they are, and he finds that, over the last 50 years, work has become increasingly collaborative, and that more collaborative work is of higher quality, at least as measured by citations.
The world in which I live, scientifically, is very interdisciplinary, partly because of who I am, partly because of my interests, and partly because of science in general.
So, we talked about the reliability and validity of data, which is a standard issue in science: can you measure these things when you use online data, for example? We also talked about issues of ethics, of privacy and confidentiality, and other related issues of subject protection. It’s also important to begin to think about issues of data sharing. What does it mean that, in this era, many of the custodians of big data are private enterprises? One extreme position might say, “Well, unless these enterprises are willing to give their data to everybody, we can’t do any such research.” That’s clearly a ridiculous position. It’s wrongheaded and unrealistic. At the other extreme, we could imagine a model in which only the people who had the data could actually do research. That’s already happening. Believe me, credit card companies and banks and Google and Facebook and Zynga analyze their data every day for commercial purposes, to understand how they can improve their business and make more money. In between those two extremes are models in which data is shared between the holders of the data and the scientists and others who wish to use it.
There’s a big battle taking place right now over the balance of power. In the olden days, data was cheap and analytic skills were dear. The scientists were ascendant, and the people with the skills to analyze data were the ones who had the power. But there may be a shift taking place now in which the custodians of the data have a lot more power. There may be many people who can analyze the data, so what’s valuable now is the data itself. I don’t know how that’s going to play out. Both are required to do good research.
The position that a paper can’t be published unless you release your data along with it is also wrongheaded and naïve. It was never the case that every scientist always released everything. When Marie Curie discovered radium, and radium was available only in tiny amounts, she didn’t just give it away to everybody else, at least not initially. Or when genes were sequenced, or when new animals were cloned and only a few existed, the scientists who did this work described what they did, but they didn’t give their initial samples away. Science is replicable, but part of replication involves getting your own data. It’s not easy to get data, whether by collecting it yourself or by working with firms to acquire and secure it. I think this is part of the art of doing science.
When you do science, you need data, you need analytic ability, you need ideas. All of these things are required. I don’t think it’s appropriate to somehow say that data is just a commodity and it should just be given away. Part of the intellectual contribution and the step forward is figuring out how to get the data. So long as you describe what you did, and you describe how you got the data and what kind of data you have, in principle, other scientists could do the same thing.
In my particular lab, generally speaking, we widely share data. For instance, our Framingham Heart Study social network dataset was posted online; it’s subject to certain restrictions that NIH imposed, but people can get that data. We’ve made publicly available many, many data sets, and we’ve also benefited from the kindness of others. We’ve gotten access to data from commercial firms, from other labs. There’s a kind of widespread—actually cooperative, collaborative—system in place. At least in my judgment, so far, it’s working.