Saturday, July 05, 2008

Science is dead! Long live Science!

Or so claims Chris Anderson, editor-in-chief at WIRED Magazine. In a piece entitled "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", Anderson argues that hypothesis testing and scientific models are going extinct and that, in this age of ever-increasing computing power, massive amounts of data are everything. In his own words:
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
The argument, it seems, is one of induction on steroids: with so many data points, creating a model isn't necessary because computational pattern finding and statistical analyses make the data predictive on their own. "With enough data, the numbers," he writes, "speak for themselves."

To illustrate this point, Anderson uses Google as an example. Google's algorithm doesn't care why one page is ranked higher than another; all that matters is that the math says it is. This, of course, is a red herring. Google's algorithm is the model, and their continued dominance in search is the successful test. As a technology company, they probably don't care what makes a page relevant, as long as their model continues to reflect what people are looking for.

A more relevant example used in the article is Craig Venter. Lamenting that our knowledge of biology and biochemistry is becoming too complex to model and predict, Anderson points to Venter's ocean and air sequencing projects as an example of science without hypotheses.
If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.
There's no doubt that Venter has made some major contributions to biology (and while he's pretty badass, I don't know if anybody considers him the greatest scientist who ever lived, as some do Darwin), but is his work really done without the scientific method? Of course not. Whatever genomes he sequences will be aligned and annotated, and gene functions will be hypothesized, all based on current theory. The massive amounts of data can be used to test evolutionary hypotheses; they can also be used to generate new ones.

Imagine that in the future, with all this wealth of personal genomic information available, somebody does the kind of pattern finding and statistical analysis the WIRED piece suggests, and finds a novel mutation that is the cause of some disease. This tells us nothing about human biology or the etiology of the disease. This doesn't suggest intervention or treatment. Without the scientific method - hypothesis forming and testing - this is little more than trivia.

Science is a way of knowing, a way of exploring our world and learning about it. It's the way we test ideas, answer questions and advance technologies. Chris Anderson seems to see it as bookkeeping: simply cataloguing observations. He's certainly correct that the 'Petabyte Age' offers "huge amounts of data, along with the statistical tools to crunch these numbers", and this will undoubtedly be a powerful tool and an invaluable resource. But to say it marks the end of the scientific method is absurd. If anything, the vastness of data will provide new observations and new ideas, which is the beginning of the process, not the end. The rumours of its death have been greatly exaggerated.

UPDATE: Good Math, Bad Math writes about the WIRED piece and large scale data analysis.


5 comments:

Chris said...

hey, thanks for putting that into words. i'm not entirely sure what anderson was trying to say, maybe that computers and statistics are becoming indispensable in science. but at a time when scientific models are becoming more and more central to our general understanding of the world, especially in fields like neuroscience, his piece ended up sounding absurd. maybe he just wanted to be provocative.

Anonymous Coward said...

I think it's just meant to be provocative. But I see his point. Systems biology is often not hypothesis driven. It also generates mountains of data. I guess it's becoming like physics, where the large machines produce ridiculous amounts of data and we look at it and extract a couple of conclusions.

Kamel said...

Unfortunately his point is obfuscated by the 'scientific method is obsolete' stuff. Like I said, I think he's confusing a tool with a method.

A systems biology approach may not begin with a hypothesis (is this true?), but it is certainly about building predictive, testable models. Even if it's not hypothesis driven it's hypothesis generating.

Anderson uses the physics example as well. The Good Math, Bad Math article linked above does a much better job than I could of discussing how conclusions are extracted from a large amount of data (and how that still depends on the scientific method), but I will steal this Darwin quote from one of the commenters there: "About thirty years ago there was much talk that geologists ought only to observe and not theorize; and I well remember someone saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours. How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!"

Anyhow, Anderson has a point about how experiments are changing, but I still think he's well off-base in claiming the scientific method is on its death bed.

The Key Question said...

To me, the most bizarre thing about this article is how Chris Anderson appears to be looking forward to this glorious new age of theory-free science.

He doesn't seem to realize that the end of hypothesis-driven research, if it were true, would in fact spell the demise of the entire scientific endeavour.

If you want chickens in the future, it doesn't really matter whether theory is the "chicken" or the "egg".

For data to have any meaning, you must have theory. For theories to reflect reality, you must have data. They are mutually important to each other.

At the heart of the scientific endeavour is the functional development of theory. If you remove that human component of ingenuity, creativity and serendipity, and degrade the predictive and narrative power of science to mere correlation-finding in huge datasets by number-crunching machines, I can guarantee you that no human being will be interested in doing science anymore.

Potential scientists would be defecting by the truckloads to any other field that offers more space for creativity, say... Administration.

The Doc said...

Personally, I view data-mining and the range of 'omics' subjects as powerful observation tools. We're taught to start our scientific method with a Popperian Observation, and follow it with a hypothesis.

These techniques are very powerful tools for providing these sorts of observations. To write a paper where you compare two sequences is all well and good, but you would then have to make a hypothesis, a predictive statement, and then test it to make it into science.

And I don't think that the Omics are taking over the role of science at all. I think they are serving the purpose of observation very well, and that they have spawned a number of grants and proposals for good science which can then be followed up.