Thursday, July 10, 2008

The End of Theory II has a wonderful symposium on reactions to Chris Anderson's Wired article on "The End of Theory." What strikes me from reading the symposium is the lack of regard for inductive methodologies as "science." The presumption is that what Richard Fenno called "soaking and poking" is something new in the world of science. Traditionally, in my discipline, it has always been thought of as a prelude to the real work of hypothesis testing.

What I find fascinating is the ability of "computing in the cloud" to hyper-soak and poke. Kevin Kelly draws on some interesting examples from Google to illustrate this potential:
It may turn out that tremendously large volumes of data are sufficient to skip the theory part in order to make a predicted observation. Google was one of the first to notice this. For instance, take Google's spell checker. When you misspell a word when googling, Google suggests the proper spelling. How does it know this? How does it predict the correctly spelled word? It is not because it has a theory of good spelling, or has mastered spelling rules. In fact Google knows nothing about spelling rules at all.

Instead Google operates a very large dataset of observations which show that for any given spelling of a word, x number of people say "yes" when asked if they meant to spell word "y." Google's spelling engine consists entirely of these datapoints, rather than any notion of what correct English spelling is. That is why the same system can correct spelling in any language.

In fact, Google uses the same philosophy of learning via massive data for their translation programs. They can translate from English to French, or German to Chinese by matching up huge datasets of humanly translated material. For instance, Google trained their French/English translation engine by feeding it Canadian documents which are often released in both English and French versions. The Googlers have no theory of language, especially of French, no AI translator. Instead they have zillions of datapoints which in aggregate link "this to that" from one language to another.

Once you have such a translation system tweaked, it can translate from any language to another. And the translation is pretty good. Not expert level, but enough to give you the gist. You can take a Chinese web page and at least get a sense of what it means in English. Yet, as Peter Norvig, head of research at Google, once boasted to me, "Not one person who worked on the Chinese translator spoke Chinese." There was no theory of Chinese, no understanding. Just data. (If anyone ever wanted a disproof of Searle's riddle of the Chinese Room, here it is.)
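The counting idea in Kelly's spelling example is simple enough to sketch in a few lines of Python. This is only a toy under stated assumptions, not Google's actual system: the `ACCEPTED` log and the names `build_table` and `suggest` are hypothetical, invented here for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical log of (what_was_typed, suggestion_the_user_said_yes_to) pairs.
# A real system would have vastly more observations; these are made up.
ACCEPTED = [
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "relieve"),
    ("teh", "the"),
    ("teh", "the"),
    ("teh", "ten"),
    ("bonjuor", "bonjour"),
]


def build_table(pairs):
    """Count, for each typed spelling, how often each correction was accepted."""
    table = defaultdict(Counter)
    for typed, accepted in pairs:
        table[typed][accepted] += 1
    return table


def suggest(table, typed):
    """Return the most frequently accepted correction, or None if never seen."""
    counts = table.get(typed)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


table = build_table(ACCEPTED)
print(suggest(table, "recieve"))
print(suggest(table, "bonjuor"))
```

Nothing in the table encodes a spelling rule or even a language; the French entry is handled exactly like the English ones. That is the sense in which enough observations can stand in for theory when the only goal is prediction.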
This is no doubt true when it comes to social science, where we are notoriously dreadful at prediction. It is not so true for meaning making, science's other core purpose. Here is Bruce Sterling's amusing rejoinder to Kelly's observations, which seems to correctly mock the view that theory will become obsolete:
Surely there are other low-hanging fruit that petabytes could fruitfully harvest before aspiring to the remote, frail, towering limbs of science. (Another metaphor—I'm rolling here.)

For instance: political ideology. Everyone knows that ideology is closely akin to advertising. So why don't we have zillionics establish our political beliefs, based on some large-scale, statistically verifiable associations with other phenomena, like, say, our skin color or the place of our birth?

The practice of law. Why argue cases logically, attempting to determine the facts, guilt or innocence? Just drop the entire legal load of all known casework into the petabyte hopper, and let algorithms sift out the results of the trial. Then we can "hang all the lawyers," as Shakespeare said. (Not a metaphor.)

Love and marriage. I can't understand why people still insist on marrying childhood playmates when a swift petabyte search of billions of potential mates worldwide is demonstrably cheaper and more effective.

Investment. Quanting the stock market has got to be job one for petabyte tech. No human being knows how the market moves—it's all "triple witching hour," it's mere, low, dirty superstition. Yet surely petabyte owners can mechanically out-guess the (only apparent) chaos of the markets, becoming ultra-super-moguls. Then they simply buy all of science and do whatever they like with it. The skeptics won't be laughing then.
