I have recently become obsessed with analytics. I just love the idea of using solid data to make informed choices toward action. It’s the ultimate voyeurism. After all, the internet is a window through which you can peer to monitor other people’s activity. It’s also seductive, instant gratification- I post a document and then check in just an hour later to see how many people have clicked on it, how long they spent reviewing it, where they went after they read it, where they came from before reading it. ..
The power that platforms like Google Analytics and Omniture offer excites me in ways I shouldn’t even publicize- the possibility that all of that information about online actions and behavior is at my fingertips to exploit in order to be more productive, more effective is intoxicating. This is probably why it’s a good thing that I don’t work in marketing or advertising.
But apparently the harvest, process of sorting, and the exploitation of human information no longer stops with marketers and advertisers- now the government wants in.
According to an article in yesterday’s NY Times, “social scientists are trying to mine the vast resources of the Internet — Web searches and Twitter messages, Facebook and blog posts, the digital location trails generated by billions of cellphones” to predict the future. This is being conducted all in the name of the U.S. Government, or in this case, the Intelligence Advanced Research Projects Activity unit of the Office of National Intelligence.
Why? Because they believe “that these storehouses of ‘big data’will for the first time reveal sociological laws of human behavior — enabling them to predict political crises, revolutions and other forms of social and economic instability, just as physicists and chemists can predict natural phenomena.”
Remember our dear friend Michel Foucault who opined on systems of surveillance in modern society? He just rolled over so many times in his grave he’s now a taquito. But putting the panopticon aside for a moment, let us instead turn to “chaos theory” to underline why this whole venture isn’t necessarily a very good idea.
Chaos theory, as a discipline, studies:
The “butterfly effect theory” is basically this:
Small differences in initial conditions (such as those due to rounding errors in numerical computation) yield widely diverging outcomes for chaotic systems, rendering long-term prediction impossible in general. This happens even though these systems are deterministic, meaning that their future behavior is fully determined by their initial conditions, with no random elements involved. In other words, the deterministic nature of these systems does not make them predictable.
Yes, if this is ringing a bell, it’s because you’ve heard of the anecdote the theory is named for, whereby a hurricane’s formation occurred because a distant butterfly had flapped its wings several weeks before. Ridiculous, but it does vividly illustrate the point that the entire globe is a system, and there are infinite factors within that system interacting every day to produce outcomes- and needless to say, these factors are not all diligently recorded in Brooke Shields’ Twitter stream.
Ever since analytics, Facebook, and Twitter broke onto the human information scene, the embedded hubris of men has convinced us that if we’re just smart enough to design a program to parse all of this information, then finally all of our inane yet determined recordings of our daily details will finally mean something– that it will be useful!
The mashed potatoes are just mashed potatoes. If you want to see anything in the figurative mashed potatoes, then see this: the Tower of Babel, people.
“Tower of Babel?” you say? Yes. The Tower of Babel. My favorite of all biblical references ( we all have one, right? Right?).
Need a quick brush-up? OK!
In the story of the Tower of Babel, from Genesis, ‘a united humanity of the generations following the Great Flood, speaking a single language and migrating from the east, came to the land of Shinar, where they resolved to build a city with a tower “with its top in the heavens…lest we be scattered abroad upon the face of the Earth.’ God came down to see what they did and said: ‘They are one people and have one language, and nothing will be withholden from them which they purpose to do.’ So God said, ‘Come, let us go down and confound their speech.’ And so God scattered them upon the face of the Earth, and confused their languages, and they left off building the city, which was called Babel ‘because God there confounded the language of all the Earth.’(Genesis 11:5-8).
In other words, chaos theory’s conclusion that all of the world’s data is basically worthless, unreliable crap aside- this “big data eye in the sky” can and will never be.
First, because, without God’s intervention, we are perfectly great at getting in our own way, thankyouverymuch.
For example, the NY Times article cites IARPA’s claim that “It will use publicly accessible data, including Web search queries, blog entries, Internet traffic flow, financial market indicators, traffic webcams and changes in Wikipedia entries.”
About that, the U.S. Government would do well to recall the response to every single privacy change that Facebook has ever made about user data.
Also, the public’s responses to the Patriot Act.
Also, the public response to News Corp’s recent phone hacking scandal.
I could go on. The point is, I don’t think folks will accept the government’s efforts to exploit the aggregation of their online and publicly collected information in order to predict when we might all come down with whooping cough.
Second problematic claim, “It is intended to be an entirely automated system, a “data eye in the sky” without human intervention.” Errrr…what about all of that human generated information? Isn’t that, um, human intervention?
I recently had the absolute pleasure of hearing Stephen J. Dubner- author of Freakonomics and creator or host of every other program, show, or book that came along with it- speak at a conference. He gave an excellent and very compelling lecture on the dangers of relying too much on “self-reported data.”
His point is that, for industries or disciplines where data in large part determines future strategy and action, a little outside consulting and collection is merited. Self-reported data is, by virtue of the fact that humans are involved, problematic when it comes to accuracy.
This means that every tweet, Facebook update and comment flame war on a review site should be read and collected with a massive grain of Kosher salt. It is hard to imagine how the government would calculate this unreliability into its system through error analysis and standard deviation. Suffice it to say, there is still much work to be done on human reported data, sentiment analysis and social statistics before we could get anywhere close to sorting this all out in any meaningful fashion.
Luckily, as the NY Times reports in the article, not everyone is convinced this is even worthwhile:
“”I’m hard pressed to say that we are witnessing a revolution,’ said Prabhakar Raghavan, the director of Yahoo Labs, who is an information retrieval specialist. He noted that much had been written about predicting flu epidemics by looking at Web searches for ‘flu,’ but noted that the predictions did not improve significantly on what could already be found in data from the Centers for Disease Control and Prevention.”
So, though I myself am drinking the cherry kool aid of acting and strategizing based on the measured results from analytical data, I feel the U.S. Government is seriously overstepping its bounds on this one- both in terms of infringing on other people’s data rights, as well as in terms of outpacing the world’s statistical abilities when applied to cultural data.
Hit me in the comments if you have thoughts of your own on the matter…