Datenschwemme (BD2015)


"Das Ende der Theorie"

Chris Anderson: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
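As a toy illustration of this hypothesis-free approach -- not from Anderson's article, just a minimal sketch with made-up data -- the following Python snippet scans a table of measurements for its strongest pairwise correlations without any prior model of which variables should be related:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical table: 10,000 observations of 50 unnamed variables.
# At petabyte scale this would be a distributed job, but the logic of
# hypothesis-free correlation mining is the same.
n_obs, n_vars = 10_000, 50
data = rng.normal(size=(n_obs, n_vars))
data[:, 7] = 0.8 * data[:, 3] + 0.2 * rng.normal(size=n_obs)  # plant one real association

# Compute every pairwise Pearson correlation at once ...
corr = np.corrcoef(data, rowvar=False)

# ... then report the strongest off-diagonal pairs, with no model of *why*
# they might be related: the "hypothesis" is read off the output afterwards.
iu = np.triu_indices(n_vars, k=1)
order = np.argsort(-np.abs(corr[iu]))
for rank in order[:5]:
    i, j = iu[0][rank], iu[1][rank]
    print(f"var_{i} ~ var_{j}: r = {corr[i, j]:+.3f}")
```

Run on enough columns, the top of such a list will also contain purely coincidental correlations, which is exactly the worry about correlation and causation raised two paragraphs earlier.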

Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?


Computer Stories

Ian Steadman: Big data and the death of the theorist

His research involves taking vast quantities of data -- usually on the scale of millions, if not billions, of individual data points -- and running algorithms on supercomputers that look for the connections between them. This is the essence of big data, a field with a name that summarises the problem while saying nothing about what it actually means. One possible definition might be: how humanity copes with all the information it produces -- and the web and social media mean that there is a lot of information out there to look through. Exabytes upon exabytes.

The big data approach to intelligence gathering allows an analyst to get the full resolution on worldwide affairs. Nothing is lost from looking too closely at one particular section of data, and nothing is lost from taking so wide a perspective on a situation that the fine detail disappears. The algorithms find the patterns and the hypothesis follows from the data. The analyst doesn't even have to bother proposing a hypothesis any more. Her role switches from proactive to reactive, with the algorithms doing the contextual work.
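Steadman's picture of the reactive analyst can be made concrete with a small sketch (my own illustration, not from the article; the data, the two features and the choice of three clusters are all assumptions): an unsupervised algorithm is handed the data with no hypothesis attached, and the analyst's work begins only once it has returned its groupings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical event feed: each row is an event described by two numeric
# features (the real thing would be millions of rows and far messier).
events = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(300, 2)),
    rng.normal(loc=[4, 1], scale=0.5, size=(300, 2)),
    rng.normal(loc=[2, 5], scale=0.5, size=(300, 2)),
])

# No hypothesis goes in: the algorithm is simply asked to find structure.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(events)

# The analyst's role is now reactive: inspect what came out and interpret it.
for label in range(3):
    members = events[model.labels_ == label]
    print(f"cluster {label}: {len(members)} events, centre ~ {members.mean(axis=0).round(2)}")
```

Whether the groupings mean anything is still the analyst's call, which is the point Terras makes below.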

Karl Marx spent 12 years in the British Library developing both carbuncles and the intellectual framework for Das Kapital. While many of his ideas may not be fashionable in the economic mainstream, it's notable that he did predict that even the intellectuals would one day need to face up to being replaced with machines. It's doubtful, however, whether he would have foreseen an automaton one day being able to look through all of the sources that he used -- and millions more -- within a fraction of the time he spent, and being able to present its own models of history.

In the same way that the internal combustion engine spelled the end of the horse as a working animal, big data could be the tool that renders a host of academic disciplines redundant if it proves better at building narratives of human society.

That's something Melissa Terras from UCL's Centre for Digital Humanities agrees with: even big data patterns need someone to understand them. She said: "To understand the question to ask of the data requires insight into cultures and history. Just because you made a map that looks pretty, it doesn't answer a question that improves our understanding of it. We're asking the big questions about society and culture."

The revolution that big data brings to the humanities -- and to any subject that deals with humanity on a profound level -- is that it provides a new way to construct models and narratives. But we have to ask whether those narratives are equivalent to the truth, and the gut feeling there is surely that they're not.