Data Deluge (BD2015)

"Das Ende der Theorie"

Chris Anderson: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.

What are models and how do they explain? And what are their limitations?
"Im Satz wird gleichsam eine Sachlage probeweise zusammengestellt. Man kann geradezu sagen – statt: dieser Satz hat diesen und diesen Sinn -: Dieser Satz stellt diese und diese Sachlage dar." (T 4.031)
"Was jedes Bild, welcher Form immer, mit der Wirklichkeit gemein haben muss, um sie überhaupt – richtig oder falsch – abbilden zu können, ist die logische Form, das ist, die Form der Wirklichkeit." (T 2.18)
"Jeder Satz ist wesentlich wahr-falsch: Um ihn zu verstehen, müssen wir sowohl wissen, was der Fall sein muss, wenn er wahr ist, und was der Fall sein muss, wenn er falsch ist. So hat der Satz zwei Pole, die dem Fall seiner Wahrheit und dem Fall seiner Falschheit entprechen. Dies nennen wir den Sinn des Satzes." (WA 1, Aufzeichnungen über Logik S.188f)
"Im Satz wird eine Welt probeweise zusammengestellt. (Wie im Pariser Gerichtssaal ein Automobilunglück mit Puppen etc. dargestellt wird." (WA 1, 94f)

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Petabyte (PB): 10^15 bytes = 1 000 000 000 000 000 bytes.
One can no longer hold on to the idea that data can be visualized in their totality. There is no surveyable model, only an oversupply of data that can be coped with only mathematically and statistically.
It is still data; what can that mean under these circumstances? Databases rest on models. The values being computed with must be standardized. They are not impressions. What changes is the way of approaching the possible entries in data schemas.
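
A petabyte (10^15 bytes) is far beyond anything that can be inspected or plotted as a whole; even reading it once at 10 GB/s takes on the order of a day. The following is a minimal sketch, not taken from the text, of what a purely mathematical-statistical treatment can look like: summary statistics computed in a single pass over a stream that is never held, let alone visualized, in its entirety (Welford's online algorithm; the data source is invented for illustration).

  import random

  def streaming_mean_variance(stream):
      # Welford's online algorithm: one pass, constant memory,
      # no need to see (or visualize) the data set as a whole.
      n, mean, m2 = 0, 0.0, 0.0
      for x in stream:
          n += 1
          delta = x - mean
          mean += delta / n
          m2 += delta * (x - mean)
      variance = m2 / (n - 1) if n > 1 else 0.0
      return n, mean, variance

  # Stand-in for a stream far too large to look at directly.
  stream = (random.gauss(10.0, 2.0) for _ in range(1_000_000))
  n, mean, var = streaming_mean_variance(stream)
  print(f"n={n}  mean={mean:.3f}  variance={var:.3f}")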

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Why does something happen? It happens, and that is enough. Explanations overtax us; instead we can hold on to what is given. What is given? Data and formulas. What validates the way we handle them? That is the wrong question, one that comes from model thinking. The data speak for themselves, without modelling.

The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
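
The warning is easy to make concrete. A hedged sketch, not from the article: two completely independent random walks, sharing no mechanism whatsoever, will quite often show a sizable correlation over a finite window, so the number alone cannot tell coincidence from connection (Python with numpy; the series are synthetic).

  import numpy as np

  rng = np.random.default_rng(0)

  # Two independent random walks: by construction there is no causal
  # link and no shared mechanism between x and y.
  x = np.cumsum(rng.standard_normal(500))
  y = np.cumsum(rng.standard_normal(500))
  print(f"correlation of two unrelated series: r = {np.corrcoef(x, y)[0, 1]:.2f}")

  # Repeat the experiment to see how often "impressive" correlations
  # arise from chance alone.
  rs = [abs(np.corrcoef(np.cumsum(rng.standard_normal(500)),
                        np.cumsum(rng.standard_normal(500)))[0, 1])
        for _ in range(1000)]
  print(f"runs with |r| > 0.5: {np.mean(np.array(rs) > 0.5):.0%}")

Without a model of the underlying mechanism, nothing in the numbers themselves separates such coincidences from genuine connections; that is the sense in which data without a model is just noise.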

A double use of "data":
  • inputs: sense data, entries ...
  • inputs: affections, ... into forms
One has to attend to these distinctions in order to understand why someone can say that pure data are mere noise. Something is perceived, but one does not know what.

Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
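
As a small-scale illustration of "letting statistical algorithms find patterns", here is a hedged sketch of k-means clustering: unlabeled points are grouped into k clusters with no hypothesis about what, if anything, the groups mean (plain numpy; the data and the choice of k = 3 are invented for illustration).

  import numpy as np

  def kmeans(points, k, iterations=100, seed=0):
      # Plain k-means: alternate between assigning points to the nearest
      # center and moving each center to the mean of its assigned points.
      rng = np.random.default_rng(seed)
      centers = points[rng.choice(len(points), size=k, replace=False)]
      for _ in range(iterations):
          dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
          labels = dists.argmin(axis=1)
          new_centers = np.array([points[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centers[j]
                                  for j in range(k)])
          if np.allclose(new_centers, centers):
              break
          centers = new_centers
      return labels, centers

  # Synthetic, unlabeled data: three blobs the algorithm knows nothing about.
  rng = np.random.default_rng(1)
  points = np.vstack([rng.normal(center, 0.5, size=(100, 2))
                      for center in ((0, 0), (4, 4), (0, 5))])
  labels, centers = kmeans(points, k=3)
  print("cluster sizes:", np.bincount(labels))

The algorithm returns k groups for any data and any k; whether those groups amount to a pattern worth anything is not something the numbers decide.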

The numbers speak for themselves. The patterns! What distinguishes patterns from models?

Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

  • Understand the world!
  • without coherence?
  • Explanations are mechanistic. Understanding without explanations?

There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?


Computer Stories

Ian Steadman: Big data and the death of the theorist

His research involves taking vast quantities of data -- usually on the scale of millions, if not billions, of individual data points -- and running algorithms that look for the connections between them on supercomputers. This is the essence of big data, a field with a name that both summarises the problem and offers nothing of what that actually means. One possible definition of it might be how humanity copes with all the information that it produces, and the web, and social media, means that there is a lot of information out there to look through. Exabytes upon exabytes.

The big data approach to intelligence gathering allows an analyst to get the full resolution on worldwide affairs. Nothing is lost from looking too closely at one particular section of data; nothing is lost from taking a perspective so wide that the fine detail would normally be lost. The algorithms find the patterns and the hypothesis follows from the data. The analyst doesn't even have to bother proposing a hypothesis any more. Her role switches from proactive to reactive, with the algorithms doing the contextual work.

The algorithms find the patterns. Who designs the algorithms, who identifies the patterns? Who lets themselves be influenced by statistically produced patterns?
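
One way to make the question concrete, as a hedged sketch rather than anything from the article: a naive pattern scan that reports every pair of variables whose correlation exceeds a threshold. The data below are pure noise; how many "patterns" get reported is determined by the analyst's choices (number of variables, threshold), not by anything in the data.

  import numpy as np

  rng = np.random.default_rng(2)
  data = rng.standard_normal((200, 50))   # 200 observations of 50 unrelated variables

  corr = np.corrcoef(data, rowvar=False)  # 50 x 50 correlation matrix
  pairs = np.triu_indices_from(corr, k=1) # every variable pair, counted once

  for threshold in (0.3, 0.2, 0.1):
      found = int(np.sum(np.abs(corr[pairs]) > threshold))
      print(f"threshold {threshold}: {found} 'patterns' out of {len(pairs[0])} pairs")

Who sets the threshold, who selects the variables, and who decides that a reported pair deserves attention is still the analyst, which is exactly the point of the question above.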

Karl Marx spent 12 years in the British Library developing both carbuncles and the intellectual framework for Das Kapital. While many of his ideas may not be fashionable in the economic mainstream, it's notable that he did predict that even the intellectuals would one day need to face up to being replaced with machines. It's doubtful, however, whether he would have foreseen an automaton one day being able to look through all of the sources that he used -- and millions more -- within a fraction of the time he spent, and being able to present its own models of history.

In the same way that the internal combustion engine spelled the end of the horse as a working animal, big data could be the tool to render a host of academic disciplines redundant if it proves better at building better narratives of human society.

That's something that Melissa Terras from UCL's Centre for Digital Humanities agrees with -- and even big data patterns need someone to understand them. She said: "To understand the question to ask of the data requires insight into cultures and history. Just because you made a pretty map that looks pretty, it doesn't answer a question that improves our understanding of it. We're asking the big questions about society and culture."

The revolution that big data brings to the humanities -- and any subject that deals with humanity on a profound level -- is that it provides a new way to construct models and narratives. But we have to know if those narratives are equivalent to the truth, and the gut feeling there is surely that they're not.

  • So here, after all, there are new models.
  • And the "narratives" are supposed to correspond to the truth.
  • And that is supposed to be decided instinctively.